Release Notes

Release notes from the previous versions of Flexter.

FLEXTER 2.13

New features

xml2er / json2er
- Included --round-mode parameter for the numeric rounding mode when a number is converted to another with less precision or scale. The default is HALF_UP
- Included --has-header parameter that merge2er already has. It’s for cases where the input csv/psv/tsv file has a header
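
As an illustration of what the HALF_UP default of --round-mode means, here is a minimal Python sketch using the standard decimal module. It only demonstrates the rounding rule (ties go away from zero) when a value is reduced to a smaller scale. The helper name reduce_scale is hypothetical, and this is not Flexter's own implementation.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN


def reduce_scale(value: str, scale: int, mode: str = ROUND_HALF_UP) -> Decimal:
    """Round a decimal string to `scale` fractional digits (hypothetical helper)."""
    # Build the target quantum, e.g. scale=2 -> Decimal("0.01").
    quantum = Decimal(1).scaleb(-scale)
    return Decimal(value).quantize(quantum, rounding=mode)


# HALF_UP: a tie such as 2.345 moves away from zero.
print(reduce_scale("2.345", 2))
# HALF_EVEN, shown for contrast: the same tie moves to the even neighbour.
print(reduce_scale("2.345", 2, ROUND_HALF_EVEN))
```

The two calls differ only on exact ties, which is where the choice of rounding mode matters.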
merge2er
- Added support for the sas7bdat file format as an input source
- Included --file-table parameter that enables loading files as tables, instead of their directories as tables
xml2er / json2er / merge2er
- Included --manifest parameter to enable or disable the creation of a manifest file with the mapping information
download.sh
- Included parameters to download iceberg packages
- Included a parameter to download sas7bdat file format packages

Improvements

xsd2er
- Making an XSD entity match with 2 different XPaths, one that matches with the namespace and another with no namespace
- Switching --legacy-dialect-version to version 2 as default, allowing the use of the most recent Hive data types

Fixes

xml2er / json2er
- Fixed summary logs where overall status got in a different order with xpath and documents
xml2er / json2er / merge2er
- Fixed jdbc support along with --tables-at-once parameter
- Fixed Iceberg creating tables in appending mode
- Fixed the use of quoted identifiers with --extra-column and --partition-column parameters
xsd2er
- Fixed false positive mixed content classification for complex types that restrict or extend simple types
flexchma
- Increased namespace calculation sizes
flexter-ui
- Fixed document stats results when they are filtered
flexter-docker / flexter-cmd
- Fixed the issue where the /home/spark folder is expected by download.sh, but it’s not created

FLEXTER 2.12

New features

Spark
- Default Spark upgraded to 3.5.6 version
Snowflake
- Enabling --in-password and --out-password parameters for private key authentication
xml2er / xsd2er / json2er / merge2er
- Included --store-credentials parameter to enable or disable storing credentials in the metadata database, disabled by default
xml2er / json2er
- Included --truncate-on-overflow parameter to enable or disable string truncation when data exceeds column length, enabled by default
- Included --tolerance-level parameter to change which issues are acceptable during the parsing process.
- Included --stop-mixed parameter to interrupt the process when new mixed content XPath is detected.
- Included --legacy-dialect-version parameter to switch between legacy dialect versions, keeping backward compatibility.
flexter-docker / flexter-db
- Ported to use a lightweight official Alpine Linux Postgres image
- Default PostgreSQL version upgraded from 15 to 17
- Default Java upgraded to 17 version
- Included azcopy command-line tool for flexchma
flexter-docker / flexter-cmd
- Ported to support Kubeflow Spark Operator using the official Spark image base
- Default Java upgraded to 17 version
- Default Spark upgraded to 3.5.6 version
- Included spark-snowflake 3.1.5 version
- Included spark-hadoop-cloud containing: hadoop-aws, hadoop-azure, hadoop-openstack, hadoop-aliyun, gcs-connector
- Included azcopy command-line tool for flexchma
- Adapted to support docker’s apache/spark base images from 3.1.3 to 3.5.x
flexter-ui
- Included optional Azure Active Directory authentication
- Included import/export schema buttons
- Included get_mapping endpoint with JSON and CSV output formats
- Included view for document statistics

Improvements

flexter-ui / rest-api
- Masking credentials
xsd2er
- Support for union and list length and digit calculations
- Enabled the --default-varchar-len parameter to be set permanently with the --map, -g parameter
download.sh
- Improved the download speed using apache dynamic mirrors

Fixes

xml2er / json2er
- Fixed decimal precision and scale statistics calculation for spread int digits and fractional digits
- Fixed --out-opt parameter not being set for output file formats
- Fixed --default-varchar-len overriding columns with defined lengths
- Fixed precision loss false positive warnings for decimals with multiple fractional zeros, e.g. 1.0000
- Fixed duplicated root table columns when using --extra-column or --partition-column parameters containing FILENAME or FILEPATH
xml2er / json2er / merge2er / xsd2er
- Fixed job table params and params_json columns missing parameters
xml2er
- Fixed attributes being flagged as mixed content in statistics
- Fixed tags being flagged as mixed content when mapped already
- Fixed mixed content warnings when the mapping already knows it’s mixed.
merge2er
- Fixed temporary directory allocation for file-based cases
flexter-ui
- Fixed sequence names when flexter-ui started before flexter-db had finished its table creation
- Fixed static web files not being attached during the release process
- Fixed import/export button for postgresql URLs omitting port and/or host
- Fixed loading max_occurs
- Fixed cases showing false new XPaths in the statistics
flexchma
- Function compare_mappings now handles duplicated FKs
- Removing the / character from table names based on JSON field names
flexter-docker / flexter-cmd
- Fixed its initialization via mounted flexter.conf file

FLEXTER 2.11

New features

Java
- Extending Java support to versions 8 through 17
Spark
- Extending Spark support between 3.1.x and 3.5.x versions
- Default Spark upgraded to 3.5.2 version
xsd2er
- Included --xpath-phase <unit|def|all> parameter to help print out XPaths in different XPath build phases
xml2er / json2er
- Included --ignore-mixed-content parameter to avoid parsing xml tags or json values with mixed content
- Included --parallel-sequence parameter to enforce sequences for parallel environments like clusters.
xml2er
- Included --parse-lib <LIB> enabling different SAX parser implementations beyond the SAX parser shipped with Java JDK
flexter-ui / rest-api
- Included endpoint to terminate active jobs

Improvements

xml2er / json2er
- Performance improvements while processing statistics
- Performance improvements while processing mixed content
- Memory consumption improvements between parsing and caching data
flexchma / calcmap
- The order of the table’s columns is now also determined by the numeric suffixes of field names
flexchma / migration.py
- The export utility consolidates the specified schema and all its parent schemas, ensuring that only the requested model is included
flexter-ui
- Keeping track of the process ids, enabling checks on whether the processes are still alive
xsd2er
- Improved dependency issue messages, testing all possible issues and compiling a list of issues
- Included warning messages when root tags match between stats and XSDs, when one side has empty namespace

Fixes

xml2er / json2er / merge2er
- Enhanced the maximum decimal precision and scale for BigQuery from 19,9 to 38,38 (maximum accepted by spark) based on recent BigQuery improvements
- Included quoteIdentifier parameter for jdbc:postgresql:// URI enabling optionally double-quotes in identifiers
- Oracle CLOB columns weren’t accepting text above 4000 in length.
xml2er / json2er
- Removed --tables-at-once restriction policy to run only in Java 8
- Truncating text content bigger than 16 MB for Snowflake, avoiding process crashes
- Fixed --id-column <name> extra and --extra-column <name> id parameter cases, which inject an extra column based on the input’s ID value
- Numeric data types statistics detection was allowing precision and scale beyond limits.
- Fixed sequences in Databricks environments where the Spark Cluster is kept alive across multiple jobs.
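
The Snowflake truncation noted above can be pictured with a short Python sketch. It assumes the limit is measured in UTF-8 bytes and that a cut must not split a multi-byte character. The function name truncate_utf8 is hypothetical, and this is not Flexter's implementation.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut `text` to at most `max_bytes` UTF-8 bytes (hypothetical helper)."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial character left by the byte cut.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# Assumed limit for illustration: 16 MB expressed in bytes.
SNOWFLAKE_LIMIT = 16 * 1024 * 1024
print(len(truncate_utf8("x" * (SNOWFLAKE_LIMIT + 10), SNOWFLAKE_LIMIT)))
```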
json2er
- Fix for single-element and empty arrays, which weren’t detected as 1:N
xsd2er
- Fixed false positive mixed content assumption, for abstract xsd types without content
- Fix for bi-directional file reference cases, which were causing some failed links and performance loss
- Fix to handle prefixes larger than 30 characters, now it supports up to 255 characters
- Fix to bring all historical stats linked with the schema origin, not only the ones informed by the --use-stats parameter
- Fix for combined stats with an already processed data flow/mapping, which were causing failures
flexter-ui / merge2er
- Making new merge2er jobs listed in the flexter-ui jobs list
flexter-ui
- Removing jobs list Spark History Server button if it isn’t set
- Fixed populating multiple input parameters in job table
flexchma / migration.py
- Schema export wasn’t including id_du in the exported du_stat tables.
flexchma / calcstruct
- Stats consolidation should not preserve data_units from historical schemas
log4j
- File appender implemented for log4j2 (spark 3.3-3.5) as it was for log4j1 (spark 3.1-3.2)
- Fixed print out commands parameter -c for log4j2 (spark 3.3-3.5)
- Fixed parameter -L for log4j2 environments Spark 3.3 and higher
- Including the starting log events in log files that use job id in the name.

FLEXTER 2.10

New features

Spark
- Default Spark upgraded to 3.3.4 version.
download.sh
- Included option to download spark-hadoop-cloud dependency
- Included option to download Hadoop Aliyun Cloud dependencies
- Included option to download Hadoop Tencent OSS Cloud dependencies
- Included option to download Hadoop Open Stack Cloud dependencies
- Included option to download Hadoop Huawei Cloud OBS dependencies
- Separated Google Cloud Storage and Google Big Query options
- Included option to download separate Hive packages
xml2er / json2er
- Included --remap-tables and --remap-table <TABLE,...> parameters to reorder table columns based on tables found in the output

Fixes

xml2er / json2er / merge2er
- Fix in the HDFS blocksize, which Flexter was always enforcing as 0 after Spark 3.3 changes.
xml2er / json2er
- Fix the Spark application name, which was appearing with only (…) instead of xml2er (…) or json2er (…).
xml2er
- Fix performance issues for large files containing mixed content, e.g. HTML, formatted text and other cases of tags mixed with text.

FLEXTER 2.9

New features

flexchma / migration.py
- By default, the export utility consolidates the schema’s data before exporting it. This behaviour can be disabled by passing the optional --export-full parameter
- Bulk copy when importing previously extracted schemas
- Logging improvements, collecting the full mapping with the list of new schema ids and their original ones
flexchma / db
- New ad-hoc function for stats consolidation.
- New accessory function (compact_stats) to consolidate stats and persist them, with an optional switch to purge obsolete entries
flexter-ui / rest-api
- Included endpoint to call import/export metadata.

FLEXTER 2.8

Fixes

xml2er / xsd2er
- Fix in xsi:type cases with missing default types.
flexter-ui
- Masking passwords sent by parameter.
xml2er
- Fix attributes of xsi:type tags in the statistics process.

FLEXTER 2.7

New features

Spark
- Extending Spark support between 3.1.x and 3.3.x versions.
Docker
- Support to Kubernetes environments.

Improvements

Databricks
- Support delta tables merge schema feature.

Fixes

Spark
- Fix dynamic log settings loading for Spark 3.3.
- Fix loading --conf parameters from the application.
flexchma
- Fix for generating a schema from a previous one, which could cause false positive mixed content cases.
Databricks
- Job error status detection
Cloudera
- Fix Kryo serialization
xml2er / json2er
- Fix date formats with 3-character months like Jan, Feb, Mar…
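
For readers unfamiliar with this kind of date value, the following Python sketch parses a date whose month is a 3-character abbreviation. The pattern and the helper name parse_short_month are assumptions for illustration only and say nothing about how Flexter detects date formats. Note that %b depends on the active locale, so English abbreviations are assumed here.

```python
from datetime import date, datetime


def parse_short_month(text: str, pattern: str = "%d-%b-%Y") -> date:
    """Parse a date with an abbreviated month name (hypothetical helper)."""
    # %b matches the locale's abbreviated month name, e.g. Jan, Feb, Mar.
    return datetime.strptime(text, pattern).date()


print(parse_short_month("05-Jan-2021"))
print(parse_short_month("Mar 7 1999", "%b %d %Y"))
```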

FLEXTER 2.6

New features

Spark
- Extending Spark support between 3.1.x and 3.2.x versions.
xml2er
- Including the support for xsi:type stats only cases.

Improvements

xml2er / xsd2er
- Pattern matching between XML stats and XSDs.
docker
- Changes to support non-root users for kubernetes.
databricks
- Support to load flexter and log4j settings.

Fixes

xml2er
- Detecting recursive tables generated by reuse optimization algorithm.
Snowflake
- Including truncation for 16+ megabytes text data.
- JDBC - VARIANT switched to VARCHAR to avoid 16+ megabytes issues.
json2er
- Fixed null values being treated without regard to the column nullable definition.
Cloudera
- Removing Spark 3.1 and 3.2 verbose logging.
Spark
- Fixed spark dependencies downloads.
xml2er / json2er
- Generating a data flow without informing the input or -x parameter, e.g. xml2er -a123 -g1.
- XPaths, tables and columns can be disabled in the metadata database.
- The parameter --default-varchar-len doesn’t take effect
- Calling with -a <id> and -g1 inserts 1 extra metadata schema
xsd2er
- Possibility to set mixed=false in an inherited mixed=true type.
xml2er / json2er / xsd2er
- Duplicated table names due to case sensitivity
download.sh
- Root and non-root users are accessing the same download directory

FLEXTER 2.5

New features

xml2er
- Support casting XML tags using xsi:type attribute.
xsd2er
- Support associating xsi:type statistics with XSDs statistics.
download.sh
- Support downloading dependencies: aws, azure, gcloud, snowflake and custom packages.

Fixes

xml2er / json2er
- Numeric values with precision/scale higher than 38 were truncated or not supported by the output resource; they are now treated as VARCHAR.
- Making it possible to change the log level of particular parsing messages in log4j.properties.
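
The precision/scale rule above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the helper name suggest_type and the DECIMAL(p,s) output format are hypothetical, only finite numeric strings are handled, and Flexter's real type detection is not shown here.

```python
from decimal import Decimal

MAX_DIGITS = 38  # limit for both precision and scale, as stated in the note


def suggest_type(text: str) -> str:
    """Suggest a column type for a finite numeric string (hypothetical helper)."""
    _sign, digits, exponent = Decimal(text).as_tuple()
    scale = max(-exponent, 0)                # digits after the decimal point
    if exponent > 0:
        precision = len(digits) + exponent   # e.g. 1E+40 needs 41 digits
    else:
        precision = max(len(digits), scale)  # e.g. 0.001 needs 3 digits
    if precision > MAX_DIGITS or scale > MAX_DIGITS:
        return "VARCHAR"                     # too wide for a decimal column
    return f"DECIMAL({precision},{scale})"


print(suggest_type("123.45"))
print(suggest_type("1" + "0" * 38))
```

The first call fits comfortably within the limit, while the second has 39 digits and therefore falls back to VARCHAR.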
xml2er
- Namespace-less tag names with a type were being ignored by xml2er.

FLEXTER 2.4

New features

xml2er / json2er
- Customizable integer/float output data types.
- Preventing table/column names that use any of Oracle’s reserved words
- Ignoring jdbc input when the table isn’t informed and the -g, --map parameter is called.
- Including the number of tables to be written in the log
- Improved the FK_ref_as algorithm to fill it with relative XPaths

Fixes

Spark
- Multiple flexter.conf files cause spark crashes
- JDBC debug is failing after spark 3 upgrade
- JDBC dialects always being written as CLOB after Spark 3 upgrade
- Postgresql JDBC dialect inserts quotes except for comments
- Hive annoying WARN messages with Spark 3
flexter-ui / rest-api
- Rest api doesn’t support json2er and merge2er modules
flexchma
- Calcmap reuse optimization causes table columns loss
xml2er / json2er / xsd2er
- Omitted --name-max-len is truncating generated names bigger than 30
xml2er / json2er
- --sequence-type isn’t enforced to reduce the numeric precision
- failing to parse -R "columnName=2001-01-01 00:00:00"
xml2er
- XML tags with xsi:nil="true" are detected as text tags
- xml2er doesn’t load -i parameter from a job
json2er
- json2er isn’t working with mongodb
- json2er isn’t generating correct schema name with select clause
xsd2er
- xsd2er isn’t locating the correct path in docker
- xsd2er --stop-policy +0 produces same results as +1

FLEXTER 2.3

New features

json2er
- Support to MongoDB as input source
xml2er / json2er
- Support to MongoDB as output target

Fixes

xml2er / json2er / xsd2er
- Numeric scale was stored as null in some cases
xml2er / json2er
- Extra columns with regular expressions

FLEXTER 2.2

New features

Spark
- Spark baseline version migrated to 3.1.x. It’s no longer compatible with Spark 2.x
- Default Spark upgraded to 3.1.2 version.

FLEXTER 2.1

New features

xml2er / json2er
- Included --sequence-type <SQLTYPE> accepting both VARCHAR and NUMERIC(precision, scale), as well as other numeric variants (integer, long…)
- Included --console <s|p> shortcut to call flexter application as console functions, e.g. xml2er()
Spark
- Default Spark upgraded to 2.4.7 version.

Fixes

flexchma
- Calcmap - Preventing generating table/column names starting with _ and numbers

FLEXTER 2.0

New features

xml2er / xsd2er
- More accurate namespace + xpath analysis
flexchma
- Flexter Schema now can be installed and upgraded by command-line

Fixes

xml2er
- Extra hidden characters are being filtered in XML documents
xsd2er
- Detecting xs:nil tags as data column without sample

FLEXTER 1.10

New features

Yellowbrick
- Support for the Yellowbrick data warehouse
Google Cloud
- Support to Google BigQuery data warehouse and Google Storage
xml2er / json2er / xsd2er
- Included --default-varchar-len parameter
- Changed --use-stats parameter, which now has the -a shortcut

FLEXTER 1.9

New features

Spark
- More compatibility between hive tables and other hive-based tools for orc formats.
xml2er / json2er
- Included --namemode to enforce lower, upper or camel case table/column names.
json2er
- Accepting json fields with spaces, slashes and other special chars.
xml2er / json2er / xsd2er / merge2er
- Logging into files through log4j
- Included --license parameter to load it externally
AWS
- Support to Redshift JDBC connections
- Experimental support for Redshift Spark Connector

FLEXTER 1.8

New features

xml2er / json2er
- Included --extra-column and --partition-column replacing --partition-by and --partition-by-name parameters.
- Improved --column and --id-column to accept expressions, casting and aliases.
- Included --prefix and --suffix at output table names.
- Included --rename to be able to rename output table names.
merge2er
- Enabling copying --constraints from jdbc to jdbc tables in the merge2er as an experimental feature.

FLEXTER 1.7

New features

Spark
- Default Spark upgraded to 2.4.3 version.
- Extending Spark support between 2.1.x and 2.4.x versions.
- Included other spark-submit parameters into bash script launcher for kerberos authentication (--queue, --principal, --keytab)
xml2er / json2er
- Printing the output tables' DDL out to the console for Hive or Jdbc dialect
- Writing tables' schema directly to Hive or Jdbc database
- Included two further options to disable writing document statistics and xpath statistics
xsd2er
- The xsd2er module will keep track of stats used for schema generation
merge2er
- The merge2er module now works in skip mode and is capable of importing SQL query results.
- Added support for jdbc input/output targets

Fixes

xml2er / json2er / merge2er
- Removed table exists SQL checks from the logs when using the -c parameter
- Disabled quotation for generic JDBC driver, while enabled for specific ones
xml2er / json2er
- Processing dates with different formats (eg. Day-Month-Year, Month-Day-Year …)
xml2er
- Mixed flag is ignored if not explicitly set in stats
merge2er
- Reviewed merge2er log messages and default output format