Release Notes

Release notes for previous versions of Flexter.

FLEXTER 2.11

New features

  • Java
    • Extended Java support to versions 8 through 17
  • Spark
    • Extended Spark support to versions 3.1.x through 3.5.x
    • Default Spark upgraded to version 3.5.2
  • xsd2er
    • Included the --xpath-phase <unit|def|all> parameter to print out XPaths at different XPath build phases
  • xml2er / json2er
    • Included the --ignore-mixed-content parameter to avoid parsing XML tags or JSON values with mixed content
  • xml2er
    • Included --parse-lib <LIB>, enabling SAX parser implementations other than the one shipped with the Java JDK
  • flexter-ui / rest-api
    • Included an endpoint to terminate active jobs

Improvements

  • xml2er / json2er
    • Performance improvements while processing statistics
    • Performance improvements while processing mixed content
  • flexchma / calcmap
    • The order of the table’s columns is now also determined by the numeric suffixes of field names
  • flexchma / migration.py
    • The export utility consolidates the specified schema and all its parent schemas, ensuring that only the requested model is included
  • flexter-ui
    • Keeping track of the process ids, making it possible to check whether the processes are still alive
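
As an illustration of the calcmap column-ordering improvement listed above, the sketch below sorts field names so that numeric suffixes are compared as numbers rather than as text. It is a minimal Python sketch and not Flexter's implementation; the helper name suffix_sort_key and the sample field names are hypothetical.

```python
import re

# Hypothetical helper pattern: a text stem followed by a trailing number.
_SUFFIX = re.compile(r"^(.*?)(\d+)$")

def suffix_sort_key(name):
    """Return a key that orders names by stem, then by numeric suffix."""
    match = _SUFFIX.match(name)
    if match is None:
        # A name without a numeric suffix sorts before suffixed names
        # that share the same stem.
        return (name, -1)
    stem, digits = match.groups()
    return (stem, int(digits))

# Sample field names (made up for this example).
columns = ["addr10", "addr2", "addr1", "name", "addr"]
ordered = sorted(columns, key=suffix_sort_key)
print(ordered)
```

A plain text sort would place addr10 before addr2; comparing the suffix as an integer avoids that.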

Fixes

  • xml2er / json2er / merge2er
    • Raised the maximum decimal precision and scale for BigQuery from 19,9 to 38,38 (the maximum accepted by Spark), based on recent BigQuery improvements
    • Included the quoteIdentifier parameter for jdbc:postgresql:// URIs, optionally enabling double quotes in identifiers
    • Oracle CLOB columns weren’t accepting text with a length above 4000.
  • xml2er / json2er
    • Removed the policy that restricted --tables-at-once to run only on Java 8
    • Truncating text content bigger than 16 MB for Snowflake, avoiding process crashes
    • Fixed the --id-column <name> extra and --extra-column <name> id parameter cases, which inject an extra column based on the input’s ID value
    • Statistics detection for numeric data types was allowing precision and scale beyond the limits.
  • xsd2er
    • Fixed a false-positive mixed content assumption for abstract XSD types without content
    • Fixed bi-directional file reference cases, which were causing some failed links and performance loss
  • flexter-ui / merge2er
    • New merge2er jobs are now listed in the flexter-ui jobs list
  • flexter-ui
    • Removing the Spark History Server button from the jobs list if it isn’t set
    • Fixed populating multiple input parameters in the job table
  • flexchma / migration.py
    • Schema export wasn’t including id_du in the exported du_stat tables.
  • flexchma / calcstruct
    • Stats consolidation no longer preserves data_units from historical schemas
  • log4j
    • File appender implemented for log4j2 (Spark 3.3-3.5) as it was for log4j1 (Spark 3.1-3.2)
    • Fixed the print-out-commands parameter -c for log4j2 (Spark 3.3-3.5)
    • Fixed the -L parameter for log4j2 environments (Spark 3.3 and higher)
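
The Snowflake truncation fix listed above can be pictured with a short Python sketch. This is an illustration under assumptions, not Flexter's code: the notes only state the 16 MB figure, so measuring the limit in UTF-8 bytes, the constant name and the function name truncate_utf8 are all assumptions made for this example.

```python
# Assumed limit: the 16 MB figure from the note, measured in UTF-8 bytes.
SNOWFLAKE_MAX_BYTES = 16 * 1024 * 1024

def truncate_utf8(text, max_bytes=SNOWFLAKE_MAX_BYTES):
    """Truncate text so its UTF-8 encoding fits in max_bytes.

    Characters are kept whole: a multi-byte character that would be cut
    in half by the byte limit is dropped entirely.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" discards the partial multi-byte sequence that the
    # byte slice may leave at the end.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")

# Small limit used only to make the behaviour visible.
print(truncate_utf8("héllo", 3))
```

Cutting on a byte boundary and then discarding an incomplete trailing sequence keeps the result valid UTF-8, which is the property a loader needs to avoid a failed write.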

FLEXTER 2.10

New features

  • Spark
    • Default Spark upgraded to version 3.3.4.
  • download.sh
    • Included an option to download the spark-hadoop-cloud dependency
    • Included an option to download Hadoop Aliyun Cloud dependencies
    • Included an option to download Hadoop Tencent OSS Cloud dependencies
    • Included an option to download Hadoop OpenStack Cloud dependencies
    • Included an option to download Hadoop Huawei Cloud OBS dependencies
    • Separated the Google Cloud Storage and Google BigQuery options
    • Included an option to download separate Hive packages
  • xml2er / json2er
    • Included the --remap-tables and --remap-table <TABLE,...> parameters to reorder table columns based on the tables found in the output

Fixes

  • xml2er / json2er / merge2er
    • Fixed the HDFS block size, which Flexter was always forcing to 0 after the Spark 3.3 changes.
  • xml2er / json2er
    • Fixed the Spark application name, which was appearing as only (…) instead of xml2er (…) or json2er (…).
  • xml2er
    • Fixed performance issues for large files containing mixed content, e.g. HTML, formatted text and other cases of tags mixed with text.

FLEXTER 2.9

New features

  • flexchma / migration.py
    • By default, the export utility consolidates the schema’s data before exporting it. This behaviour can be disabled by passing the optional --export-full parameter
    • Bulk copy when importing previously extracted schemas
    • Logging improvements, collecting the full mapping with the list of new schema ids and their original ones
  • flexchma / db
    • New ad-hoc function for stats consolidation.
    • New accessory function (compact_stats) to consolidate stats and persist them, with an optional switch to purge obsolete entries
  • flexter-ui / rest-api
    • Included an endpoint to call metadata import/export.

FLEXTER 2.8

Fixes

  • xml2er / xsd2er
    • Fixed xsi:type cases with missing default types.
  • flexter-ui
    • Passwords sent as parameters are now masked.
  • xml2er
    • Fixed attributes of xsi:type tags in the statistics process.

FLEXTER 2.7

New features

  • Spark
    • Extended Spark support to versions 3.1.x through 3.3.x.
  • Docker
    • Support for Kubernetes environments.

Improvements

  • Databricks
    • Support for the Delta tables merge schema feature.

Fixes

  • Spark
    • Fixed dynamic log settings loading for Spark 3.3.
    • Fixed loading --conf parameters from the application.
  • flexchma
    • Fixed generating a schema from a previous one, which could cause false-positive mixed content cases.
  • Databricks
    • Job error status detection
  • Cloudera
    • Fixed Kryo serialization
  • xml2er / json2er
    • Fixed date formats with three-character months such as Jan, Feb, Mar…
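
For readers unfamiliar with the three-character month formats mentioned in the last fix, the following Python sketch parses such dates with the standard library. It only illustrates the kind of input involved; the list of candidate formats and the function name parse_date are assumptions for this example and do not reflect Flexter's internal format list. Note that %b matches month abbreviations in the current locale, so English names such as Jan assume an English or default C locale.

```python
from datetime import datetime

# Illustrative candidate formats (assumed, not taken from Flexter).
FORMATS = ["%d-%b-%Y", "%b %d, %Y", "%Y-%m-%d"]

def parse_date(value):
    """Try each candidate format in turn; return a date or None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None

print(parse_date("05-Jan-2021"))
print(parse_date("Feb 14, 2020"))
```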

FLEXTER 2.6

New features

  • Spark
    • Extended Spark support to versions 3.1.x through 3.2.x.
  • xml2er
    • Included support for xsi:type stats-only cases.

Improvements

  • xml2er / xsd2er
    • Pattern matching between XML stats and XSDs.
  • Docker
    • Changes to support non-root users for Kubernetes.
  • Databricks
    • Support for loading Flexter and log4j settings.

Fixes

  • xml2er
    • Detecting recursive tables generated by the reuse optimization algorithm.
  • Snowflake
    • Included truncation for text data of 16+ kilobytes.
    • JDBC - VARIANT switched to VARCHAR to avoid issues with 16+ kilobytes of data.
  • json2er
    • Fixed null values being treated in a way that ignored the column’s nullable definition.
  • Cloudera
    • Removed verbose logging for Spark 3.1 and 3.2.
  • Spark
    • Fixed Spark dependency downloads.
  • xml2er / json2er
    • Generating a data flow without specifying an input or the -x parameter, e.g. xml2er -a123 -g1.
    • XPaths, tables and columns can be disabled in the metadata database.
    • The parameter --default-varchar-len doesn’t take effect
    • Calling with -a <id> and -g1 inserts 1 extra metadata schema
  • xsd2er
    • Possibility to set mixed=false in an inherited mixed=true type.
  • xml2er / json2er / xsd2er
    • Duplicated table names due to case sensitivity
  • download.sh
    • Root and non-root users are accessing the same download directory

FLEXTER 2.5

New features

  • xml2er
    • Support casting XML tags using the xsi:type attribute.
  • xsd2er
    • Support associating xsi:type statistics with XSD statistics.
  • download.sh
    • Support downloading dependencies: aws, azure, gcloud, snowflake and custom packages.

Fixes

  • xml2er / json2er
    • Numeric values with precision/scale higher than 38 were truncated or not supported by the output resource; they are now treated as VARCHAR.
    • Made it possible to change the log level of particular parsing messages in log4j.properties.
  • xml2er
    • Namespace-less tag names with a type were being ignored by xml2er.
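
The numeric fallback described in the first fix above can be sketched as follows. This is a hedged illustration in Python, not Flexter's detection logic: the function name infer_sql_type, the NUMERIC(precision,scale) output string and the way precision is counted are assumptions; only the limit of 38 and the VARCHAR fallback come from the note.

```python
from decimal import Decimal, InvalidOperation

MAX_PRECISION = 38  # limit named in the release note

def infer_sql_type(value):
    """Return NUMERIC(p,s) for values that fit within 38 digits, else VARCHAR."""
    try:
        number = Decimal(value)
    except InvalidOperation:
        return "VARCHAR"          # not a number at all
    if not number.is_finite():
        return "VARCHAR"          # NaN and Infinity have no precision
    _sign, digits, exponent = number.as_tuple()
    scale = max(-exponent, 0)     # digits after the decimal point
    # Total digits: stored digits, padded by leading fractional zeros
    # or by trailing zeros implied by a positive exponent.
    precision = max(len(digits), scale) + max(exponent, 0)
    if precision > MAX_PRECISION:
        return "VARCHAR"
    return f"NUMERIC({precision},{scale})"

print(infer_sql_type("123.45"))
print(infer_sql_type("1" * 39))
```

Falling back to a text type keeps the original digits intact instead of silently truncating them.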

FLEXTER 2.4

New features

  • xml2er / json2er
    • Customizable integer/float output data types.
    • Preventing table/column names that match any of Oracle’s reserved words
    • Ignoring JDBC input when the table isn’t specified and the -g,--map parameter is used.
    • Including the number of tables to be written in the log
    • Improved the FK_ref_as algorithm to fill it with relative XPaths

Fixes

  • Spark
    • Multiple flexter.conf files cause Spark crashes
    • JDBC debug is failing after the Spark 3 upgrade
    • JDBC dialects were always being written as CLOB after the Spark 3 upgrade
    • The PostgreSQL JDBC dialect inserts quotes except for comments
    • Annoying Hive WARN messages with Spark 3
  • flexter-ui / rest-api
    • The REST API doesn’t support the json2er and merge2er modules
  • flexchma
    • Calcmap reuse optimization causes loss of table columns
  • xml2er / json2er / xsd2er
    • Omitting --name-max-len truncates generated names longer than 30
  • xml2er / json2er
    • --sequence-type isn’t enforced to reduce the numeric precision
    • Failing to parse -R "columnName=2001-01-01 00:00:00"
  • xml2er
    • XML tags with xsi:nil="true" are detected as text tags
    • xml2er doesn’t load -i parameter from a job
  • json2er
    • json2er isn’t working with mongodb
    • json2er isn’t generating correct schema name with select clause
  • xsd2er
    • xsd2er isn’t locating the correct path in docker
    • xsd2er --stop-policy +0 produces the same results as +1

FLEXTER 2.3

New features

  • json2er
    • Support for MongoDB as an input source
  • xml2er / json2er
    • Support for MongoDB as an output target

Fixes

  • xml2er / json2er / xsd2er
    • Numeric scale was stored as null in some cases
  • xml2er / json2er
    • Extra columns with regular expressions

FLEXTER 2.2

New features

  • Spark
    • Spark baseline version migrated to 3.1.x. It’s no longer compatible with Spark 2.x
    • Default Spark upgraded to version 3.1.2.

FLEXTER 2.1

New features

  • xml2er / json2er
    • Included --sequence-type <SQLTYPE>, accepting VARCHAR, NUMERIC(precision, scale) and other numeric variants (integer, long…)
    • Included the --console <s|p> shortcut to call the Flexter applications as console functions, e.g. xml2er()
  • Spark
    • Default Spark upgraded to version 2.4.7.

Fixes

  • flexchma
    • Calcmap - Preventing the generation of table/column names starting with _ and numbers

FLEXTER 2.0

New features

  • xml2er / xsd2er
    • More accurate namespace + xpath analysis
  • flexchma
    • Flexter Schema can now be installed and upgraded from the command line

Fixes

  • xml2er
    • Extra hidden characters are being filtered in XML documents
  • xsd2er
    • Detecting xs:nil tags as a data column without a sample

FLEXTER 1.10

New features

  • Yellowbrick
    • Support for the Yellowbrick data warehouse
  • Google Cloud
    • Support for the Google BigQuery data warehouse and Google Storage
  • xml2er / json2er / xsd2er
    • Included the --default-varchar-len parameter
    • The --use-stats parameter now has the -a shortcut

FLEXTER 1.9

New features

  • Spark
    • Better compatibility between Hive tables and other Hive-based tools for ORC formats.
  • xml2er / json2er
    • Included --namemode to enforce lower, upper or camel case table/column names.
  • json2er
    • Accepting JSON fields with spaces, slashes and other special chars.
  • xml2er / json2er / xsd2er / merge2er
    • Logging to files through log4j
    • Included the --license parameter to load the license externally
  • AWS
    • Support for Redshift JDBC connections
    • Experimental support for Redshift Spark Connector

FLEXTER 1.8

New features

  • xml2er / json2er
    • Included --extra-column and --partition-column, replacing the --partition-by and --partition-by-name parameters.
    • Improved --column and --id-column to accept expressions, casting and aliases.
    • Included --prefix and --suffix for output table names.
    • Included --rename to be able to rename output table names.
  • merge2er
    • Enabling copying --constraints from JDBC to JDBC tables in merge2er as an experimental feature.

FLEXTER 1.7

New features

  • Spark
    • Default Spark upgraded to version 2.4.3.
    • Extended Spark support to versions 2.1.x through 2.4.x.
    • Included other spark-submit parameters in the bash script launcher for Kerberos authentication (--queue, --principal, --keytab)
  • xml2er / json2er
    • Printing the output tables' DDL out to the console for the Hive or JDBC dialect
    • Writing the tables' schema directly to a Hive or JDBC database
    • Included two further options to disable writing document statistics and xpath statistics
  • xsd2er
    • The xsd2er module will keep track of stats used for schema generation
  • merge2er
    • The merge2er module now works in skip mode and is capable of importing SQL query results.
    • Added support for JDBC input/output targets

Fixes

  • xml2er / json2er / merge2er
    • Removed table exists SQL checks from the logs when using the -c parameter
    • Disabled quotation for generic JDBC driver, while enabled for specific ones
  • xml2er / json2er
    • Processing dates with different formats (e.g. Day-Month-Year, Month-Day-Year …)
  • xml2er
    • Mixed flag is ignored if not explicitly set in stats
  • merge2er
    • Reviewed merge2er log messages and default output format