Release Notes
Release notes from the previous versions of Flexter.
FLEXTER 2.11
New features
- Java
- Extended Java support to versions 8 through 17
- Spark
- Extended Spark support to versions 3.1.x through 3.5.x
- Default Spark upgraded to 3.5.2 version
- xsd2er
- Included --xpath-phase <unit|def|all> parameter to help print out XPaths in different XPath build phases
- xml2er / json2er
- Included --ignore-mixed-content parameter to avoid parsing XML tags or JSON values with mixed content
- xml2er
- Included --parse-lib <LIB> enabling different SAX parser implementations beyond the SAX parser shipped with the Java JDK
- flexter-ui / rest-api
- Included endpoint to terminate active jobs
Improvements
- xml2er / json2er
- Performance improvements while processing statistics
- Performance improvements while processing mixed content
- flexchma / calcmap
- The order of the table’s columns is now also determined by the numeric suffixes of field names
- flexchma / migration.py
- The export utility consolidates the specified schema and all its parent schemas, ensuring that only the requested model is included
- flexter-ui
- Keeping track of the process ids, enabling a check on whether the processes are still alive
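The calcmap note above says column order now also takes the numeric suffixes of field names into account. The notes do not spell out the exact rule, so the following is only a minimal sketch of one plausible reading (numeric rather than lexical comparison of suffixes). The function name `suffix_sort_key` and the sample field names are hypothetical and are not part of Flexter.

```python
import re

def suffix_sort_key(name: str):
    """Split a field name into (prefix, numeric suffix).

    Hypothetical helper: it only illustrates ordering by numeric suffix,
    it is not Flexter's actual calcmap logic.
    """
    match = re.fullmatch(r"(.*?)(\d+)?", name)
    prefix, digits = match.group(1), match.group(2)
    # A name without a numeric suffix sorts before suffixed names
    # that share the same prefix.
    return (prefix, int(digits) if digits else -1)

columns = ["addr10", "addr2", "addr", "addr1", "name"]
ordered = sorted(columns, key=suffix_sort_key)
print(ordered)
```

With a plain lexical sort, `addr10` would land before `addr2`; comparing the suffix as an integer avoids that.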
Fixes
- xml2er / json2er / merge2er
- Enhanced the maximum decimal precision and scale for BigQuery from 19,9 to 38,38 (maximum accepted by Spark) based on recent BigQuery improvements
- Included quoteIdentifier parameter for jdbc:postgresql:// URIs, optionally enabling double quotes in identifiers
- Oracle CLOB columns weren’t accepting text above 4000 in length.
- xml2er / json2er
- Removed --tables-at-once restriction policy to run only in Java 8
- Truncating text content bigger than 16 MB for Snowflake, avoiding process crashes
- Fixed --id-column <name> extra and --extra-column <name> id parameter cases, which inject an extra column based on the input’s ID value
- Numeric data type statistics detection was allowing precision and scale beyond limits.
- xsd2er
- Fixed false positive mixed content assumption, for abstract xsd types without content
- Fix for bi-directional file reference cases, which were causing some failed links and performance loss
- flexter-ui / merge2er
- New merge2er jobs are now listed in the flexter-ui jobs list
- flexter-ui
- Removing jobs list Spark History Server button if it isn’t set
- Fixed populating multiple input parameters in job table
- flexchma / migration.py
- Schema export wasn’t including id_du in the exported du_stat tables.
- flexchma / calcstruct
- Stats consolidation should not preserve data_units from historical schemas
- log4j
- File appender implemented for log4j2 (Spark 3.3-3.5) as it was for log4j1 (Spark 3.1-3.2)
- Fixed print out commands parameter -c for log4j2 (Spark 3.3-3.5)
- Fixed parameter -L for log4j2 environments (Spark 3.3 and higher)
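The Snowflake fix in this release truncates text content larger than 16 MB. As a rough illustration of what such a truncation has to take care of, the sketch below cuts a string to a byte budget without splitting a multi-byte UTF-8 character. It assumes the limit is measured in UTF-8 bytes. `truncate_utf8` and `SNOWFLAKE_MAX_BYTES` are hypothetical names, and this is not Flexter's implementation.

```python
# Assumption: the 16 MB limit is counted in UTF-8 encoded bytes.
SNOWFLAKE_MAX_BYTES = 16 * 1024 * 1024

def truncate_utf8(text: str, max_bytes: int = SNOWFLAKE_MAX_BYTES) -> str:
    """Return text cut down to at most max_bytes of UTF-8."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes may cut a multi-byte character in half;
    # errors="ignore" drops that trailing partial sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")

short = truncate_utf8("héllo", max_bytes=2)
print(short)
```

In the usage line, the two-byte budget covers `h` and only the first byte of `é`, so the incomplete character is dropped rather than written as invalid data.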
FLEXTER 2.10
New features
- Spark
- Default Spark upgraded to 3.3.4 version.
- download.sh
- Included option to download spark-hadoop-cloud dependency
- Included option to download Hadoop Aliyun Cloud dependencies
- Included option to download Hadoop Tencent OSS Cloud dependencies
- Included option to download Hadoop Open Stack Cloud dependencies
- Included option to download Hadoop Huawei Cloud OBS dependencies
- Separated Google Cloud Storage and Google Big Query options
- Included option to download separate Hive packages
- xml2er / json2er
- Included --remap-tables and --remap-table <TABLE,...> parameters to reorder table columns based on tables found in the output
Fixes
- xml2er / json2er / merge2er
- Fix in the HDFS blocksize, which Flexter was always forcing to 0 after the Spark 3.3 changes.
- xml2er / json2er
- Fix the Spark application name, which was appearing with only (…) instead of xml2er (…) or json2er (…).
- xml2er
- Fix performance issues for large files containing mixed content, e.g. HTML, formatted text and other cases of tags mixed with text.
FLEXTER 2.9
New features
- flexchma / migration.py
- By default, the export utility consolidates the schema’s data before exporting it. This behaviour can be disabled by passing the optional --export-full parameter
- Bulk copy when importing previously extracted schemas
- Logging improvements, collecting the full mapping with the list of new schema ids and their original ones
- flexchma / db
- New ad-hoc function for stats consolidation.
- New accessory function (compact_stats) to consolidate stats and persist them, with an optional switch to purge obsolete entries
- flexter-ui / rest-api
- Included endpoint to call import/export metadata.
FLEXTER 2.8
Fixes
- xml2er / xsd2er
- Fix in xsi:type cases with missing default types.
- flexter-ui
- Masking passwords sent by parameter.
- xml2er
- Fix attributes of xsi:type tags in the statistics process.
FLEXTER 2.7
New features
- Spark
- Extended Spark support to versions 3.1.x through 3.3.x.
- Docker
- Support for Kubernetes environments.
Improvements
- Databricks
- Support for the Delta tables merge schema feature.
Fixes
- Spark
- Fix dynamic log settings loading for Spark 3.3.
- Fix loading --conf parameters from the application.
- flexchma
- Fix: generating a schema from a previous one could cause false positive mixed content cases.
- Databricks
- Job error status detection
- Cloudera
- Fix Kryo serialization
- xml2er / json2er
- Fix date formats with 3 character Months like Jan, Feb, Mar…
FLEXTER 2.6
New features
- Spark
- Extended Spark support to versions 3.1.x through 3.2.x.
- xml2er
- Included support for xsi:type stats-only cases.
Improvements
- xml2er / xsd2er
- Pattern matching between XML stats and XSDs.
- docker
- Changes to support non-root users for kubernetes.
- databricks
- Support for loading flexter and log4j settings.
Fixes
- xml2er
- Detecting recursive tables generated by reuse optimization algorithm.
- Snowflake
- Including truncation for 16+ kilobytes text data.
- JDBC - VARIANT switched to VARCHAR to avoid 16+ kilobytes issues.
- json2er
- Fixed the handling of null values, which ignored the column nullable definition.
- Cloudera
- Removing Spark 3.1 and 3.2 verbose logging.
- Spark
- Fixed spark dependencies downloads.
- xml2er / json2er
- Generating dataflow without specifying the input or the -x parameter, e.g. xml2er -a123 -g1.
- XPaths, tables and columns can be disabled in the metadata database.
- The parameter --default-varchar-len doesn’t take effect
- Calling with -a <id> and -g1 inserts 1 extra metadata schema
- xsd2er
- Possibility to set mixed=false in an inherited mixed=true type.
- xml2er / json2er / xsd2er
- Duplicated table names due to case sensitivity
- download.sh
- Root and non-root users are accessing the same download directory
FLEXTER 2.5
New features
- xml2er
- Support casting XML tags using xsi:type attribute.
- xsd2er
- Support associating xsi:type statistics with XSDs statistics.
- download.sh
- Support downloading dependencies: aws, azure, gcloud, snowflake and custom packages.
Fixes
- xml2er / json2er
- Numeric values with precision/scale higher than 38 were truncated or not supported by the output resource; they are now treated as VARCHAR.
- Made it possible to change the log level of particular parsing messages in log4j.properties.
- xml2er
- Namespace-less tag names with a type were being ignored by xml2er.
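The VARCHAR fallback described in the 2.5 fixes can be pictured with a small type-selection routine: measure the precision and scale of a numeric literal and fall back to a text type when either exceeds 38. This is a sketch under assumptions, not Flexter's code. `precision_and_scale`, `choose_sql_type` and the `NUMERIC(p,s)` output format are hypothetical, and the input is assumed to be a finite decimal literal.

```python
from decimal import Decimal

MAX_PRECISION = 38  # limit quoted in the release note above

def precision_and_scale(value: str):
    """Return (precision, scale) for a finite decimal literal."""
    _sign, digits, exponent = Decimal(value).as_tuple()
    if exponent >= 0:
        # e.g. 1E+5 has one stored digit but needs six integer digits
        return len(digits) + exponent, 0
    scale = -exponent
    # e.g. 0.001 has one stored digit but needs precision 3
    return max(len(digits), scale), scale

def choose_sql_type(value: str) -> str:
    """Pick NUMERIC when the value fits the limit, otherwise VARCHAR."""
    precision, scale = precision_and_scale(value)
    if precision > MAX_PRECISION or scale > MAX_PRECISION:
        return "VARCHAR"
    return f"NUMERIC({precision},{scale})"

print(choose_sql_type("123.45"))
print(choose_sql_type("1" * 39))
```

Falling back to a text type keeps the full value intact, at the cost of losing numeric semantics in the target table.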
FLEXTER 2.4
New features
- xml2er / json2er
- Customizable integer/float output data types.
- Preventing table/column names that match any of Oracle’s reserved words
- Ignoring JDBC input when the table isn’t specified and the -g,--map parameter is called.
- Including the number of tables to be written in the log
- Improved the FK_ref_as algorithm to fill it with relative XPaths
Fixes
- Spark
- Multiple flexter.conf files cause spark crashes
- JDBC debug is failing after spark 3 upgrade
- JDBC dialects always being written as CLOB after Spark 3 upgrade
- Postgresql JDBC dialects inserts quotes except for comments
- Hive annoying WARN messages with Spark 3
- flexter-ui / rest-api
- Rest api doesn’t support json2er and merge2er modules
- flexchma
- Calcmap reuse optimization cause table columns loss
- xml2er / json2er / xsd2er
- Omitted --name-max-len is truncating generated names longer than 30
- xml2er / json2er
- --sequence-type isn’t enforced to reduce the numeric precision
- Failing to parse -R "columnName=2001-01-01 00:00:00"
- xml2er
- XML tags with xsi:nil="true" are detected as text tags
- xml2er doesn’t load the -i parameter from a job
- json2er
- json2er isn’t working with mongodb
- json2er isn’t generating correct schema name with select clause
- xsd2er
- xsd2er isn’t locating the correct path in docker
- xsd2er --stop-policy +0 produces the same results as +1
FLEXTER 2.3
New features
- json2er
- Support for MongoDB as input source
- xml2er / json2er
- Support for MongoDB as output target
Fixes
- xml2er / json2er / xsd2er
- Numeric scale was stored as null in some cases
- xml2er / json2er
- Extra columns with regular expressions
FLEXTER 2.2
New features
- Spark
- Spark baseline version migrated to 3.1.x. It’s no longer compatible with Spark 2.x
- Default Spark upgraded to 3.1.2 version.
FLEXTER 2.1
New features
- xml2er / json2er
- Included --sequence-type <SQLTYPE> accepting both VARCHAR and NUMERIC(precision, scale) and other numeric variants, integer, long…
- Included --console <s|p> shortcut to call the flexter application as console functions, e.g. xml2er()
- Spark
- Default Spark upgraded to 2.4.7 version.
Fixes
- flexchma
- Calcmap - Preventing generating table/column names starting with _ and numbers
FLEXTER 2.0
New features
- xml2er / xsd2er
- More accurate namespace + xpath analysis
- flexchma
- Flexter Schema now can be installed and upgraded by command-line
Fixes
- xml2er
- Extra hidden characters are being filtered in XML documents
- xsd2er
- Detecting xs:nil tags as data column without sample
FLEXTER 1.10
New features
- Yellowbrick
- Support for the Yellowbrick data warehouse
- Google Cloud
- Support for the Google BigQuery data warehouse and Google Storage
- xml2er / json2er / xsd2er
- Included --default-varchar-len parameter
- Changed --use-stats parameter, which now has the -a shortcut
FLEXTER 1.9
New features
- Spark
- More compatibility between hive tables and other hive based tools for orc formats.
- xml2er / json2er
- Included --namemode to enforce lower, upper or camel case table/column names.
- json2er
- Accepting json fields with spaces, slashes and other special chars.
- xml2er / json2er / xsd2er / merge2er
- Logging into files through log4j
- Included --license parameter to load the license externally
- AWS
- Support for Redshift JDBC connections
- Experimental support for Redshift Spark Connector
FLEXTER 1.8
New features
- xml2er / json2er
- Included --extra-column and --partition-column replacing --partition-by and --partition-by-name parameters.
- Improved --column and --id-column to accept expressions, casting and aliases.
- Included --prefix and --suffix at output table names.
- Included --rename to be able to rename output table names.
- merge2er
- Enabling copying --constraints from jdbc to jdbc tables in the merge2er as an experimental feature.
FLEXTER 1.7
New features
- Spark
- Default Spark upgraded to 2.4.3 version.
- Extended Spark support to versions 2.1.x through 2.4.x.
- Included other spark-submit parameters into bash script launcher for kerberos authentication (--queue, --principal, --keytab)
- xml2er / json2er
- Printing the output tables' DDL out to the console for Hive or Jdbc dialect
- Writing tables' schema directly to Hive or Jdbc database
- Included two further options to disable writing document statistics and xpath statistics
- xsd2er
- The xsd2er module will keep track of stats used for schema generation
- merge2er
- The merge2er module now works in skip mode and is capable of importing SQL query results.
- Added support for jdbc input/output targets
Fixes
- xml2er / json2er / merge2er
- Removed table exists SQL checks from the logs when using the -c parameter
- Disabled quotation for generic JDBC driver, while enabled for specific ones
- xml2er / json2er
- Processing dates with different formats (eg. Day-Month-Year, Month-Day-Year …)
- xml2er
- Mixed flag is ignored if not explicitly set in stats
- merge2er
- Reviewed merge2er log messages and default output format