Release Notes
Release notes from the previous versions of Flexter.
FLEXTER 2.12
New features
- Spark
- Default Spark upgraded to 3.5.6 version
 
 - Snowflake
- Enabling --in-password and --out-password parameters for private key authentication
 - xml2er / xsd2er / json2er / merge2er
- Included --store-credentials parameter to enable or disable storing credentials in the metadata database, disabled by default
 - flexter-docker / flexter-db
- Ported to use a light-weight official alpine linux postgres image
 - Default Postgresql version upgraded from 15 to 17
 - Default Java upgraded to 17 version
 - Included azcopy command-line tool for flexchma
 
 - flexter-docker / flexter-cmd
- Ported to support Kubeflow Spark Operator using the official Spark image base
 - Default Java upgraded to 17 version
 - Default Spark upgraded to 3.5.6 version
 - Included spark-snowflake 3.1.5 version
 - Included spark-hadoop-cloud containing: hadoop-aws, hadoop-azure, hadoop-openstack, hadoop-aliyun, gcs-connector
 - Included azcopy command-line tool for flexchma
 - Adapted to support docker’s apache/spark base images from 3.1.3 to 3.5.x
 
 - flexter-ui
- Included optional Azure Active Directory authentication
 - Included import/export schema buttons
 
 
Improvements
- flexter-ui / rest-api
- Masking credentials
 
 - xsd2er
- Support for union and list length and digits calculations
 - Enabled the --default-varchar-len parameter to be set permanently with the --map, -g parameter
 - download.sh
- Improved the download speed using apache dynamic mirrors
 
 
Fixes
- xml2er / json2er
- Fixed decimal precision and scale statistics calculation for spread int digits and fractional digits
 - Fixed --out-opt parameter not being set for output file formats
 - Fixed --default-varchar-len so that it avoids overriding columns with defined lengths
 - xml2er / json2er / merge2er / xsd2er
- Fixed job table params and params_json columns missing parameters
 - flexter-ui
- Fixed sequence names when flexter-ui started before flexter-db had finished its table creation
 - Fixed static web files not being attached during the release process
 - Fixed import/export button for postgresql URLs omitting port and/or host
 - Fixed loading max_occurs
 
 - flexchma
- Function compare_mappings now handles duplicated FKs
 - flexter-docker / flexter-cmd
- Fixed its initialization via mounted flexter.conf file
 
FLEXTER 2.11
New features
- Java
- Extending Java support between 8 and 17 versions
 
 - Spark
- Extending Spark support between 3.1.x and 3.5.x versions
 - Default Spark upgraded to 3.5.2 version
 
 - xsd2er
- Included --xpath-phase <unit|def|all> parameter to help print out XPaths in different XPath build phases
 - xml2er / json2er
- Included --ignore-mixed-content parameter to avoid parsing xml tags or json values with mixed content
 - Included --parallel-sequence parameter to enforce sequences for parallel environments like clusters.
 - xml2er
- Included --parse-lib <LIB> enabling different SAX parser implementations beyond the SAX parser shipped with the Java JDK
 - flexter-ui / rest-api
- Included endpoint to terminate active jobs
 
 
Improvements
- xml2er / json2er
- Performance improvements while processing statistics
 - Performance improvements while processing mixed content
 - Memory consumption improvements between parsing and caching data
 
 - flexchma / calcmap
- The order of the table’s columns is now also determined by the numeric suffixes of field names
 
 - flexchma / migration.py
- The export utility consolidates the specified schema and all its parent schemas, ensuring that only the requested model is included
 
 - flexter-ui
- Keeping track of the process ids, making it possible to check whether the processes are still alive
 
 - xsd2er
- Improved dependency issues messages, testing all possible issues and compiling a list of issues
 - Included warning messages when root tags match between stats and XSDs, when one side has empty namespace
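
The calcmap note above says that column order now also follows the numeric suffixes of field names. The sketch below is not Flexter's implementation; it only illustrates, under assumptions, why a numeric-aware key places a name ending in 2 before one ending in 10, where a plain string sort would not. The helper name suffix_sort_key and the sample field names are hypothetical.

```python
import re

# Hypothetical helper: splits a field name into (base, trailing number).
_SUFFIX = re.compile(r"^(.*?)(\d+)$")


def suffix_sort_key(name: str) -> tuple[str, int]:
    """Return a sort key that compares the trailing number numerically."""
    match = _SUFFIX.match(name)
    if match is None:
        # No numeric suffix: use -1 so the bare name never outranks a suffix.
        return (name, -1)
    base, digits = match.groups()
    return (base, int(digits))


# Hypothetical field names, not taken from any Flexter schema.
fields = ["item_10", "item_2", "item_1", "item"]
print(sorted(fields))                       # plain string ordering
print(sorted(fields, key=suffix_sort_key))  # numeric-suffix ordering
```

A tuple key keeps the comparison on the base name first, so only fields sharing the same base are ordered by their suffix.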
 
 
Fixes
- xml2er / json2er / merge2er
- Enhanced the maximum decimal precision and scale for BigQuery from 19,9 to 38,38 (maximum accepted by spark) based on recent BigQuery improvements
 - Included quoteIdentifier parameter for jdbc:postgresql:// URI, optionally enabling double quotes in identifiers
 - Oracle CLOB columns weren’t accepting text above 4000 in length.
 
 - xml2er / json2er
- Removed --tables-at-once restriction policy to run only in java 8
 - Truncating text content bigger than 16 MB for Snowflake, avoiding process crashes
 - Fixed --id-column <name> extra and --extra-column <name> id parameter cases, which inject an extra column based on input’s ID value
 - Numeric data types statistics detection was allowing precision and scale beyond limits.
 - Fixed sequences in Databricks environments where the Spark Cluster is kept alive across multiple jobs.
 
 - json2er
- Fix for single-element and empty arrays, which weren’t detected as 1:N
 
 - xsd2er
- Fixed false positive mixed content assumption, for abstract xsd types without content
 - Fix for bi-directional file reference cases, which were causing some failed links and performance loss
 - Fix to handle prefixes larger than 30 characters; up to 255 characters are now supported
 - Fix to bring all historical stats linked with the schema origin, not only the ones specified by the --use-stats parameter
 - Fixed combined stats with already processed data flow/mapping, which were causing failures
 
 - flexter-ui / merge2er
- New merge2er jobs are now listed in the flexter-ui jobs list
 
 - flexter-ui
- Removing jobs list Spark History Server button if it isn’t set
 - Fixed populating multiple input parameters in job table
 
 - flexchma / migration.py
- Schema export wasn’t including id_du in the exported du_stat tables.
 
 - flexchma / calcstruct
- Stats consolidation should not preserve data_units from historical schemas
 
 - log4j
- File appender implemented for log4j2 (spark 3.3-3.5) as it was for log4j1 (spark 3.1-3.2)
 - Fixed print out commands parameter -c for log4j2 (spark 3.3-3.5)
 - Fixed parameter -L for log4j2 environments Spark 3.3 and higher
 - Including the starting log events in log files that use job id in the name.
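
The Snowflake note above mentions truncating text content bigger than 16 MB. How Flexter performs the cut is not documented here, so the following is only a minimal sketch under assumptions: it treats the limit as 16 * 1024 * 1024 bytes of UTF-8 and uses a hypothetical helper name, truncate_utf8. It shows one way to cut at a byte limit without leaving a broken multi-byte character at the end.

```python
# Assumed limit, derived from the "16 MB" figure in the release note.
MAX_BYTES = 16 * 1024 * 1024


def truncate_utf8(text: str, max_bytes: int = MAX_BYTES) -> str:
    """Cut text so that its UTF-8 encoding fits within max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes may split a multi-byte character; errors="ignore"
    # drops the incomplete trailing sequence instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# Small limits make the behaviour easy to see.
print(truncate_utf8("abc", 2))
print(truncate_utf8("a\u00e9", 2))  # "é" needs two bytes, so it is dropped
```

Measuring in bytes rather than characters is a design choice of this sketch; a character-based limit would need a different check.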
 
 
FLEXTER 2.10
New features
- Spark
- Default Spark upgraded to 3.3.4 version.
 
 - download.sh
- Included option to download spark-hadoop-cloud dependency
 - Included option to download Hadoop Aliyun Cloud dependencies
 - Included option to download Hadoop Tencent OSS Cloud dependencies
 - Included option to download Hadoop Open Stack Cloud dependencies
 - Included option to download Hadoop Huawei Cloud OBS dependencies
 - Separated Google Cloud Storage and Google Big Query options
 - Included option to download separate hive packages
 
 - xml2er / json2er
- Included --remap-tables and --remap-table <TABLE,...> parameters to reorder table columns based on tables found in the output
 
Fixes
- xml2er / json2er / merge2er
- Fix in the HDFS blocksize, which Flexter was always enforcing as 0 after Spark 3.3 changes.
 
 - xml2er / json2er
- Fix the Spark application name, which was appearing with only (…) instead of xml2er (…) or json2er (…).
 
 - xml2er
- Fix performance issues for large files containing mixed content, e.g. HTML, formatted text and other cases of tags mixed with text.
 
 
FLEXTER 2.9
New features
- flexchma / migration.py
- By default, the export utility consolidates schema’s data before exporting it. This behaviour can be disabled by passing the optional --export-full parameter
 - Bulk copy when importing previously extracted schemas
 - Logging improvements, collecting the full mapping with the list of new schema ids and their original ones
 
 - flexchma / db
- New ad-hoc function for stats consolidation.
 - New accessory function (compact_stats) to consolidate stats and persist them, with an optional switch to purge obsolete entries
 
 - flexter-ui / rest-api
- Included endpoint to call import/export metadata.
 
 
FLEXTER 2.8
Fixes
- xml2er / xsd2er
- Fix in xsi:type cases with missing default types.
 
 - flexter-ui
- Masking passwords sent by parameter.
 
 - xml2er
- Fix attributes of xsi:type tags in the statistics process.
 
 
FLEXTER 2.7
New features
- Spark
- Extending Spark support between 3.1.x and 3.3.x versions.
 
 - Docker
- Support for Kubernetes environments.
 
 
Improvements
- Databricks
- Support delta tables merge schema feature.
 
 
Fixes
- Spark
- Fix dynamic log settings loading for Spark 3.3.
 - Fix loading --conf parameters from the application.
 - flexchma
- Fixed generating a schema from a previous one, which could cause false positive mixed content cases.
 
 - Databricks
- Job error status detection
 
 - Cloudera
- Fix Kryo serialization
 
 - xml2er / json2er
- Fix date formats with 3 character Months like Jan, Feb, Mar…
 
 
FLEXTER 2.6
New features
- Spark
- Extending Spark support between 3.1.x and 3.2.x versions.
 
 - xml2er
- Including the support for xsi:type stats only cases.
 
 
Improvements
- xml2er / xsd2er
- Pattern matching between XML stats and XSDs.
 
 - docker
- Changes to support non-root users for kubernetes.
 
 - databricks
- Support for loading flexter and log4j settings.
 
 
Fixes
- xml2er
- Detecting recursive tables generated by reuse optimization algorithm.
 
 - Snowflake
- Including truncation for 16+ kilobytes text data.
 - JDBC - VARIANT switched to VARCHAR to avoid 16+ kilobytes issues.
 
 - json2er
- Fixed null values being treated in a way that ignored the column nullable definition.
 
 - Cloudera
- Removing Spark 3.1 and 3.2 verbose logging.
 
 - Spark
- Fixed spark dependencies downloads.
 
 - xml2er / json2er
- Generating dataflow without informing input or -x parameter, e.g. xml2er -a123 -g1.
 - XPaths, tables and columns can be disabled in the metadata database.
 - The parameter --default-varchar-len doesn’t take effect
 - Calling with -a <id> and -g1 inserts 1 extra metadata schema
 - xsd2er
- Possibility to set mixed=false in an inherited mixed=true type.
 
 - xml2er / json2er / xsd2er
- Duplicated table names due to case sensitivity
 
 - download.sh
- Root and non-root users are accessing the same download directory
 
 
FLEXTER 2.5
New features
- xml2er
- Support casting XML tags using xsi:type attribute.
 
 - xsd2er
- Support associating xsi:type statistics with XSDs statistics.
 
 - download.sh
- Support downloading dependencies: aws, azure, gcloud, snowflake and custom packages.
 
 
Fixes
- xml2er / json2er
- Numeric values with precision/scale higher than 38 were truncated or not supported by the output resource; they are now considered as VARCHAR.
 - Making it possible to change the log level of particular parsing messages in the log4j.properties.
 
 - xml2er
- Namespace-less tag names with type were being ignored by xml2er.
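
The note above about numeric values whose precision/scale exceeds 38 being considered as VARCHAR can be illustrated with a small sketch. This is an assumption-laden illustration, not Flexter's detection logic: the function names precision_and_scale and column_type are hypothetical, the DECIMAL(p,s) spelling is only an example, and the input is assumed to be a plain finite decimal literal.

```python
from decimal import Decimal

MAX_PRECISION = 38  # limit quoted in the release note


def precision_and_scale(value: str) -> tuple[int, int]:
    """Return (precision, scale) needed to store the decimal literal."""
    number = Decimal(value)
    if not number.is_finite():
        raise ValueError(f"not a finite number: {value!r}")
    _, digits, exponent = number.as_tuple()
    if exponent >= 0:
        # Integers, including forms such as 1E+3, have no fractional digits.
        return len(digits) + exponent, 0
    scale = -exponent
    # A value like 0.001 has one significant digit but needs three places.
    return max(len(digits), scale), scale


def column_type(value: str) -> str:
    """Pick a decimal type when it fits, otherwise fall back to VARCHAR."""
    precision, scale = precision_and_scale(value)
    if precision > MAX_PRECISION or scale > MAX_PRECISION:
        return "VARCHAR"
    return f"DECIMAL({precision},{scale})"


print(column_type("123.45"))
print(column_type("1" * 39))  # 39 digits do not fit in precision 38
```

Falling back to a text type keeps every digit of the original value, at the cost of losing numeric typing in the output column.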
 
 
FLEXTER 2.4
New features
- xml2er / json2er
- Customizable integer/float output data types.
 - Preventing table/column names with any oracle’s reserved words
 - Ignoring jdbc input when the table isn’t informed and the -g, --map parameter is called.
 - Including the number of tables to be written in the log
 - Improved the FK_ref_as algorithm to fill it with relative XPaths
 
 
Fixes
- Spark
- Multiple flexter.conf files cause spark crashes
 - JDBC debug is failing after spark 3 upgrade
 - JDBC dialects always being written as CLOB after Spark 3 upgrade
 - Postgresql JDBC dialect inserts quotes except for comments
 - Hive annoying WARN messages with Spark 3
 
 - flexter-ui / rest-api
- Rest api doesn’t support json2er and merge2er modules
 
 - flexchma
- Calcmap reuse optimization causes table column loss
 
 - xml2er / json2er / xsd2er
- Omitted --name-max-len is truncating generated names bigger than 30
 - xml2er / json2er
- --sequence-type isn’t enforced to reduce the numeric precision
 - Failing to parse -R "columnName=2001-01-01 00:00:00"
 - xml2er
- XML tags with xsi:nil="true" are detected as text tags
 - xml2er doesn’t load -i parameter from a job
 - json2er
- json2er isn’t working with mongodb
 - json2er isn’t generating correct schema name with select clause
 
 - xsd2er
- xsd2er isn’t locating the correct path in docker
 - xsd2er --stop-policy +0 produces same results as +1
 
FLEXTER 2.3
New features
- json2er
- Support for MongoDB as input source
 
 - xml2er / json2er
- Support for MongoDB as output target
 
 
Fixes
- xml2er / json2er / xsd2er
- Numeric scale was stored as null in some cases
 
 - xml2er / json2er
- Extra columns with regular expressions
 
 
FLEXTER 2.2
New features
- Spark
- Spark baseline version migrated to 3.1.x. It’s no longer compatible with Spark 2.x
 - Default Spark upgraded to 3.1.2 version.
 
 
FLEXTER 2.1
New features
- xml2er / json2er
- Included --sequence-type <SQLTYPE> accepting both VARCHAR and NUMERIC(precision, scale) and other numeric variants, integer, long…
 - Included --console <s|p> shortcut to call flexter application as console functions, e.g. xml2er()
 - Spark
- Default Spark upgraded to 2.4.7 version.
 
 
Fixes
- flexchma
- Calcmap - Preventing generating table/column names starting with _ and numbers
 
 
FLEXTER 2.0
New features
- xml2er / xsd2er
- More accurate namespace + xpath analysis
 
 - flexchma
- Flexter Schema now can be installed and upgraded by command-line
 
 
Fixes
- xml2er
- Extra hidden characters are being filtered in XML documents
 
 - xsd2er
- Detecting xs:nil tags as data column without sample
 
 
FLEXTER 1.10
New features
- Yellowbrick
- Support for Yellowbrick data warehouse
 
 - Google Cloud
- Support for Google BigQuery data warehouse and Google Storage
 
 - xml2er / json2er / xsd2er
- Included --default-varchar-len parameter
 - Changed --use-stats parameter, which now has the -a shortcut
 
FLEXTER 1.9
New features
- Spark
- More compatibility between hive tables and other hive-based tools for orc formats.
 
 - xml2er / json2er
- Included --namemode to enforce lower, upper or camel case table/column names.
 - json2er
- Accepting json fields with spaces, slashes and other special chars.
 
 - xml2er / json2er / xsd2er / merge2er
- Logging into files through log4j
 - Included --license parameter to load it externally
 - AWS
- Support for Redshift JDBC connections
 - Experimental support for Redshift Spark Connector
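
The --namemode note above describes enforcing lower, upper or camel case table/column names. The sketch below only illustrates what such a normalisation could look like; the function name apply_namemode, the mode strings and the camel-case rule (split on non-alphanumeric characters, lowercase the first word, capitalise the rest) are all assumptions and may differ from what Flexter does.

```python
import re


def apply_namemode(name: str, mode: str) -> str:
    """Normalise a table/column name to the requested (assumed) case mode."""
    if mode == "lower":
        return name.lower()
    if mode == "upper":
        return name.upper()
    if mode == "camel":
        # Split on anything that is not a letter or digit, e.g. "_" or "-".
        words = [w for w in re.split(r"[^0-9A-Za-z]+", name) if w]
        if not words:
            return name
        head, *tail = words
        return head.lower() + "".join(w.capitalize() for w in tail)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical column name, shown in each mode.
for mode in ("lower", "upper", "camel"):
    print(mode, apply_namemode("ORDER_LINE_item", mode))
```

Applying one mode to every generated name avoids mixed-case identifiers, which matters on targets that treat quoted identifiers as case-sensitive.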
 
 
FLEXTER 1.8
New features
- xml2er / json2er
- Included --extra-column and --partition-column replacing --partition-by and --partition-by-name parameters.
 - Improved --column and --id-column to accept expressions, casting and aliases.
 - Included --prefix and --suffix at output table names.
 - Included --rename to be able to rename output table names.
 
 - merge2er
- Enabling copying --constraints from jdbc to jdbc tables in the merge2er as an experimental feature.
 
 
FLEXTER 1.7
New features
- Spark
- Default Spark upgraded to 2.4.3 version.
 - Extending Spark support between 2.1.x and 2.4.x versions.
 - Included other spark-submit parameters into bash script launcher for kerberos authentication (--queue, --principal, --keytab)
 - xml2er / json2er
- Printing the output tables' DDL out to the console for Hive or Jdbc dialect
 - Writing tables' schema directly to Hive or Jdbc database
 - Included two further options to disable writing document statistics and xpath statistics
 
 - xsd2er
- The xsd2er module will keep track of stats used for schema generation
 
 - merge2er
- The merge2er module now works in skip mode and is capable of importing SQL query results.
 - Added support for jdbc input/output targets
 
 
Fixes
- xml2er / json2er / merge2er
- Removed table exists SQL checks from the logs when using the -c parameter
 - Disabled quotation for generic JDBC driver, while enabled for specific ones
 
 - xml2er / json2er
- Processing dates with different formats (eg. Day-Month-Year, Month-Day-Year …)
 
 - xml2er
- Mixed flag is ignored if not explicitly set in stats
 
 - merge2er
- Reviewed merge2er log messages and default output format