Features
Features of Flexter:
● Convert XML to a relational Target Schema (XML2ER module).
● Convert JSON to a relational Target Schema (JSON2ER module).
● Convert XML & XSD to a relational Target Schema (XSD2ER module).
● Optimisation algorithms to simplify the Target Schema.
● Load text files into a database (MERGE2ER module from Flexter 1.7).
● Import SQL query results for data conversion (MERGE2ER module from Flexter 1.7).
● Convert data from one database to another database (MERGE2ER module from Flexter 1.7).
● Merge small files (MERGE2ER module).
● Visualize diagrams and schema lineage with the FLEXTER-UI module.
Input
- XML/JSON on FTP/SFTP, HTTP/HTTPS, Network folder, Hadoop-FS, Amazon S3, Google Storage
- XML/JSON embedded in columns of tabular files - CSV/TSV/PSV, Parquet, ORC, Avro
- XML/JSON embedded in database columns (BLOB/VARCHAR/CLOB/XMLTYPE/JSON) - Oracle, Postgres, MySQL, MSSQL, Derby, Hive, BigQuery, Snowflake, Teradata, Yellowbrick
Output
- Network folder, Hadoop-FS, Amazon S3, Google Storage
- CSV/TSV/PSV, Parquet, JSON, ORC, Avro
- Oracle, Postgres, MySQL, MSSQL, Derby, Hive, BigQuery, Snowflake, Teradata, Yellowbrick
Automation
- Auto-generates the target output schema, and the mappings to it, from an XSD (normalised relational format).
- Auto-generates the target output schema, and the mappings to it, from statistics gathered over a sample of XML/JSON documents (normalised relational format).
- Auto-generates the relationships between output tables, including primary and foreign keys. The execution engine automatically populates the output schema.
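To make the idea of deriving a relational schema from sample statistics more concrete, here is a minimal sketch in Python. It is not Flexter's algorithm. It only assumes one simple rule: a child element seen at most once per parent becomes a column, and a repeating child becomes its own table with a foreign key back to the parent. The sample document, the function names and the `fk_` naming are all hypothetical, and attributes are ignored for brevity.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample document, used only for illustration.
SAMPLE = """<orders>
  <order id="1"><customer>Ann</customer><item>pen</item><item>ink</item></order>
  <order id="2"><customer>Bob</customer><item>pad</item></order>
</orders>"""

def max_child_counts(root):
    """For every (parent tag, child tag) pair, record the largest number of
    times the child occurs under a single parent instance in the sample."""
    stats = defaultdict(int)
    for parent in root.iter():
        counts = Counter(child.tag for child in parent)
        for tag, n in counts.items():
            stats[(parent.tag, tag)] = max(stats[(parent.tag, tag)], n)
    return dict(stats)

def propose_schema(stats):
    """Assumed rule: children seen at most once become columns of the parent
    table; repeating children become child tables with a foreign key."""
    columns, child_tables = defaultdict(list), []
    for (parent, child), n in sorted(stats.items()):
        if n <= 1:
            columns[parent].append(child)
        else:
            child_tables.append((child, f"fk_{parent}"))
    return dict(columns), child_tables

stats = max_child_counts(ET.fromstring(SAMPLE))
columns, child_tables = propose_schema(stats)
print(columns)
print(child_tables)
```

In this sample, customer occurs once per order and so stays a column of the order table, while item repeats and is split out into a child table keyed to its order.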
Optimisations
By using various schema optimisation algorithms, Flexter makes the target schema more compact and easier to work with for downstream analytics, e.g. it eliminates duplicate and redundant data points or intelligently re-assigns data points to entities.
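One way to picture this kind of simplification is folding a child entity into its parent when the relationship turns out to be 1:1. The Python sketch below illustrates that single idea under stated assumptions. It is not a description of Flexter's optimisation algorithms, and the table layout, the `pk` and `fk_order` column names and the `fold_one_to_one` helper are all hypothetical.

```python
def fold_one_to_one(parent_rows, child_rows, fk, prefix):
    """Merge child rows into their parent rows when each parent has at most
    one child. Raises ValueError if any parent has more than one child."""
    by_parent = {}
    for row in child_rows:
        if row[fk] in by_parent:
            raise ValueError("relationship is not 1:1; keep the child table")
        by_parent[row[fk]] = row

    # Every child column except the keys is carried over with a prefix.
    child_cols = sorted({key for row in child_rows for key in row} - {fk, "pk"})

    merged = []
    for parent in parent_rows:
        child = by_parent.get(parent["pk"], {})
        out = dict(parent)
        for col in child_cols:
            out[f"{prefix}_{col}"] = child.get(col)  # None when no child exists
        merged.append(out)
    return merged

# Hypothetical rows: each order has at most one address.
orders = [{"pk": 1, "customer": "Ann"}, {"pk": 2, "customer": "Bob"}]
addresses = [{"pk": 10, "fk_order": 1, "city": "Dublin"}]

flat = fold_one_to_one(orders, addresses, fk="fk_order", prefix="address")
print(flat)
```

The result is one table instead of two, which removes a join for downstream queries. A child that genuinely repeats is rejected and stays in its own table.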
Loops or multiple passes over XML/JSON files are eliminated. In most ETL tools, developers have to iterate over the same XML/JSON file many times, typically once per target table, to extract the data. With Flexter, each XML/JSON file is loaded once and shredded into its data components.
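The single-pass idea can be sketched with Python's standard streaming XML parser. This is an illustration, not Flexter's execution engine: the document is read once, and rows for a parent table and a child table are emitted together, linked by generated keys. The sample data and the `pk_order`, `pk_item` and `fk_order` names are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input document, used only for illustration.
XML = b"""<orders>
  <order id="1"><customer>Ann</customer><item>pen</item><item>ink</item></order>
  <order id="2"><customer>Bob</customer><item>pad</item></order>
</orders>"""

def shred(stream):
    """Read the stream once and emit rows for an order table and an item table."""
    orders, items = [], []
    order_pk = 0
    # "end" events fire when an element is complete, so all children of an
    # <order> are available by the time the order itself is handled.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "order":
            continue
        order_pk += 1
        orders.append({
            "pk_order": order_pk,
            "id": elem.get("id"),
            "customer": elem.findtext("customer"),
        })
        for item in elem.findall("item"):
            items.append({
                "pk_item": len(items) + 1,
                "fk_order": order_pk,
                "item": item.text,
            })
        elem.clear()  # drop the processed children to limit memory use
    return orders, items

orders, items = shred(io.BytesIO(XML))
print(orders)
print(items)
```

Both tables are filled during the same read of the file, rather than parsing the file once per target table.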
Performance
Flexter is built on top of Apache Spark, a distributed compute framework. Because the work is distributed across a cluster, it scales linearly and can handle very large data volumes.
Processing can happen in-memory for faster throughput.
Schema and Lineage Browser
Data analysts can use Flexter to visually browse the input and optimised output schemas. They can also browse the lineage and mappings between source and target schemas.