Configuration

This section describes the Flexter configuration options and the settings recommended for use.

Path format (URI standard)

The URI path format is used when specifying a connection, i.e. either the connection to the Metadata DB, the Source Connection, or the Target Connection. The path format is based on the URI (Uniform Resource Identifier) convention, which accepts the protocol, hostname, port and internal path as per the pattern below:

protocol://user:password@hostname:port/path?param=value#param=value

All of these components are optional, and relative paths are also accepted, as in the examples below:

# Current directory file
file.zip

# Absolute directory
/tmp/directory

# Absolute directory in Windows style
C:\Temp\directory

# Absolute directory in Unix style
/c:/Temp/directory

# HTTPS zip file
https://hostname:8443/file.zip

# HDFS directory
hdfs://host:8020/tmp/directory

# Enforce the local filesystem
file:/tmp/directory

# JDBC
jdbc:postgresql://hostname:port/database

Some of these URIs may require a username and password, which can either be included in the URI (not recommended) or supplied through a configuration file.
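To illustrate how the pattern above breaks down into its components, here is a minimal sketch using Python's standard `urllib.parse` module. It shows generic URI splitting only; Flexter's own parser may treat some forms differently (Windows-style paths such as `C:\Temp\directory` in particular), and the function name `split_uri` is purely illustrative.

```python
from urllib.parse import urlsplit


def split_uri(uri: str) -> dict:
    """Break a URI into the components named in the pattern above.

    Uses Python's generic URI parser, not Flexter's own logic.
    Components that are absent come back as None or an empty string.
    """
    parts = urlsplit(uri)
    return {
        "protocol": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    # Examples taken from the list above.
    for example in (
        "file.zip",
        "/tmp/directory",
        "https://hostname:8443/file.zip",
        "hdfs://host:8020/tmp/directory",
        "file:/tmp/directory",
    ):
        print(example, "->", split_uri(example))
```

A relative path such as `file.zip` yields an empty protocol and hostname, which matches the idea that every component other than the path may be left out.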

Common Configuration Location

Global configuration settings can be stored in a configuration file. One example for a global configuration setting is the connection path to the Metadata database.

Note that the settings are applied in the order of precedence listed below. Among the configuration files, the one in the current folder has the highest priority; only parameters specified on the command line can override it.

1. Current folder:
./flexter.conf

2. User home folder:
~/.config/flexter/flexter.conf
%LOCALDATA%/flexter/flexter.conf

3. Installation folder:
/etc/flexter/conf/flexter.conf
%ProgramFiles(x86)%/flexter/conf/flexter.conf
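The precedence rule can be sketched in Python as a layered lookup. This is an illustration only: it assumes that settings are resolved key by key across the layers, which the text above suggests but does not spell out, and the flat key names and the sample override values are hypothetical.

```python
from collections import ChainMap


def effective_settings(command_line, current_folder, user_home, installation):
    """Combine setting layers, highest priority first.

    ChainMap returns the value from the first mapping that defines a key,
    so earlier arguments win over later ones.
    Assumption: Flexter resolves settings per key across the layers.
    """
    return dict(ChainMap(command_line, current_folder, user_home, installation))


if __name__ == "__main__":
    # Hypothetical layer contents, for illustration only.
    installation = {"metadata.path": "jdbc:postgresql:x2er", "metadata.user": "flex2er"}
    user_home = {"metadata.batchsize": 1000}
    current_folder = {"metadata.batchsize": 500}
    command_line = {"metadata.user": "another_user"}

    print(effective_settings(command_line, current_folder, user_home, installation))
```

In this sketch the command-line value for `metadata.user` wins over the installation file, and the current-folder value for `metadata.batchsize` wins over the one in the user home folder.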

Module Configuration Location

Settings that are specific to a particular module are stored in separate configuration files, one each for xml2er, json2er, xsd2er and merge2er.

The same precedence rules as for the global configuration apply.

1. Current folder (respectively):
./xml2er.conf
./json2er.conf
./xsd2er.conf
./merge2er.conf

2. User home folder (respectively):
~/.config/flexter/xml2er.conf 
~/.config/flexter/json2er.conf 
~/.config/flexter/xsd2er.conf 
~/.config/flexter/merge2er.conf 
%LOCALDATA%/flexter/xml2er.conf 
%LOCALDATA%/flexter/json2er.conf 
%LOCALDATA%/flexter/xsd2er.conf 
%LOCALDATA%/flexter/merge2er.conf 

3. Installation folder (respectively):
/etc/flexter/xml2er/xml2er.conf 
/etc/flexter/json2er/json2er.conf 
/etc/flexter/xsd2er/xsd2er.conf 
/etc/flexter/merge2er/merge2er.conf 
%ProgramFiles(x86)%/flexter/xml2er/conf/xml2er.conf
%ProgramFiles(x86)%/flexter/json2er/conf/json2er.conf
%ProgramFiles(x86)%/flexter/xsd2er/conf/xsd2er.conf
%ProgramFiles(x86)%/flexter/merge2er/conf/merge2er.conf
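As a rough sketch, the Unix-style locations listed above can be generated for any module in precedence order. The helper name `candidate_paths` and the sample home folder are hypothetical, and the Windows locations are left out to keep the example short.

```python
from pathlib import PurePosixPath

MODULES = ("xml2er", "json2er", "xsd2er", "merge2er")


def candidate_paths(module: str, home: str) -> list:
    """Return the Unix-style config locations for a module, highest priority first."""
    if module not in MODULES:
        raise ValueError(f"unknown module: {module}")
    filename = f"{module}.conf"
    return [
        PurePosixPath(filename),                                   # 1. current folder
        PurePosixPath(home) / ".config" / "flexter" / filename,    # 2. user home folder
        PurePosixPath("/etc/flexter") / module / filename,         # 3. installation folder
    ]


if __name__ == "__main__":
    # "/home/alice" is a made-up home folder used only for this example.
    for path in candidate_paths("xml2er", "/home/alice"):
        print(path)
```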

Common Configuration (.conf) file format

The configuration files are based on the HOCON format.

### METADATA ###
metadata {
  path="jdbc:postgresql:x2er"
  user="flex2er"
  password="flexter"
  batchsize=1000
  nameMaxLen = 30
  varcharLen = 0
  partitions = 0
}

### INPUT ###
input {
  path=""
  user=""
  password=""
  batchsize=1000
  archiveRead=true
  byteStream=false
  filterContent=true
  detectType=true
  partitions = 0
  stream = ""
  table = ""
  column = null
  where = ""
  limit = 0
  idColumn = null
  idMin = 0
  idMax = 0
  format = ""
  newSince = ""
  options {
  }
  token = ""
  hasHeader=true
  # merge2er
  schema = ""
  schemaPattern = ""
  tablePattern = ""
  # xsd2er
  root = ""
  unref-root = false
  file-root = ""
  unref-file-root = false
  with-target = false
}

### OUTPUT ###
output {
  path=""
  user=""
  password=""
  batchsize=1000
  partitions = 0
  format = ""
  compression = ""
  prefix = ""
  suffix = ""
  savemode = ""
  hiveCreate = false
  schema = ""
  resetPKs = false
  unifiedFKs = false
  partitionColumns = null
  extraColumns = null
  constraints = false
  namemode = ""
  tablesAtOnce = 0
  renames {
  }
  options {
  }
  token = ""
}

### ACTION ###
action {
  skip=false
  commands=false
  showConfig=true
  loadConfig=true
  parsemode = "all"
  # xsd2er
  xpath = false
  xpath-full = false
  levels = 1
  levels-type = 2
  stop-policy = "+"
}

### SCHEMA ###
schema = {
  origin=0
  schemaName=""
}

### SPARK ###
spark {
  name = ""
  master = "local[*]"
  blocksize = 0
  cache = ""
  warehouseDir = ""
  checkpointDir = ""
  flexter_metrics = ""
}

Parameters

You can print a description of the available parameters by calling the respective command line module with the --help switch, e.g. xml2er --help

--help                      	Flexter parameters help

--version                   	Show flexter version

-c, --commands              	Show the SQL commands when they are
                                available.

-H                          	Help for ShellScript launcher command

-v                          	Shows the built launcher command

<args1> -- <args2>         	Enforces sending args1 to launcher command and args2 to flexter

Below is a list of common options that can be used with the modules xml2er, json2er, xsd2er and merge2er:

Usage: xml2er|json2er|xsd2er|merge2er [OPTIONS] INPUTPATH

  -h, --help
  --version                    Prints the version number

  Options:
  # schema
  -x, --schema-origin          Schema Origin ID (Stats or XSD)
  -l, --schema-logical         Data Flow ID
  -O, --org                    Organization ID
  -j, --job                    Job ID

  # metadata database
  -m, --meta METADATA          JDBC location of metadata store 
                               default: jdbc:postgresql:x2er
  -M, --meta-user USER         Metadata user. default: flex2er
  -w, --meta-password PASSWORD Metadata password

  # input
  -U, --in-user USER           Input user
  -P, --in-password PASSWORD   Input password

  # output
  -o, --out OUTPUT             Output - Target Connection
  -u, --out-user USER          Output user
  -p, --out-password PASSWORD  Output password

  # actions
  -c, --commands               Show SQL commands. For debugging. Writes SQL commands Flexter generates to log
  -s, --skip                   Skip writing results
  --metrics DURATION           Enable metrics printing at each interval: 1h, 1m, 1s…
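The usage line above can be turned into a small helper that assembles a command line from the documented options. This is a minimal sketch: it only builds the argument list and does not run anything, it uses just the `--out`, `--skip` and `--commands` options listed above, and the function name `build_command` is hypothetical.

```python
import shlex


def build_command(module, input_path, out=None, skip=False, show_commands=False):
    """Assemble '<module> [OPTIONS] INPUTPATH' as an argument list."""
    argv = [module]
    if skip:
        argv.append("--skip")          # skip writing results
    if show_commands:
        argv.append("--commands")      # show generated SQL commands
    if out is not None:
        argv += ["--out", out]         # target connection
    argv.append(input_path)            # INPUTPATH comes last, after the options
    return argv


if __name__ == "__main__":
    argv = build_command("xml2er", "file.zip", out="/tmp/directory", skip=True)
    # shlex.join quotes arguments where needed so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Keeping the arguments as a list, and joining them only for display, avoids quoting mistakes when a path contains spaces.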

The bash launcher script has its own help, which you can invoke with the -H parameter, e.g. xml2er -H

Usage:  [options]

-H |                    print this message
-v | --verbose          this runner is chattier
-d | --debug            set sbt log level to debug
-L | --log              set the log4j configuration file
--license <location>    License file location
--jvm-debug <port>      Turn on JVM debugging, open at the given port.

# spark-submit shortcuts
--spark-home <path>     alternate SPARK_HOME
--master <master>       alternate spark master
--deploy-mode <dm>      spark submit deploy mode
--mem <memory>          driver and executors memory
--cores <cores>         driver and executors cores
--exec <executors>      number of executors
--pkg                   spark external packages
--queue <queue_name>    The YARN queue to submit to (Default: "default").
--principal <principal> Principal to be used to login to KDC, while running on secure
                        HDFS.
--keytab <keytab>       The full path to the file that contains the keytab for the
                        principal specified above. This keytab will be copied to
                        the node running the Application Master via the Secure
                        Distributed Cache, for renewing the login tickets and the
                        delegation tokens periodically.

# jvm options and output control
JAVA_OPTS          environment variable, if unset uses ""
-Dkey=val          pass -Dkey=val directly to the java runtime
-J-X               pass option -X directly to the java runtime
                   (-J is stripped)

# special option
--                 To stop parsing built-in commands from the rest of the command-line.
                   e.g.) enabling debug and sending -d as app argument
                   $ ./start-script -d -- -d

In the case of duplicated or conflicting options, basically the order above
shows precedence: JAVA_OPTS lowest, command line options highest except "--".
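The effect of the "--" separator can be pictured with a short Python sketch. It models only the explicit case, in which everything before "--" goes to the launcher and everything after it goes to Flexter. How the launcher classifies options when no separator is present is not modelled here, and the function name `split_launcher_args` is hypothetical.

```python
from typing import List, Optional, Tuple


def split_launcher_args(argv: List[str]) -> Optional[Tuple[List[str], List[str]]]:
    """Split arguments at the first '--'.

    Returns (launcher_args, flexter_args), or None when no separator is
    present, because that case is decided by the launcher itself.
    """
    if "--" not in argv:
        return None
    index = argv.index("--")
    return argv[:index], argv[index + 1:]


if __name__ == "__main__":
    # Mirrors the example above: the first -d goes to the launcher,
    # the second -d is passed on as an application argument.
    print(split_launcher_args(["-d", "--", "-d"]))
```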