Configuration
Path format (URI standard)
The URI path format is used when specifying a connection: the connection to the Metadata DB, the Source Connection, or the Target Connection. The format follows the URI (Uniform Resource Identifier) convention, which accepts the protocol, user credentials, hostname, port and internal path as per the pattern below:
protocol://user:password@hostname:port/path?param=value#param=value
Every part of the pattern is optional, and relative paths are also accepted, as in the examples below:
# Current directory file
file.zip
# Absolute directory
/tmp/directory
# Absolute directory in Windows style
C:\Temp\directory
# Absolute directory in Unix style
/c:/Temp/directory
# HTTPS zip file
https://hostname:8443/file.zip
# HDFS directory
hdfs://host:8020/tmp/directory
# Enforce the local filesystem
file:/tmp/directory
# JDBC
jdbc:postgresql://hostname:port/database
Some of these URIs may require a username and password, which can be included in the URI itself (not recommended) or supplied through a configuration file.
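As a rough illustration of how the pattern above breaks down into its parts, the sketch below splits a connection path with Python's standard urllib.parse module. It is not how Flexter itself parses paths. The function name split_connection_uri, the host name and the credentials are made up for this example, and Windows-style paths and jdbc: URIs are deliberately left out because urlsplit does not treat them as shown in the list above.

```python
from urllib.parse import urlsplit


def split_connection_uri(uri: str) -> dict:
    """Split a URI-style path into the parts named in the pattern above.

    Hypothetical helper for illustration only; not part of Flexter.
    """
    parts = urlsplit(uri)
    return {
        "protocol": parts.scheme,      # empty string when no protocol is given
        "user": parts.username,        # None when absent
        "password": parts.password,    # None when absent
        "hostname": parts.hostname,    # None when absent
        "port": parts.port,            # None when absent
        "path": parts.path,
        "query": parts.query,          # text after "?"
        "fragment": parts.fragment,    # text after "#"
    }


# A fully specified path, and a relative path with every option suppressed.
print(split_connection_uri(
    "https://user:secret@hostname:8443/file.zip?param=value#param=value"))
print(split_connection_uri("file.zip"))
```

The second call shows that a bare relative path leaves everything except the path component empty.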
Common Configuration Location
Global configuration settings can be stored in a configuration file. One example of a global configuration setting is the connection path to the Metadata database.
Settings are applied in the order of precedence listed below. Among the configuration files, the one in the current folder has the highest priority; only parameters specified on the command line can override it.
1. Current folder:
./flexter_local.conf
2. User home folder:
~/.config/flexter/flexter_user.conf
%LOCALDATA%/flexter/flexter_user.conf
3. Installation folder:
/etc/flexter/conf/flexter.conf
%ProgramFiles(x86)%/flexter/conf/flexter.conf
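The precedence rules can be pictured as layers of key/value settings that are merged from lowest to highest priority. The sketch below assumes flat settings only; merging of nested HOCON blocks is more involved and is not shown. The function name resolve_settings and all sample keys and values are hypothetical and only serve to show the ordering.

```python
def resolve_settings(installation: dict, user_home: dict,
                     current_folder: dict, command_line: dict) -> dict:
    """Merge settings so that later layers override earlier ones.

    Order, lowest to highest priority: installation folder, user home
    folder, current folder, command line. Illustration only.
    """
    merged = {}
    for layer in (installation, user_home, current_folder, command_line):
        merged.update(layer)  # a key in a later layer replaces the earlier value
    return merged


# Made-up values: each layer overrides the one before it.
settings = resolve_settings(
    installation={"metadata.path": "jdbc:postgresql:x2er", "metadata.user": "flex2er"},
    user_home={"metadata.user": "alice"},
    current_folder={"metadata.path": "jdbc:postgresql://dbhost:5432/x2er"},
    command_line={"metadata.user": "bob"},
)
print(settings)
```

In this run the command line supplies the user and the current folder supplies the path, because those are the highest-priority layers that define each key.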
Module Configuration Location
Configurations that are specific to a particular module are stored in separate configuration files for xml2er, json2er, xsd2er and merge2er.
The same precedence rules as for the global configuration apply.
1. Current folder (respectively):
./xml2er_local.conf
./json2er_local.conf
./xsd2er_local.conf
./merge2er_local.conf
2. User home folder (respectively):
~/.config/flexter/xml2er_user.conf
~/.config/flexter/json2er_user.conf
~/.config/flexter/xsd2er_user.conf
~/.config/flexter/merge2er_user.conf
%LOCALDATA%/flexter/xml2er_user.conf
%LOCALDATA%/flexter/json2er_user.conf
%LOCALDATA%/flexter/xsd2er_user.conf
%LOCALDATA%/flexter/merge2er_user.conf
3. Installation folder (respectively):
/etc/flexter/xml2er/xml2er.conf
/etc/flexter/json2er/json2er.conf
/etc/flexter/xsd2er/xsd2er.conf
/etc/flexter/merge2er/merge2er.conf
%ProgramFiles(x86)%/flexter/xml2er/conf/xml2er.conf
%ProgramFiles(x86)%/flexter/json2er/conf/json2er.conf
%ProgramFiles(x86)%/flexter/xsd2er/conf/xsd2er.conf
%ProgramFiles(x86)%/flexter/merge2er/conf/merge2er.conf
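The module file names follow one naming scheme, which the sketch below spells out for the Unix-style locations listed above. The helper module_config_candidates is hypothetical; it only builds the path strings in order of precedence and does not check whether the files exist. The Windows locations are omitted.

```python
MODULES = ("xml2er", "json2er", "xsd2er", "merge2er")


def module_config_candidates(module: str) -> list:
    """Return the Unix-style config paths for a module, highest priority first.

    Illustration only; mirrors the three locations listed above.
    """
    if module not in MODULES:
        raise ValueError(f"unknown module: {module}")
    return [
        f"./{module}_local.conf",                 # 1. current folder
        f"~/.config/flexter/{module}_user.conf",  # 2. user home folder
        f"/etc/flexter/{module}/{module}.conf",   # 3. installation folder
    ]


for path in module_config_candidates("xml2er"):
    print(path)
```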
Common Configuration (.conf) file format
The configuration files are based on the HOCON format.
### METADATA ###
metadata {
path="jdbc:postgresql:x2er"
user="flex2er"
password="flexter"
batchsize=1000
nameMaxLen = 30
partitions = 0
}
### INPUT ###
input {
path=""
user=""
password=""
batchsize=1000
archiveRead=true
byteStream=false
filterContent=true
detectType=true
partitions = 0
stream = ""
table = ""
column = null
where = ""
limit = 0
idColumn = null
idMin = 0
idMax = 0
format = ""
newSince = ""
options {
}
token = ""
hasHeader=true
# merge2er
schema = ""
schemaPattern = ""
tablePattern = ""
# xsd2er
root = ""
unref-root = false
file-root = ""
unref-file-root = false
with-target = false
}
### OUTPUT ###
output {
path=""
user=""
password=""
batchsize=1000
partitions = 0
format = ""
compression = ""
prefix = ""
suffix = ""
savemode = ""
hiveCreate = false
schema = ""
resetPKs = false
unifiedFKs = false
partitionColumns = null
extraColumns = null
constraints = false
namemode = ""
tablesAtOnce = 0
defaultVarcharLen = 0
sequenceType = null
defaultFloatType = null
defaultIntType = null
renames {
}
options {
}
token = ""
}
### ACTION ###
action {
skip=false
commands=false
showConfig=true
loadConfig=true
parsemode = "all"
# xsd2er
xpath = false
xpath-full = false
levels = 1
levels-type = 2
stop-policy = "+"
}
### SCHEMA ###
schema = {
origin=0
schemaName=""
}
### SPARK ###
spark {
name = ""
master = "local[*]"
blocksize = 0
cache = ""
warehouseDir = ""
checkpointDir = ""
flexter_metrics = ""
}
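A local override file normally needs only the settings that differ from the defaults. As a minimal sketch, the hypothetical helper render_block below turns a Python dictionary into one block in the same key = value layout as the listing above. It handles only strings, numbers, booleans and null, not nested blocks, and the sample values are invented.

```python
def render_block(name: str, settings: dict) -> str:
    """Render one flat configuration block in the layout used above.

    Illustration only: nested blocks such as options { } are not handled.
    """
    lines = [f"{name} {{"]
    for key, value in settings.items():
        if isinstance(value, bool):          # check bool before int: bool is an int subclass
            text = "true" if value else "false"
        elif isinstance(value, (int, float)):
            text = str(value)
        elif value is None:
            text = "null"
        else:                                # quote strings, escaping backslash and quote
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            text = f'"{escaped}"'
        lines.append(f"{key} = {text}")
    lines.append("}")
    return "\n".join(lines)


# Made-up override that changes only two metadata settings.
print(render_block("metadata", {"path": "jdbc:postgresql:x2er", "batchsize": 1000}))
```

The printed text could be saved as, for example, ./flexter_local.conf to override just those two settings for runs started from that folder.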
Parameters
You can print out a description of the parameters that are available by calling the respective command line module with the --help switch, e.g. xml2er --help
--help Flexter parameters help
--version Show flexter version
-c, --commands Show the SQL command when they are
available.
-H Help for ShellScript launcher command
-v Shows the built launcher command
<args1> -- <args2> Enforces sending args1 to launcher command and args2 to flexter
Below is a list of some common options that can be used with the modules xml2er, json2er, xsd2er and merge2er:
Usage: xml2er|json2er|xsd2er|merge2er [OPTIONS] INPUTPATH
-h, --help
--version Prints the version number
Options:
# schema
-x, --schema-origin Schema Origin ID (Stats or XSD)
-l, --schema-logical Data Flow ID
-O, --org Organization ID
-j, --job Job ID
# metadata database
-m, --meta METADATA JDBC location of metadata store
default: jdbc:postgresql:x2er
-M, --meta-user USER Metadata user. default: flex2er
-w, --meta-password PASSWORD Metadata password
# input
-U, --in-user USER Input user
-P, --in-password PASSWORD Input password
# output
-o, --out OUTPUT Output - Target Connection
-u, --out-user USER Output user
-p, --out-password PASSWORD Output password
# actions
-c, --commands Show SQL commands. For debugging. Writes SQL commands Flexter generates to log
-s, --skip Skip writing results
--metrics DURATION Enabling metrics printing each: 1h, 1m, 1s…
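To show how the options combine with the INPUTPATH argument from the usage line, the sketch below assembles a command line as a list of arguments. The helper build_command and the example paths are hypothetical; only the switches -s, -c, -o and -u are taken from the list above, and the sketch prints the command rather than running it.

```python
import shlex


def build_command(module: str, input_path: str, out: str = None,
                  out_user: str = None, skip: bool = False,
                  commands: bool = False) -> list:
    """Assemble '<module> [OPTIONS] INPUTPATH' as an argument list.

    Illustration only; covers a small subset of the options listed above.
    """
    argv = [module]
    if skip:
        argv.append("-s")            # skip writing results
    if commands:
        argv.append("-c")            # show SQL commands
    if out is not None:
        argv += ["-o", out]          # output, i.e. the Target Connection
    if out_user is not None:
        argv += ["-u", out_user]     # output user
    argv.append(input_path)          # INPUTPATH comes last
    return argv


# Made-up paths; shlex.join quotes arguments where the shell would need it.
print(shlex.join(build_command("xml2er", "/tmp/input.xml", out="/tmp/output", skip=True)))
```

Keeping the command as a list avoids quoting problems if a path contains spaces; the list can be passed unchanged to subprocess.run.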
The bash script launcher has its own help, which you can invoke with the -H parameter, e.g. xml2er -H
Usage: [options]
-H | print this message
-v | --verbose this runner is chattier
-d | --debug set sbt log level to debug
-L | --log set the log4j configuration file
--license <location> License file location
--jvm-debug <port> Turn on JVM debugging, open at the given port.
--console <s|p> Spark console for Scala or Python
# spark-submit shortcuts
--spark-home <path> alternate SPARK_HOME
--master <master> alternate spark master
--deploy-mode <dm> spark submit deploy mode
--mem <memory> driver and executors memory
--cores <cores> driver and executors cores
--exec <executors> number of executors
--pkg spark external packages
--queue <queue_name> The YARN queue to submit to (Default: "default").
--principal <principal> Principal to be used to login to KDC, while running on secure
HDFS.
--keytab <keytab> The full path to the file that contains the keytab for the
principal specified above. This keytab will be copied to
the node running the Application Master via the Secure
Distributed Cache, for renewing the login tickets and the
delegation tokens periodically.
# jvm options and output control
JAVA_OPTS environment variable, if unset uses ""
-Dkey=val pass -Dkey=val directly to the java runtime
-J-X pass option -X directly to the java runtime
(-J is stripped)
# special option
-- To stop parsing built-in commands from the rest of the command-line.
e.g.) enabling debug and sending -d as app argument
$ ./start-script -d -- -d
In the case of duplicated or conflicting options, basically the order above
shows precedence: JAVA_OPTS lowest, command line options highest except "--".
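The effect of the "--" separator can be sketched as splitting the argument list in two: everything before it goes to the launcher, everything after it goes to the application. The function split_launcher_args below is a hypothetical illustration, not the launcher's actual code. How the launcher treats arguments when no separator is present is not covered; the sketch simply returns them all in the first list.

```python
def split_launcher_args(argv: list) -> tuple:
    """Split arguments at the first '--' into (launcher_args, app_args).

    Illustration only. Without a separator, all arguments are returned
    as launcher_args, which is an arbitrary choice made for this sketch.
    """
    if "--" in argv:
        index = argv.index("--")           # position of the first separator
        return argv[:index], argv[index + 1:]
    return list(argv), []


# Mirrors the help text example: '-d -- -d'.
launcher_args, app_args = split_launcher_args(["-d", "--", "-d"])
print(launcher_args, app_args)
```

In the example from the help text, the first -d enables launcher debugging and the second -d is handed to the application untouched.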