download.sh
This module is unnecessary if the environment already has Spark installed, for example:
- AWS EMR
- MapR distributions
- Hortonworks distributions
- Cloudera distributions
- Custom installations
- Docker-based containers and Kubernetes
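Before running download.sh, you may want to check whether a Spark installation is already visible to the current shell. The bash sketch below assumes that either a SPARK_HOME containing an executable bin/spark-submit, or a spark-submit on the PATH, indicates an existing installation. This is an assumption for illustration: download.sh's own detection logic may differ, and the function name spark_already_installed is hypothetical.

```shell
# Hypothetical helper: succeeds (returns 0) if a Spark installation seems present.
# Assumption: SPARK_HOME/bin/spark-submit or spark-submit on PATH means Spark is installed.
spark_already_installed() {
  # A SPARK_HOME with an executable bin/spark-submit counts as an installation.
  if [ -n "${SPARK_HOME:-}" ] && [ -x "${SPARK_HOME}/bin/spark-submit" ]; then
    return 0
  fi
  # Otherwise, fall back to looking for spark-submit on the PATH.
  command -v spark-submit >/dev/null 2>&1
}
```

For example, after sourcing the function, only run the download when nothing was found:
$ spark_already_installed || /usr/share/flexter/sbin/download.sh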
Parameters
To list all supported download.sh parameters, run:
$ /usr/share/flexter/sbin/download.sh -h
or
$ ~/.local/share/flexter/sbin/download.sh -h
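Since the script can live in either of the two locations above, scripts that call it may need to find it first. The bash sketch below checks the system-wide path before the per-user one, matching the order shown above; the function name find_download_sh and the search order are assumptions for illustration, not part of Flexter.

```shell
# Hypothetical helper: print the path of the first executable download.sh found
# in the two documented locations (system-wide first, then per-user).
find_download_sh() {
  local candidate
  for candidate in /usr/share/flexter/sbin/download.sh \
                   "${HOME}/.local/share/flexter/sbin/download.sh"; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "download.sh not found" >&2
  return 1
}
```

For example:
$ DL=$(find_download_sh) && "$DL" -h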
The supported parameters are:
Usage: /usr/share/flexter/sbin/download.sh [OPTIONS] | <spark|hadoop|hive|dep> [OPTIONS]
-h print this message.
-f Force switch/installation.
-r Force re-installation.
-c CI/CD mode, preventing progress messages.
-v <SPARK_VERSION> Spark version
default: 3.5.2
-b [hadoop]<SPARK_BIN> Spark binary package option.
Ex: hadoop3.2, 3.2, hadoop3, 3 or without-hadoop
It also accepts a specific hadoop version, which is then downloaded
automatically together with spark's without-hadoop package.
Ex: hadoop3.4.4, 3.4.4, hadoop3.2.0 or 3.2.0
default: hadoop3
-H <HADOOP_VERSION> Hadoop binary package option.
default: 3.3.6
-w Spark binary package without-hadoop.
-V Download hive dependencies for spark without-hadoop.
-e Download the hive package separately.
-E <HIVE_VERSION> Version of the separately downloaded hive package.
default: 2.3.9
-C Download Spark Hadoop Cloud dependencies.
includes: aws, azure, gcloud, openstack, tencent cos, aliyun and maybe more.
-a Download AWS dependencies (s3).
-z Download Azure dependencies (blob storage/datalake).
-s Download Snowflake dependencies (spark/jdbc).
-S <VERSION> Spark Snowflake dependency version.
-g Download Google Cloud Storage dependency package (with-dependencies).
-G <VERSION> Download Google Cloud Storage dependency version.
-q Download Google Cloud Big Query dependency.
-Q <VERSION> Download Google Cloud Big Query dependency version.
-l Download Aliyun OSS dependencies
-o Download Tencent Cloud COS dependencies
-k Download Open Stack Cloud dependencies
-i Download Huawei Cloud OBS dependencies
-p <GROUP:NAME:VERSION> Download custom package, comma separated. Same as "spark-submit --packages".
-R <REPOSITORY_URL> Extra repositories for custom packages, comma separated.
SUB-COMMANDS
spark [OPTIONS] Download only spark package.
hadoop [OPTIONS] Download only hadoop package.
hive [OPTIONS] Download only hive package.
dep [OPTIONS] Download only spark dependencies.
Examples
Ways to enforce an installation
Install the default Spark version if no other version is already installed. If an older version or an external installation is present, the script does nothing.
$ /usr/share/flexter/sbin/download.sh
Force installation of the default Spark version, even if another version is installed.
If that version is already installed, the script switches the /usr/share/flexter/spark/default link to it.
$ /usr/share/flexter/sbin/download.sh -f
Force a reinstallation, downloading the packages again even if the same or another version is already installed.
$ /usr/share/flexter/sbin/download.sh -r
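In a CI/CD pipeline, these switches are typically combined with -c to suppress progress messages. The bash sketch below builds the argument list from pipeline variables and uses only options listed under Parameters (-c, -f, -r, -v); the variable names DL_MODE and SPARK_VERSION and the function name build_ci_args are hypothetical.

```shell
# Hypothetical helper: print download.sh arguments, one per line, for a CI run.
# DL_MODE selects the enforcement switch: "default" (none), "force" (-f) or "reinstall" (-r).
# SPARK_VERSION, if set, is passed with -v.
build_ci_args() {
  local -a args=(-c)            # -c: CI/CD mode, no progress messages
  case "${DL_MODE:-default}" in
    default)   ;;
    force)     args+=(-f) ;;
    reinstall) args+=(-r) ;;
    *) echo "unknown DL_MODE: ${DL_MODE}" >&2; return 1 ;;
  esac
  if [ -n "${SPARK_VERSION:-}" ]; then
    args+=(-v "$SPARK_VERSION")
  fi
  printf '%s\n' "${args[@]}"
}
```

Printing one argument per line keeps values intact when they are read back into an array, for example:
$ mapfile -t args < <(build_ci_args) && /usr/share/flexter/sbin/download.sh "${args[@]}"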
Choosing a particular spark version
In some cases a Spark release includes patches you need, or the server environment requires a particular Spark version for other reasons.
In that case, pass the version with -v:
$ /usr/share/flexter/sbin/download.sh -v 3.5.2
Choosing a particular spark binary package
Each Spark version is published in several builds, and -b selects one of them. For example, for version 3.1.x:
- hadoop3.2: Embedded hadoop 3.2.x dependencies, the default.
- hadoop2.7: Embedded hadoop 2.7.x dependencies.
- without-hadoop: Spark comes without any Hadoop or Hive libraries, and the script downloads the default Hadoop version and the Hive dependencies separately.
- hadoop<MAJOR>.<MINOR>.<PATCH>: Spark comes without any Hadoop or Hive libraries, and the script downloads the given Hadoop version and the Hive dependencies separately.
- <MAJOR>.<MINOR>.<PATCH>: Same as above.
To select a build, pass it with -b:
$ /usr/share/flexter/sbin/download.sh -b hadoop3
If you choose the without-hadoop option, a separate Hadoop package and the Hive dependencies are downloaded as well.
To do this with the default Hadoop version:
$ /usr/share/flexter/sbin/download.sh -b without-hadoop
or
$ /usr/share/flexter/sbin/download.sh -w
You can also specify a particular Hadoop version to download together with Spark (the without-hadoop package) and the Hive dependencies:
$ /usr/share/flexter/sbin/download.sh -b hadoop3.3.6
or
$ /usr/share/flexter/sbin/download.sh -b 3.3.6
or
$ /usr/share/flexter/sbin/download.sh -w -H 3.3.6
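The -b rules described above can be summarized as a small classifier. This bash sketch restates the documented behavior and is not download.sh's actual parsing code; the function name describe_spark_bin and its output wording are hypothetical. It treats a three-part version (with or without the hadoop prefix) as the without-hadoop package plus that Hadoop version, and a one- or two-part version as an embedded Hadoop build.

```shell
# Hypothetical helper: describe which packages a -b value selects,
# following the rules documented for download.sh -b.
describe_spark_bin() {
  local bin="$1" ver
  local re_full='^[0-9]+\.[0-9]+\.[0-9]+$'    # e.g. 3.3.6 -> specific hadoop version
  local re_embedded='^[0-9]+(\.[0-9]+)?$'     # e.g. 3 or 3.2 -> embedded hadoop build
  ver="${bin#hadoop}"                         # strip an optional "hadoop" prefix
  if [ "$bin" = "without-hadoop" ]; then
    echo "spark without-hadoop + default hadoop + hive deps"
  elif [[ $ver =~ $re_full ]]; then
    echo "spark without-hadoop + hadoop ${ver} + hive deps"
  elif [[ $ver =~ $re_embedded ]]; then
    echo "spark with embedded hadoop${ver}"
  else
    echo "unrecognized -b value: ${bin}" >&2
    return 1
  fi
}
```

With this reading, -b hadoop3.3.6 and -b 3.3.6 describe the same outcome, which is also what -w -H 3.3.6 requests explicitly.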
Downloading dependencies
Some dependencies, such as the AWS and Azure connectors, already come with the separate Hadoop package, but they are not included in Spark packages with embedded Hadoop.
Likewise, the without-hadoop Spark package contains no Hive dependencies.
download.sh can also install other external dependencies that ship with neither Spark nor Hadoop.
Spark Hadoop Cloud dependencies
The Spark Hadoop Cloud package includes connectors for many Hadoop-compatible file systems, such as AWS, Azure, Google Cloud, OpenStack, Tencent COS, Aliyun, and possibly others.
$ /usr/share/flexter/sbin/download.sh -C
Hive dependencies
These are downloaded automatically if you chose the Spark package without Hadoop. You can also download them again with:
$ /usr/share/flexter/sbin/download.sh -V
hadoop-aws dependencies
These come with the Hadoop package when it is downloaded separately from Spark. To get them without a separate Hadoop package, run:
$ /usr/share/flexter/sbin/download.sh -a
hadoop-azure dependencies
These come with the Hadoop package when it is downloaded separately from Spark. To get them without a separate Hadoop package, run:
$ /usr/share/flexter/sbin/download.sh -z
GCloud dependencies
The latest Google Cloud Storage dependency can be downloaded with the command below:
$ /usr/share/flexter/sbin/download.sh -g
However, if a particular Google Cloud Storage version is required, it can be defined as below:
$ /usr/share/flexter/sbin/download.sh -G <GCS_CONNECTOR_HADOOP_VERSION>
The latest Google Cloud BigQuery dependency can be downloaded with the command below:
$ /usr/share/flexter/sbin/download.sh -q
However, if a particular Google Cloud BigQuery version is required, it can be defined as below:
$ /usr/share/flexter/sbin/download.sh -Q <SPARK_BIGQUERY_VERSION>
Snowflake dependencies
The latest versions of the Snowflake Spark and JDBC connectors can be downloaded with the command below:
$ /usr/share/flexter/sbin/download.sh -s
If a particular version is required, specify the Spark Snowflake connector version:
$ /usr/share/flexter/sbin/download.sh -S <SPARK-SNOWFLAKE_VERSION>
* The JDBC driver that matches the chosen Spark connector version is downloaded as well.
Custom dependencies
Packages in the same format as spark-submit --packages can be downloaded and installed permanently with the command below:
$ /usr/share/flexter/sbin/download.sh -p <GROUP:NAME:VERSION>
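Because -p and -R take comma-separated lists, it can be convenient to keep coordinates and repository URLs in bash arrays and join them. The helper name join_by_comma is hypothetical. The coordinates (Apache Spark Avro and the PostgreSQL JDBC driver) and the Spark Packages repository URL are real but serve only as illustrations; align the versions with your Spark and Scala build, and pass -R only if a package is not found in the default repositories.

```shell
# Hypothetical helper: join its arguments with commas, the format -p and -R expect.
join_by_comma() {
  local IFS=,          # "$*" joins arguments with the first character of IFS
  printf '%s\n' "$*"
}

# Example coordinates (illustrative only; align versions with your Spark build).
packages=(
  org.apache.spark:spark-avro_2.12:3.5.2
  org.postgresql:postgresql:42.7.3
)
# Example extra repository (only needed for packages outside the defaults).
repositories=(https://repos.spark-packages.org)
```

For example:
$ /usr/share/flexter/sbin/download.sh -p "$(join_by_comma "${packages[@]}")" -R "$(join_by_comma "${repositories[@]}")"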