download.sh
This module is unnecessary if the environment already has Spark installed, for example:
- AWS EMR
- MapR distributions
- HortonWorks distributions
- Cloudera distributions
- Custom installations
- Docker-based containers and Kubernetes
Parameters
To list all the parameters supported by download.sh, run one of the following commands:
$ /usr/share/flexter/sbin/download.sh spark -h
or
$ ~/.local/share/flexter/sbin/download.sh spark -h
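Which of the two paths applies depends on whether the installation is system-wide or per-user. The following is a minimal sketch, not part of Flexter itself, that picks whichever script exists. The function name `find_download_sh` is hypothetical; only the two paths come from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: print the first existing, executable download.sh
# among the given candidate paths.
# With no arguments it falls back to the two locations shown above.
find_download_sh() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/share/flexter/sbin/download.sh \
               "$HOME/.local/share/flexter/sbin/download.sh"
    fi
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Nothing found: signal failure to the caller.
    return 1
}

# Usage (the help command only runs when a script was actually found):
# script=$(find_download_sh) && "$script" spark -h
```

Either way, the `spark -h` invocation itself is the same as in the commands above.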
The command prints all the supported parameters:
Spark downloader. It downloads the spark packages and optionally associated hadoop and hive packages.
It downloads spark and combines it with hadoop based on the chosen options
Usage: /usr/share/flexter/sbin/download.sh spark [OPTIONS]
OPTIONS
-h print this message
-f Force installation
-v Spark version
default: 3.1.2
-b Spark binary package option.
Ex: hadoop3.2, 3.2, hadoop3, 3 or without-hadoop
It also accepts to choose an specific hadoop version, which is downloaded
automatically with spark's without-hadoop package.
Ex: hadoop2.8.5
default: hadoop3.2
-H Install hive dependencies. When the spark package is without-hadoop, the
hive is missing. It uses maven and bring all missing jars.
default: 3.2.0
-a Install avro dependencies. When the spark isn't shipped with spark-avro.
It uses maven and bring all missing jars
Examples
# Installing the spark 3.1.2 hadoop3.2 if it isn't detected
./sbin/download.sh spark
# Forcing installation
./sbin/download.sh spark -f
# Installing a another version
./sbin/download.sh spark -v 3.0.3
# Installing a hadoop version
./sbin/download.sh spark -b hadoop3.2.1
# Installing a hadoop version + hive dependencies
./sbin/download.sh spark -b hadoop3.2.1 -H
# Installing without hadoop, with hive dependencies and avro dependencies
./sbin/download.sh spark -b without-hadoop -Ha
Examples
Ways to enforce an installation
Install the default Spark version if no other version is already installed. If an older version or an external installation is present, the command won’t do anything.
$ /usr/share/flexter/sbin/download.sh spark
Force the installation of the default Spark version, even if another version is installed.
If that version is already installed, the command switches the /usr/share/flexter/spark/default
link to it.
$ /usr/share/flexter/sbin/download.sh spark -f
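To see which installation the link points to before or after forcing, something like the following sketch may help. The function name `spark_default_target` is hypothetical, and it assumes that `default` is a symbolic link, as the text above suggests, and that `readlink` is available.

```shell
#!/bin/sh
# Hypothetical helper: print the target of the "default" Spark link.
# $1: path of the link (defaults to the path mentioned above).
spark_default_target() {
    link=${1:-/usr/share/flexter/spark/default}
    if [ -L "$link" ]; then
        # Print where the symbolic link points.
        readlink "$link"
    else
        echo "no default link at $link" >&2
        return 1
    fi
}

# Usage:
# spark_default_target
```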
Choosing a particular spark version
In some cases Spark is released with patches, or the server environment requires a particular Spark version for other reasons.
In that case, pass the version with -v:
$ /usr/share/flexter/sbin/download.sh spark -v 3.1.2
Choosing a particular Spark binary package
Each Spark version is published in several builds. For example, version 3.1.x offers:
- hadoop3.2: Spark with embedded Hadoop 3.2.x dependencies (the default).
- hadoop2.7: Spark with embedded Hadoop 2.7.x dependencies.
- without-hadoop: Spark without any Hadoop or Hive. The script then also downloads the default Hadoop version and the Hive dependencies.
- hadoop<MAJOR>.<MINOR>.<PATCH>: Spark without any Hadoop or Hive. The script then also downloads that particular Hadoop version and the Hive dependencies.
- <MAJOR>.<MINOR>.<PATCH>: Same as above.
To select a build, pass it with -b:
$ /usr/share/flexter/sbin/download.sh spark -b hadoop3.2
If you choose the without-hadoop option, it triggers the download of a separate Hadoop package and the Hive dependencies.
Below you can see the default approach:
$ /usr/share/flexter/sbin/download.sh spark -b without-hadoop
You can also specify a particular Hadoop version to download together with Spark (without-hadoop package) and the Hive dependencies:
$ /usr/share/flexter/sbin/download.sh spark -b hadoop3.2.0
or
$ /usr/share/flexter/sbin/download.sh spark -b 3.2.0
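The rules for the -b value can be summarised in a small sketch. This is an illustration of the behaviour described in this section, not the logic download.sh actually uses; the function name `describe_binary_option` and its output strings are hypothetical, and the glob patterns are deliberately loose.

```shell
#!/bin/sh
# Hypothetical helper: describe what a given -b value is expected to
# select, following the rules described in this section.
describe_binary_option() {
    case "$1" in
        without-hadoop)
            echo "without-hadoop package, default hadoop, hive dependencies" ;;
        hadoop[0-9]*.[0-9]*.[0-9]*|[0-9]*.[0-9]*.[0-9]*)
            # Three version components: a specific hadoop release.
            echo "without-hadoop package, hadoop ${1#hadoop}, hive dependencies" ;;
        hadoop[0-9]*|[0-9]*)
            # One or two components: a build with embedded hadoop.
            echo "embedded build hadoop${1#hadoop}" ;;
        *)
            echo "unrecognized option: $1" >&2
            return 1 ;;
    esac
}

# Usage:
# describe_binary_option hadoop3.2
# describe_binary_option 3.2.0
```

The optional `hadoop` prefix is stripped with `${1#hadoop}`, so `hadoop3.2.0` and `3.2.0` are treated the same way, matching the two equivalent commands above.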
Downloading dependencies
Some dependencies, such as the AWS and Azure ones, already come with Hadoop. The Spark packages with embedded Hadoop, however, don’t include them.
The without-hadoop Spark package, in turn, has no Hive dependencies.
download.sh can also install other external dependencies that don’t come with Spark and Hadoop.
hive dependencies
They are downloaded automatically if you chose to install the Spark package without Hadoop. However, you can download them again using:
$ /usr/share/flexter/sbin/download.sh spark -H
avro dependencies
Some versions of Apache Spark don’t come with the Avro libraries pre-installed. To install them, run:
$ /usr/share/flexter/sbin/download.sh spark -a
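Before running the command, it may be useful to check whether a spark-avro jar is already present. The sketch below is only an illustration: the function name `has_spark_avro` is hypothetical, and both the jar naming pattern `spark-avro_*.jar` and the `jars` directory under the default link are assumptions that are not confirmed by this documentation.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a spark-avro jar exists in the given
# directory. The file name pattern is an assumption.
# $1: directory that holds the Spark jars.
has_spark_avro() {
    jars_dir=$1
    for jar in "$jars_dir"/spark-avro_*.jar; do
        # If the glob matched nothing, $jar is the literal pattern
        # and the -e test fails.
        if [ -e "$jar" ]; then
            return 0
        fi
    done
    return 1
}

# Usage (the jars path below is an assumption):
# has_spark_avro /usr/share/flexter/spark/default/jars \
#     || /usr/share/flexter/sbin/download.sh spark -a
```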