download.sh

This module downloads Spark and its dependencies. It comes with the xml2er, json2er and merge2er modules.

This module is not needed if Spark is already installed in the environment, for example in:

  • AWS EMR
  • MapR distributions
  • Hortonworks distributions
  • Cloudera distributions
  • Custom installations
  • Docker-based containers and Kubernetes

Parameters

To list all parameters supported by download.sh, run:

$ /usr/share/flexter/sbin/download.sh spark -h

or 

$ ~/.local/share/flexter/sbin/download.sh spark -h

The command prints the following help text:

Spark downloader. It downloads the spark packages and optionally associated hadoop and hive packages.

It downloads spark and combines it with hadoop based on the chosen options

Usage: /usr/share/flexter/sbin/download.sh spark [OPTIONS]

  OPTIONS

  -h                      print this message

  -f                      Force installation

  -v                      Spark version
                          default: 3.1.2

  -b                      Spark binary package option.
                          Ex: hadoop3.2, 3.2, hadoop3, 3 or without-hadoop
                          It also accepts to choose an specific hadoop version, which is downloaded
                          automatically with spark's without-hadoop package.
                          Ex: hadoop2.8.5
                          default: hadoop3.2

  -H                      Install hive dependencies. When the spark package is without-hadoop, the
                          hive is missing. It uses maven and bring all missing jars.
                          default: 3.2.0

  -a                      Install avro dependencies. When the spark isn't shipped with spark-avro.
                          It uses maven and bring all missing jars



Examples

# Install Spark 3.1.2 with Hadoop 3.2 if no Spark installation is detected
./sbin/download.sh spark

# Force the installation
./sbin/download.sh spark -f

# Install another version
./sbin/download.sh spark -v 3.0.3

# Install a specific Hadoop version (uses the without-hadoop Spark package)
./sbin/download.sh spark -b hadoop3.2.1

# Install a specific Hadoop version plus the Hive dependencies
./sbin/download.sh spark -b hadoop3.2.1 -H

# Install without Hadoop, with the Hive and Avro dependencies
./sbin/download.sh spark -b without-hadoop -Ha
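The last example passes -Ha, which combines -H and -a into one argument. Shell scripts that parse their options with the POSIX getopts builtin accept such clustered flags automatically. The sketch below mirrors the documented options with getopts to illustrate this; it is not the parser inside download.sh, and the function name parse_spark_opts and its variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical parser mirroring the documented "download.sh spark" options.
# It only prints the resulting settings as key=value pairs.
parse_spark_opts() {
  # Defaults taken from the help text above (version 3.1.2, build hadoop3.2)
  local force=no version=3.1.2 build=hadoop3.2 hive=no avro=no opt
  OPTIND=1  # reset so the function can be called more than once
  while getopts "hfv:b:Ha" opt; do
    case "$opt" in
      h) echo "usage: download.sh spark [-h] [-f] [-v VERSION] [-b BUILD] [-H] [-a]"; return 0 ;;
      f) force=yes ;;
      v) version="$OPTARG" ;;
      b) build="$OPTARG" ;;
      H) hive=yes ;;
      a) avro=yes ;;
      *) return 1 ;;  # getopts already printed an error for unknown options
    esac
  done
  echo "force=${force} version=${version} build=${build} hive=${hive} avro=${avro}"
}
```

For example, parse_spark_opts -b without-hadoop -Ha reports both hive=yes and avro=yes, exactly as if -H -a had been passed separately.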


Ways to enforce an installation

Install the default Spark version if no other version is already installed. If an older version or an external installation is found, the command does nothing.

$ /usr/share/flexter/sbin/download.sh spark
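A check for an existing installation can be approximated in a few lines of shell. This is a minimal sketch, not the detection logic used by download.sh: it assumes that an existing installation shows up either as a SPARK_HOME variable pointing at a directory or as spark-submit being on the PATH. The function name spark_installed is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 if a Spark installation appears to exist.
spark_installed() {
  # Assumed signal 1: SPARK_HOME is set and points at an existing directory
  if [ -n "${SPARK_HOME:-}" ] && [ -d "${SPARK_HOME}" ]; then
    return 0
  fi
  # Assumed signal 2: spark-submit can be found on the PATH
  command -v spark-submit >/dev/null 2>&1
}

if spark_installed; then
  echo "Spark detected; download.sh spark would leave it alone (use -f to force)"
else
  echo "No Spark detected; download.sh spark would install the default version"
fi
```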

Force the installation of the default Spark version, even if another version is installed. If that version is already installed, the command switches the /usr/share/flexter/spark/default link to it.

$ /usr/share/flexter/sbin/download.sh spark -f
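Switching a "default" link between installed versions is commonly done with ln -sfn. The sketch below shows that pattern; the per-version layout (<root>/spark/<version>) and the function name switch_default are assumptions for illustration and are not taken from download.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point <root>/spark/default at <root>/spark/<version>.
# Assumed layout: each installed version lives in its own directory.
switch_default() {
  local root="$1" version="$2"
  local target="${root}/spark/${version}"
  if [ ! -d "$target" ]; then
    echo "not installed: ${target}" >&2
    return 1
  fi
  # -s: symbolic link; -f: replace an existing link;
  # -n: if "default" already links to a directory, replace the link itself
  #     instead of creating a new link inside that directory
  ln -sfn "$target" "${root}/spark/default"
}
```

Under the assumed layout, switch_default /usr/share/flexter 3.1.2 would make /usr/share/flexter/spark/default point at /usr/share/flexter/spark/3.1.2. Without -n, a second switch would place a link inside the previously linked version directory instead of replacing the link.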

Choosing a particular Spark version

In some cases a Spark release includes patches you need, or the server environment requires a particular Spark version for other reasons.

In that case, pass the version with -v:

$ /usr/share/flexter/sbin/download.sh spark -v 3.1.2
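Apache publishes Spark releases on its archive under file names of the form spark-<version>-bin-<build>.tgz, for example spark-3.1.2-bin-hadoop3.2.tgz. This page does not say where download.sh fetches its packages from, so the sketch below only illustrates how a version (-v) and a build (-b) map to an Apache archive file. The Apache archive location is an assumption and the function name spark_archive_url is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Apache archive URL for a Spark release.
# Assumption: the standard archive.apache.org layout; download.sh may differ.
spark_archive_url() {
  local version="$1" build="${2:-hadoop3.2}"
  # Accept only MAJOR.MINOR.PATCH versions such as 3.1.2
  if ! printf '%s\n' "$version" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo "invalid Spark version: ${version}" >&2
    return 1
  fi
  echo "https://archive.apache.org/dist/spark/spark-${version}/spark-${version}-bin-${build}.tgz"
}
```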

Choosing a particular Spark binary package

Each Spark version is available in several builds, and the -b option selects one. For version 3.1.x, the options are:

  • hadoop3.2: Embedded Hadoop 3.2.x dependencies (the default).
  • hadoop2.7: Embedded Hadoop 2.7.x dependencies.
  • without-hadoop: Spark without any Hadoop or Hive libraries. The script then downloads the default Hadoop version and the Hive dependencies separately.
  • hadoop<MAJOR>.<MINOR>.<PATCH>: Spark without any Hadoop or Hive libraries. The script then downloads the specified Hadoop version and the Hive dependencies separately.
  • <MAJOR>.<MINOR>.<PATCH>: Same as above.

For example:

$ /usr/share/flexter/sbin/download.sh spark -b hadoop3.2

Choosing without-hadoop triggers a separate download of the Hadoop package and the Hive dependencies.

To use the default Hadoop version:

$ /usr/share/flexter/sbin/download.sh spark -b without-hadoop

You can also choose a particular Hadoop version to download together with the without-hadoop Spark package and the Hive dependencies:

$ /usr/share/flexter/sbin/download.sh spark -b hadoop3.2.0 

or

$ /usr/share/flexter/sbin/download.sh spark -b 3.2.0
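The rules above for -b values can be summarised as a small classifier. This is a hypothetical sketch that follows only the rules described on this page, not the code inside download.sh; the function name classify_build and its output labels are invented for illustration, and it does not check whether a given build actually exists for a given Spark version.

```shell
#!/usr/bin/env bash
# Hypothetical classifier for -b values, following the rules listed above.
# Prints one of:
#   embedded:hadoop<X[.Y]>  Spark build with embedded Hadoop
#   separate:default        without-hadoop Spark + default Hadoop + Hive deps
#   separate:<X.Y.Z>        without-hadoop Spark + that Hadoop version + Hive deps
classify_build() {
  local value="$1" bare
  bare="${value#hadoop}"   # strip an optional "hadoop" prefix
  if [ "$value" = "without-hadoop" ]; then
    echo "separate:default"
  elif printf '%s\n' "$bare" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo "separate:${bare}"
  elif printf '%s\n' "$bare" | grep -Eq '^[0-9]+(\.[0-9]+)?$'; then
    echo "embedded:hadoop${bare}"
  else
    echo "unknown build: ${value}" >&2
    return 1
  fi
}
```

With this sketch, hadoop3.2 and 3.2 both resolve to the embedded hadoop3.2 build, while hadoop3.2.0 and 3.2.0 both resolve to a separate Hadoop 3.2.0 download.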

Downloading dependencies

Some dependencies, such as the AWS and Azure ones, come with the separate Hadoop package. They are not available in the Spark packages with embedded Hadoop.

Conversely, the without-hadoop Spark package contains no Hive dependencies.

download.sh can also install other external dependencies that ship with neither Spark nor Hadoop.
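Before installing extra dependencies, it can help to check which jars an installation already contains. The sketch below assumes the usual Spark layout where bundled libraries sit in $SPARK_HOME/jars and that jar names follow the <name>-<version>.jar pattern; the function name has_jar is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 if <dir> contains <name>-<digits...>.jar.
# Requiring a digit after the dash keeps "hadoop-azure" from matching
# "hadoop-azure-datalake-...".
has_jar() {
  local dir="$1" name="$2" jar
  for jar in "$dir"/"$name"-[0-9]*.jar; do
    # With no match, the pattern stays literal, so check that the file exists
    [ -e "$jar" ] && return 0
  done
  return 1
}

for lib in hadoop-aws hadoop-azure spark-hive_2.12; do
  if has_jar "${SPARK_HOME:-/nonexistent}/jars" "$lib"; then
    echo "${lib}: present"
  else
    echo "${lib}: missing"
  fi
done
```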

Hive dependencies

The Hive dependencies are downloaded automatically when you install the without-hadoop Spark package. To download them again, run:

$ /usr/share/flexter/sbin/download.sh spark -H 
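download.sh fetches the missing Hive jars with Maven. To show what such a fetch resolves to, the sketch turns a Maven coordinate into its Maven Central download URL. The coordinate org.apache.spark:spark-hive_2.12:3.1.2 (Spark's Hive module built for Scala 2.12) is only an example: this page does not list the artifacts download.sh actually requests, and the function name maven_central_url is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map group:artifact:version to a Maven Central jar URL.
# Layout: <repo>/<group, dots as slashes>/<artifact>/<version>/<artifact>-<version>.jar
maven_central_url() {
  local coord="$1" group rest artifact version
  group="${coord%%:*}"      # org.apache.spark
  rest="${coord#*:}"        # spark-hive_2.12:3.1.2
  artifact="${rest%%:*}"    # spark-hive_2.12
  version="${rest#*:}"      # 3.1.2
  echo "https://repo1.maven.org/maven2/$(printf '%s' "$group" | tr . /)/${artifact}/${version}/${artifact}-${version}.jar"
}
```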

Avro dependencies

Some versions of Apache Spark don't ship with the Avro libraries. To install them, run:

$ /usr/share/flexter/sbin/download.sh spark -a
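The spark-avro module is released with each Spark version and must match both the Spark version and the Scala version of the installation. The sketch below derives the matching Maven coordinate from the spark-core jar in a jars directory. It assumes the standard spark-core_<scala>-<spark>.jar naming found in Spark distributions; the function name avro_coordinate_for is hypothetical, and this is not how download.sh resolves Avro.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the spark-avro coordinate matching the Spark
# installation whose jars live in <jars_dir>.
avro_coordinate_for() {
  local jars_dir="$1" core name scala version
  for core in "$jars_dir"/spark-core_*.jar; do
    if [ ! -e "$core" ]; then
      echo "spark-core jar not found in ${jars_dir}" >&2
      return 1
    fi
    name="${core##*/}"           # spark-core_2.12-3.1.2.jar
    name="${name#spark-core_}"   # 2.12-3.1.2.jar
    name="${name%.jar}"          # 2.12-3.1.2
    scala="${name%%-*}"          # 2.12
    version="${name#*-}"         # 3.1.2
    echo "org.apache.spark:spark-avro_${scala}:${version}"
    return 0
  done
  return 1
}
```

As an alternative to installing the jars, the printed coordinate can also be passed to Spark's own spark-submit --packages option, which resolves spark-avro from Maven at submit time.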