Spark Installation on Linux Ubuntu

Let’s learn how to do Apache Spark Installation on Linux based Ubuntu server, same steps can be used to setup Centos, Debian e.t.c. In real-time all Spark application runs on Linux based OS hence it is good to have knowledge on how to Install and run Spark applications on some Unix based OS like Ubuntu server.

Though this article explains with Ubuntu, you can follow these steps to install Spark on any Linux-based OS like Centos, Debian e.t.c, I followed the below steps to setup my Apache Spark cluster on Ubuntu server.

Prerequisites:

If you just wanted to run Spark in standalone, proceed with this article.

Java Installation On Ubuntu

Apache Spark is written in Scala which is a language of Java hence to run Spark you need to have Java Installed. Since Oracle Java is licensed here I am using openJDK Java. If you wanted to use Java from other vendors or Oracle please do so. Here I will be using JDK 8.


sudo apt-get -y install openjdk-8-jdk-headless

Post JDK install, check if it installed successfully by running java -version

java spark installation ubuntu

Python Installation On Ubuntu

You can skip this section if you wanted to run Spark with Scala & Java on an Ubuntu server.

Python Installation is needed if you wanted to run PySpark examples (Spark with Python) on the Ubuntu server.


sudo apt install python3

Apache Spark Installation on Ubuntu

In order to install Apache Spark on Linux based Ubuntu, access Apache Spark Download site and go to the Download Apache Spark section and click on the link from point 3, this takes you to the page with mirror URL’s to download. copy the link from one of the mirror site.

Spark installation ubuntu

If you wanted to use a different version of Spark & Hadoop, select the one you wanted from the drop-down (point 1 and 2); the link on point 3 changes to the selected version and provides you with an updated link to download.

Use wget command to download the Apache Spark to your Ubuntu server.


wget https://downloads.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz
Spark Installation Linux

Once your download is complete, untar the archive file contents using tar command, tar is a file archiving tool. Once untar complete, rename the folder to spark.


tar -xzf spark-3.0.1-bin-hadoop2.7.tgz
mv spark-3.0.1-bin-hadoop2.7 spark
TODO - Add Python environment

Spark Environment Variables

Add Apache Spark environment variables to .bashrc or .profile file. open file in vi editor and add below variables.


[email protected]:~$ vi ~/.bashrc
# Add below lines at the end of the .bashrc file.
export SPARK_HOME=/home/sparkuser/spark
export PATH=$PATH:$SPARK_HOME/bin

Now load the environment variables to the opened session by running below command


[email protected]:~$ source ~/.bashrc

In case if you added to .profile file then restart your session by closing and re-opening the session.

Test Spark Installation on Ubuntu

With this, Apache Spark Installation on Linux Ubuntu completes. Now let’s run a sample example that comes with Spark binary distribution.

Here I will be using Spark-Submit Command to calculate PI value for 10 places by running org.apache.spark.examples.SparkPi example. You can find spark-submit at $SPARK_HOME/bin directory.


spark-submit --class org.apache.spark.examples.SparkPi spark/examples/jars/spark-examples_2.12-3.0.1.jar 10
spark install ubuntu

Spark Shell

Apache Spark binary comes with an interactive spark-shell. In order to start a shell to use Scala language, go to your $SPARK_HOME/bin directory and type “spark-shell“. This command loads the Spark and displays what version of Spark you are using.

spark install linux
Apache Spark Shell

Note: In spark-shell you can run only Spark with Scala. In order to run PySpark, you need to open pyspark shell by running $SPARK_HOME/bin/pyspark . Make sure you have Python installed before running pyspark shell.

By default, spark-shell provides with spark (SparkSession) and sc (SparkContext) object’s to use. Let’s see some examples.

spark shell
spark-shell create RDD

Spark-shell also creates a Spark context web UI and by default, it can access from http://ip-address:4040.

Spark Web UI

Apache Spark provides a suite of Web UIs (Jobs, Stages, Tasks, Storage, Environment, Executors, and SQL) to monitor the status of your Spark application, resource consumption of Spark cluster, and Spark configurations. On Spark Web UI, you can see how the Spark Actions and Transformation operations are executed. You can access by opening http://ip-address:4040/. replace ip-address with your server IP.

Spark Web UI

Spark History server

Spark History server, keep a log of all completed Spark applications you submit by spark-submit, and spark-shell.

Create $SPARK_HOME/conf/spark-defaults.conf file and add below configurations.


# Enable to store the event log
spark.eventLog.enabled true

#Location where to store event log
spark.eventLog.dir file:///tmp/spark-events

#Location from where history server to read event log
spark.history.fs.logDirectory file:///tmp/spark-events

Create Spark Event Log directory. Spark keeps logs for all applications you submitted.


[email protected]:~$ mkdir /tmp/spark-events

Run $SPARK_HOME/sbin/start-history-server.sh to start history server.


[email protected]:~$ $SPARK_HOME/sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /home/sparkuser/spark/logs/spark-sparkuser-org.apache.spark.deploy.history.HistoryServer-1-sparknode.out

As per the configuration, history server by default runs on 18080 port.

Spark Installation Linux

Run PI example again by using spark-submit command, and refresh the History server which should show the recent run.

Conclusion

In Summary, you have learned steps involved in Apache Spark Installation on Linux based Ubuntu Server, and also learned how to start History Server, access web UI.

You Should Also Read:

Author: Shantun Parmar

3 thoughts on “Spark Installation on Linux Ubuntu

  1. I’m also commenting to let you be aware of of the cool discovery my child encountered checking your blog. She came to understand a lot of details, including how it is like to have a marvelous coaching heart to get folks easily fully grasp a variety of complicated issues. You actually did more than my desires. I appreciate you for delivering such good, dependable, edifying and also cool guidance on that topic to Ethel.

  2. My spouse and i got quite joyful that John managed to do his preliminary research through the entire precious recommendations he grabbed out of your web pages. It is now and again perplexing to just choose to be giving freely information and facts that many other folks have been selling. And we all figure out we have the website owner to be grateful to for that. The entire explanations you have made, the simple site menu, the friendships you can give support to engender – it’s got everything fabulous, and it’s really facilitating our son and the family feel that the issue is enjoyable, and that is incredibly important. Thank you for all the pieces!

  3. I am also commenting to let you know of the helpful encounter my friend’s daughter found reading through your web page. She figured out numerous issues, not to mention what it’s like to have an amazing giving mood to have most people really easily know selected impossible subject areas. You really exceeded our own expectations. Many thanks for imparting the interesting, safe, explanatory and even easy thoughts on the topic to Janet.

Thanks for your support, You may click on ads to encourage us which assits to writers.

Leave a Reply

Your email address will not be published.