Spark Standalone mode installation steps

Based on

http://spark-project.org/docs/latest/spark-standalone.html

Download Spark from

http://spark-project.org/downloads/

I used the prebuilt version of Spark for this post. For building from source, please see the instructions on the website.

Download

http://spark-project.org/download/spark-0.7.3-prebuilt-hadoop1.tgz

Extract it to some location, say

/home/jagat/Downloads/spark-0.7.3
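
For example, assuming the tarball was saved to /home/jagat/Downloads:

cd /home/jagat/Downloads
tar -xzf spark-0.7.3-prebuilt-hadoop1.tgz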

To run this you need both Scala and Java.

I downloaded both and configured the following.

In the /etc/environment file, add:

SCALA_HOME="/home/jagat/Downloads/scala-2.9.3"
PATH=$PATH:$SCALA_HOME/bin

JAVA_HOME="/home/jagat/Downloads/jdk1.7.0_25"
PATH=$PATH:$JAVA_HOME/bin

You might change the above paths depending on your system
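
After logging out and back in (that is when /etc/environment is re-read), a quick sanity check that both variables point at working installs:

$JAVA_HOME/bin/java -version
$SCALA_HOME/bin/scala -version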


Go to

/home/jagat/Downloads/spark-0.7.3/conf

Rename the spark-env.sh.template file to spark-env.sh.
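
For example, from that conf directory:

mv spark-env.sh.template spark-env.sh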

Add the following values


export SCALA_HOME="/home/jagat/Downloads/scala-2.9.3"
export PATH=$PATH:$SCALA_HOME/bin

export JAVA_HOME="/home/jagat/Downloads/jdk1.7.0_25"
export PATH=$PATH:$JAVA_HOME/bin


Now check your hosts file to make sure your hostname resolves correctly, especially if you are on Ubuntu like me.

Go to /etc/hosts

After the change, mine looks like this:

jagat@Dell9400:~$ cat /etc/hosts
127.0.0.1    localhost
192.168.0.104    Dell9400

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters


By default on Ubuntu, your hostname is mapped to the loopback address 127.0.1.1; change that entry to your machine's actual IP (192.168.0.104 in my case).


Now we are ready to go

Go to

/home/jagat/Downloads/spark-0.7.3/

Start Spark Master


./run spark.deploy.master.Master

Check the URL where the master web UI started:

http://localhost:8080/

It will show that the master has started with

URL: spark://Dell9400:7077

This is the URL we will need in all our applications.

Let's start one worker by telling it about the master:

./run spark.deploy.worker.Worker spark://Dell9400:7077

This registers the worker with the master.

Now refresh the master page

http://localhost:8080/

You can see that a worker has been added on the page.
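
As an extra check from the command line, jps (bundled with the JDK) should list both daemons, typically showing entries named Master and Worker:

jps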

Connecting a Job to the Cluster

To run a job on the Spark cluster, simply pass the spark://IP:PORT URL of the master to the SparkContext constructor.
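
For example, a minimal standalone job in Scala might look like the sketch below. The object name SimpleJob is just a placeholder, and the master URL is the one shown on the web UI; this uses the 0.7.x API, where SparkContext lives in the spark package.

import spark.SparkContext

object SimpleJob {
  def main(args: Array[String]) {
    // Connect to the standalone master we started above
    val sc = new SparkContext("spark://Dell9400:7077", "Simple Job")
    // A trivial job just to prove the cluster executes work
    val count = sc.parallelize(1 to 1000).count()
    println("Count: " + count)
    sc.stop()
  }
}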

To run an interactive Spark shell against the cluster, run the following command:

MASTER=spark://IP:PORT ./spark-shell
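
Once the shell comes up (it should also show up on the master page at http://localhost:8080/), a quick way to confirm the worker is executing tasks is a trivial job at the scala> prompt, for example (the exact res number may differ):

scala> sc.parallelize(1 to 1000).count()
res0: Long = 1000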


That's it.

I admit those were very raw steps, but I kept it simple and quick for first-time users.

Happy Sparking :)
