Based on
http://spark-project.org/docs/latest/spark-standalone.html
Download Spark from
http://spark-project.org/downloads/
I used the prebuilt version of Spark for this post. For building from source, please see the instructions on the website.
Download
http://spark-project.org/download/spark-0.7.3-prebuilt-hadoop1.tgz
Extract it to some location, say
/home/jagat/Downloads/spark-0.7.3
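For example (assuming the tarball sits in ~/Downloads; the archive should unpack into a spark-0.7.3 folder, matching the path above):
cd ~/Downloads
tar -xzf spark-0.7.3-prebuilt-hadoop1.tgz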
To run this you need both Scala and Java.
I downloaded them and configured the following things.
In the /etc/environment file, add:
SCALA_HOME="/home/jagat/Downloads/scala-2.9.3"
PATH=$PATH:$SCALA_HOME/bin
JAVA_HOME="/home/jagat/Downloads/jdk1.7.0_25"
PATH=$PATH:$JAVA_HOME/bin
You may need to change the above paths depending on your system.
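Note that /etc/environment is read at login, so log out and back in for the change to take effect. As a quick check that both tools are reachable (assuming the paths above), you can source the file in your current shell and run the version commands. Keep in mind /etc/environment is not a shell script, so if the $PATH expansion does not take effect on your system, putting the same lines (with export) in ~/.bashrc is a safe alternative.
source /etc/environment
scala -version
java -version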
Go to
/home/jagat/Downloads/spark-0.7.3/conf
Rename the spark-env.sh.template file to spark-env.sh
Add the following values to it:
export SCALA_HOME="/home/jagat/Downloads/scala-2.9.3"
export PATH=$PATH:$SCALA_HOME/bin
export JAVA_HOME="/home/jagat/Downloads/jdk1.7.0_25"
export PATH=$PATH:$JAVA_HOME/bin
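The same file also understands optional tuning variables described in the standalone docs linked above, such as SPARK_WORKER_CORES and SPARK_WORKER_MEMORY; for example (the values here are only illustrations):
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=2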
Now check your hosts file to make sure your hostname resolves correctly, especially if you are on Ubuntu like me.
Open /etc/hosts. After my change, mine looks like this:
jagat@Dell9400:~$ cat /etc/hosts
127.0.0.1 localhost
192.168.0.104 Dell9400
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
On Ubuntu, your hostname is mapped to the loopback address 127.0.1.1 by default; change that entry to your machine's exact IP, as shown below.
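In other words, replace the stock Ubuntu entry with your real address (using my hostname and IP here; adjust for yours):
# 127.0.1.1   Dell9400    (default Ubuntu entry - remove or comment out)
192.168.0.104 Dell9400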
Now we are ready to go
Go to
/home/jagat/Downloads/spark-0.7.3/
Start the Spark master:
./run spark.deploy.master.Master
Check the web UI where the master started:
http://localhost:8080/
It will show that the master has started with
URL: spark://Dell9400:7077
This is the URL we will need in all our applications.
Let's start one worker by telling it about the master:
./run spark.deploy.worker.Worker spark://Dell9400:7077
This registers the worker with the master.
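The worker also accepts flags to cap the resources it offers, per the standalone docs for this release (the values below are only illustrations; check the docs or the usage message for the exact options on your build):
./run spark.deploy.worker.Worker --cores 2 --memory 1G spark://Dell9400:7077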
Now refresh the master page
http://localhost:8080/
You can see that the worker has been added on the page.
Connecting a Job to the Cluster
To run a job on the Spark cluster, simply pass the spark://IP:PORT URL of the master to the SparkContext constructor.
To run an interactive Spark shell against the cluster, run the following command:
MASTER=spark://IP:PORT ./spark-shell
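The shell builds a ready-made SparkContext named sc against that master (the same URL you would pass as the first constructor argument in your own program), so a quick sanity check of the cluster looks like this (a sketch against the 0.7-era shell; startup output omitted):
jagat@Dell9400:~/Downloads/spark-0.7.3$ MASTER=spark://Dell9400:7077 ./spark-shell
scala> sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
res0: Long = 500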
That's it.
I admit those were very raw steps, but I kept it simple and quick for first-time users.
Happy Sparking :)