The purpose of this post is to explain how to install Hadoop on your computer. It assumes you have a Linux-based system available; I am doing this on an Ubuntu system.
If you want to install the latest Hadoop 2.0 instead, see the Hadoop 2.0 Install Tutorial.
Before you begin, create a separate user named hadoop on the system and perform all of the following operations as that user.
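If you need to create that user, the standard Ubuntu commands are shown below (a convenience sketch; any equivalent user-creation method works):
#sudo adduser hadoop
#su - hadoop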
This document covers the steps to:
1) Configure SSH
2) Install JDK
3) Install Hadoop
Update your repository
#sudo apt-get update
You can copy the commands directly from this post and run them on your system.
Hadoop requires that the various machines in a cluster can talk to each other freely. Hadoop uses SSH to prove identity when connecting.
Let's download and configure SSH
#sudo apt-get install openssh-server openssh-client
#ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
#cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
#sudo chmod go-w $HOME $HOME/.ssh
#sudo chmod 600 $HOME/.ssh/authorized_keys
#sudo chown `whoami` $HOME/.ssh/authorized_keys
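SSH refuses key-based logins when the home directory or key files are writable by group or others; the chmod and chown commands above tighten those permissions so that passwordless login works.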
Testing your SSH
#ssh localhost
Say yes when prompted to accept the host key
It should open an SSH connection to localhost without asking for a password
#exit
This closes the SSH session
Java 1.6 is required to run Hadoop
Let's download and install the JDK
#sudo mkdir /usr/java
#cd /usr/java
#sudo wget http://download.oracle.com/otn-pub/java/jdk/6u31-b04/jdk-6u31-linux-i586.bin
Wait until the JDK download completes
Install Java
#sudo chmod o+w jdk-6u31-linux-i586.bin
#sudo chmod +x jdk-6u31-linux-i586.bin
#sudo ./jdk-6u31-linux-i586.bin
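To verify the installation, invoke the new JVM directly (the directory name below assumes the installer unpacked into jdk1.6.0_31, the same path used later in this post):
#/usr/java/jdk1.6.0_31/bin/java -version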
Now comes the Hadoop :)
Let's download and configure Hadoop in pseudo-distributed mode. You can read more about the various modes on the Hadoop website.
Download the latest Hadoop 1.0.x tar.gz from the releases page on the Hadoop website:
http://hadoop.apache.org/common/releases.html
Extract it into some folder (say /home/hadoop/software/); all software in this guide is kept at that location.
For the other modes (standalone and fully distributed), please see the Hadoop documentation.
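A minimal sketch of the download-and-extract steps (the mirror URL and version below are illustrative; use whichever 1.0.x release link the page above gives you):
#cd /home/hadoop/software
#wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.1/hadoop-1.0.1.tar.gz
#tar -xzf hadoop-1.0.1.tar.gz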
Go to the conf directory inside the Hadoop folder, open core-site.xml, and add the following property inside the (initially empty) configuration tags:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost</value>
  </property>
</configuration>
Do the same for the following two files. (Since fs.default.name gives no port, HDFS will listen on its default port, 8020.)
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
Environment variables
In the conf/hadoop-env.sh file, change JAVA_HOME to the location where you installed Java, e.g.
export JAVA_HOME=/usr/java/jdk1.6.0_31
(Note that shell assignments must not have spaces around the =.)
Configure the environment variables for the JDK and Hadoop as follows.
Open the ~/.profile file in the current user's home directory
Add the following
You can change these paths if you have installed Hadoop or Java at other locations
export JAVA_HOME="/usr/java/jdk1.6.0_31"
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_INSTALL="/home/hadoop/software/hadoop-1.0.1"
export PATH=$PATH:$HADOOP_INSTALL/bin
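Reload the profile so the new variables take effect in the current shell, then confirm that the hadoop command is on your PATH:
#source ~/.profile
#hadoop version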
Testing your installation
Format the HDFS namenode
# hadoop namenode -format
Then start the HDFS and MapReduce daemons:
hadoop@jj-VirtualBox:~$ start-dfs.sh
starting namenode, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-namenode-jj-VirtualBox.out
localhost: starting datanode, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-datanode-jj-VirtualBox.out
localhost: starting secondarynamenode, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-secondarynamenode-jj-VirtualBox.out
hadoop@jj-VirtualBox:~$ start-mapred.sh
starting jobtracker, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-jobtracker-jj-VirtualBox.out
localhost: starting tasktracker, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-tasktracker-jj-VirtualBox.out
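You can also confirm that all five daemons came up with jps (the process IDs below are illustrative; yours will differ):
hadoop@jj-VirtualBox:~$ jps
2287 NameNode
2544 DataNode
2801 SecondaryNameNode
3074 JobTracker
3331 TaskTracker
3407 Jps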
Open the browser and point to these pages:
localhost:50030 (JobTracker status)
localhost:50070 (NameNode status)
These open the status pages for your Hadoop installation.
That's it, this completes the installation of Hadoop; now you are ready to play with it.
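As a quick smoke test (the examples jar name below assumes the hadoop-1.0.1 tarball layout), try listing HDFS and running the bundled pi estimator:
#hadoop fs -ls /
#hadoop jar $HADOOP_INSTALL/hadoop-examples-1.0.1.jar pi 2 10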
This guide made me realize I left out part of a config file, thanks.
Is there a guide to just using HDFS as a distributed file system, like a replacement for NFS, AFS, or Gluster?
Hello Genewitch,
Thank you for your comment.
There is a very interesting discussion of standalone HDFS on the mailing list; just go through this thread:
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201102.mbox/%3CAANLkTi=+Wic=e4uj3vpHrihctr7Uu84uh8YbS1fuXccw@mail.gmail.com%3E
Hi,
I have a two-node cluster; rsi1 and rsi2 are the hostnames of the two machines.
What should the values of fs.default.name and mapred.job.tracker be on both federated namenodes? I want to make both nodes federated nodes.
Appreciate your reply.
Rashmi
How can we install Hive? Can you please guide?
Download the Hive tarball from the Apache website
Extract it to some place
In the conf file (hive-default.xml), which you can create (or find) in the conf folder, specify the following parameter:
<property>
  <name>mapred.job.tracker</name>
  <value>JobTrackerIP:8021</value>
</property>
Also read this post on my blog on configuring a MySQL metastore for Hive instead of the default Derby DB:
http://jugnu-life.blogspot.com.au/2012/05/hive-mysql-setup-configuration.html
Thanks
So here we will add mapred.job.tracker and JobTrackerIP:8021 as a property inside hive-default.xml, so my new property will look like mapred.job.tracker with value JobTrackerIP:8021... then I will restart Hive and Hadoop... Please reply sir, it's urgent for me now
hive> show tables;
FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Why does this error occur? Sir, can you help me with this?
How do I know which is my datanode and which is my namenode?
Run jps on the nodes; you can see where the namenode service is running.