Hadoop 2.2 Install Tutorial (0.23.x)

The steps below will also work for Hadoop 2.2.

Hadoop 2.0 has a different directory structure compared to older versions.

This post explains a simple method to install Hadoop 2.0 on your computer (Hadoop 0.23 installation).

There are multiple ways to do this, and one of them is presented below.
If you want to install an older version of Hadoop, please see the other post.

The purpose of this post is to explain how to install Hadoop on your computer. It assumes you have a Linux-based system available; I am doing this on an Ubuntu system.

Before you begin, create a separate user named hadoop on the system and perform all these operations as that user.

This document covers the steps to:
1) Configure SSH
2) Install JDK
3) Install Hadoop

Update your repository
#sudo apt-get update

You can copy the commands directly and run them on your system.
Hadoop requires that the various machines in a cluster can talk to each other freely; it uses SSH to prove identity when connecting.

Let's download and configure SSH

#sudo apt-get install openssh-server openssh-client
#ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
#cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
#sudo chmod go-w $HOME $HOME/.ssh
#sudo chmod 600 $HOME/.ssh/authorized_keys
#sudo chown `whoami` $HOME/.ssh/authorized_keys

Testing your SSH

#ssh localhost
Say yes when prompted

It should open a connection over SSH

#exit
This will close the SSH connection

Java 1.6 is mandatory for running Hadoop

Let's download and install the JDK

#sudo mkdir /usr/java
#cd /usr/java
#sudo wget http://download.oracle.com/otn-pub/java/jdk/6u31-b04/jdk-6u31-linux-i586.bin
Wait until the JDK download completes

Install Java
#sudo chmod o+w jdk-6u31-linux-i586.bin
#sudo chmod +x jdk-6u31-linux-i586.bin
#sudo ./jdk-6u31-linux-i586.bin

Now comes the Hadoop :)

Download the latest Hadoop 2.0.x tar on your computer and unpack it to some directory, let's say HADOOP_PREFIX

Export the following environment variable on your computer
export HADOOP_PREFIX="/home/hadoop/software/hadoop-2.0.0-alpha"
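Beyond HADOOP_PREFIX itself, it is convenient to also put Hadoop's bin and sbin directories on the PATH so that the hdfs, hadoop-daemon.sh, and yarn-daemon.sh commands used later resolve without full paths. A minimal sketch, assuming a bash shell and the install path shown above:

```shell
# Append to ~/.bashrc (adjust the install path to your own system)
export HADOOP_PREFIX="/home/hadoop/software/hadoop-2.0.0-alpha"
export HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"
export PATH="$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin"
```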

Restart your session once so that the environment/path variables take effect

In Hadoop 2.x, etc/hadoop under the install directory (i.e. $HADOOP_PREFIX/etc/hadoop) is the default conf directory

We need to modify/create the following property files in that directory

Edit core-site.xml with the following contents

    <description>The name of the default file system.  Either the
      literal string "local" or a host:port for NDFS.</description>
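The property block itself is not shown above; for a single-node setup it typically looks like the sketch below. The hdfs://localhost:9000 address is an assumption — use whatever host:port you want the NameNode to listen on:

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system.</description>
  </property>
</configuration>
```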

Edit hdfs-site.xml with the following contents

    <description>Determines where on the local filesystem the DFS name node
      should store the name table.  If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy. </description>
    <description>Determines where on the local filesystem a DFS data node
       should store its blocks.  If this is a comma-delimited
       list of directories, then data will be stored in all named
       directories, typically on different devices.
       Directories that do not exist are ignored.</description>
These paths (e.g.
file:/home/hadoop/workspace/hadoop_space/hadoop23/dfs/name)
are folders on your computer that provide space to store data blocks and the name table/edit files.
Each path should be specified as a URI.
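A sketch of the property blocks the two descriptions above belong to, using the Hadoop 2.x property names. The name directory is the path mentioned above; the data directory shown is a hypothetical example — point it at any local folder with free space:

```xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/workspace/hadoop_space/hadoop23/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <!-- hypothetical data directory; choose your own local folder -->
    <value>file:/home/hadoop/workspace/hadoop_space/hadoop23/dfs/data</value>
  </property>
</configuration>
```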

Create a file mapred-site.xml inside $HADOOP_PREFIX/etc/hadoop with the following contents


These paths (e.g.
file:/home/hadoop/workspace/hadoop_space/hadoop23/mapred/system)
are folders on your computer that provide space to store data.
Each path should be specified as a URI.
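The listing itself is missing above; on Hadoop 2.x the one property most single-node setups need in mapred-site.xml is the framework name, which tells MapReduce to run on YARN. A minimal sketch:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

The system/local directory properties hinted at by the paths above are optional for a basic single-node run.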

Edit yarn-site.xml with the following contents
    <description>Shuffle service that needs to be set for MapReduce to run.</description>
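A sketch of the property block that description belongs to. Note the value is version-sensitive: Hadoop 2.2 expects mapreduce_shuffle (with an underscore), while the older 2.0-alpha/0.23 releases used mapreduce.shuffle:

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```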

Inside the $HADOOP_PREFIX/etc/hadoop directory, create a file hadoop-env.sh (or edit the existing one) and add the following to it:

export JAVA_HOME=/usr/java/jdk1.6.0_31

Change the path above to match the JAVA_HOME location on your PC.

Save it, and now we are ready to format.
Format the namenode:
# hdfs namenode -format
Say Yes and let the format complete.

Time to start the daemons
# hadoop-daemon.sh start namenode
# hadoop-daemon.sh start datanode
You can also start both of them together with
# start-dfs.sh
Start the YARN daemons
# yarn-daemon.sh start resourcemanager
# yarn-daemon.sh start nodemanager
You can also start all YARN daemons together with
# start-yarn.sh
Time to check whether the daemons have started.
Enter the command
# jps

2539 NameNode
2744 NodeManager
3075 Jps
3030 DataNode
2691 ResourceManager
Time to launch the UI.
Open localhost:8088 to see the ResourceManager page
Done :)
Happy Hadooping :)


  1. Hadoop 2.0 has an HA NameNode. Did you explore that feature?

  2. Hi, thanks for the tutorial. I followed all your suggestions, but when I do

    hduser@impala:/usr/local/hadoop$ hdfs -format

    I get this error; any reason why?

    Unrecognized option: -format
    Error: Could not create the Java Virtual Machine.
    Error: A fatal exception has occurred. Program will exit.

    1. Got it working; a typo on my part. When I used "hdfs namenode -format" it worked.

    2. Am glad it worked :) Happy Hadooping

  3. Thanks for your valuable inputs...helped me a lot... thank u thank u...

  4. What a great piece of work! Congratulations and thanks for your contribution.


Please share your views and comments below.

Thank You.