Hadoop 3 single-node install tutorial

The purpose of this post is to explain how to install Hadoop 3 on your computer. It assumes you have a Linux-based system available; I am doing this on an Ubuntu system.

Before you begin, create a separate user named hadoop on the system and perform all of the following operations as that user, as shown below.
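For example, on Ubuntu you can create the user and switch to it like this (the user name hadoop is just a convention):

sudo adduser hadoop
su - hadoop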

This document covers the steps to:
1) Configure SSH
2) Install JDK
3) Install Hadoop

Update your package index

sudo apt-get update

You can copy the commands below directly and run them on your system.
Hadoop requires that the various machines in a cluster can talk to each other freely. Hadoop uses SSH to authenticate these connections.

Let's download and configure SSH. (Recent versions of OpenSSH no longer accept DSA keys, so we generate an RSA key pair instead.)

sudo apt-get install openssh-server openssh-client
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
sudo chmod go-w $HOME $HOME/.ssh
sudo chmod 600 $HOME/.ssh/authorized_keys
sudo chown `whoami` $HOME/.ssh/authorized_keys

Testing your SSH

ssh localhost

Say yes when prompted to accept the host key

It should open an SSH connection without asking for a password

exit

This will close the SSH session

Java 8 is required to run Hadoop 3

Let's download and install the JDK

sudo apt-get update
sudo apt-get install openjdk-8-jdk
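You can verify the installation with:

java -version

It should report an OpenJDK 1.8 version string.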

Now comes Hadoop :)

Download the Hadoop 3.2.1 tarball and extract it to a directory of your choice; we will refer to that directory as HADOOP_HOME.
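For example, assuming the Apache archive mirror and an install location of /home/jj/dev/softwares (adjust both to your setup):

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
tar -xzf hadoop-3.2.1.tar.gz -C /home/jj/dev/softwares/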

Export the following environment variables, changing the paths to match your environment.

export HADOOP_HOME="/home/jj/dev/softwares/hadoop-3.2.1"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
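To make these variables persist across sessions, append the same export lines to your ~/.bashrc and reload it:

source ~/.bashrc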

We need to modify/create the following property files in the $HADOOP_HOME/etc/hadoop directory

Edit core-site.xml with the following contents (fs.defaultFS replaces the deprecated fs.default.name property):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost</value>
  </property>
</configuration>


Edit hdfs-site.xml with the following contents:

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node
      should store the name table. If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy.</description>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/data</value>
    <description>Determines where on the local filesystem a DFS data node
      should store its blocks. If this is a comma-delimited
      list of directories, then data will be stored in all named
      directories, typically on different devices.
      Directories that do not exist are ignored.</description>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>



The paths
file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/name AND
file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/data
are directories on your computer that will store the HDFS namespace metadata (name table and edit logs) and the data blocks, respectively.
Each path must be specified as a URI.
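You can create these directories up front (using the example paths from above) to avoid permission problems:

mkdir -p /home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/name
mkdir -p /home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/data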

Edit mapred-site.xml inside $HADOOP_HOME/etc/hadoop with the following contents:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/system</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/local</value>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>


The paths
file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/system AND
file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/local
are directories on your computer that will store MapReduce system and intermediate data.
Each path must be specified as a URI.
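Similarly, you can create the MapReduce directories up front:

mkdir -p /home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/system
mkdir -p /home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/local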

Edit yarn-site.xml with the following contents:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>


Inside the $HADOOP_HOME/etc/hadoop directory
Edit hadoop-env.sh (create it if it does not exist) and add the following to it

export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"

Change the path above to wherever the JDK is installed on your machine
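If you are not sure where the JDK lives, one way to find it on Ubuntu is:

update-alternatives --list java

This prints the full path to the java binary; JAVA_HOME is that path without the trailing /bin/java (or /jre/bin/java for JDK 8).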

Save it; now we are ready to format

Format the namenode
hdfs namenode -format
Say yes if prompted and let the format complete

Time to start the daemons (Hadoop 3 replaces the old hadoop-daemon.sh and yarn-daemon.sh scripts, which still work but are deprecated)

hdfs --daemon start namenode
hdfs --daemon start datanode

Start the YARN daemons

yarn --daemon start resourcemanager
yarn --daemon start nodemanager

Time to check whether the daemons have started
Enter the command (the process IDs in the output will differ on your machine)
jps

2539 NameNode
2744 NodeManager
3075 Jps
3030 DataNode
2691 ResourceManager
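If any of these daemons is missing, check its log file under $HADOOP_HOME/logs, for example:

tail -n 50 $HADOOP_HOME/logs/hadoop-*-namenode-*.log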
Time to launch the web UIs

Open http://localhost:8088 to see the ResourceManager page
Open http://localhost:9870 to see the NameNode page
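As a quick smoke test, you can create a directory in HDFS and list it (the paths here are just examples):

hdfs dfs -mkdir -p /user/`whoami`
hdfs dfs -ls /user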

Done :)
Happy Hadooping :)


Please share your views and comments below.

Thank You.