The purpose of this post is to explain how to install Hadoop 3 on your computer. It assumes you have a Linux-based system available; I am doing this on an Ubuntu system.
Before you begin, create a separate user named hadoop in the system and perform all of these operations as that user.
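A minimal sketch of creating such a user on Ubuntu (the user name hadoop is only a convention; sudo membership is optional but convenient for the install steps):
sudo adduser hadoop
sudo usermod -aG sudo hadoop
su - hadoop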
This document covers the steps to:
1) Configure SSH
2) Install JDK
3) Install Hadoop
Update your repository
sudo apt-get update
You can copy the commands below directly and run them on your system.
Hadoop requires that the various machines in a cluster can talk to each other freely. Hadoop uses SSH to prove identity when connecting.
Let's download and configure SSH:
sudo apt-get install openssh-server openssh-client
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
sudo chmod go-w $HOME $HOME/.ssh
sudo chmod 600 $HOME/.ssh/authorized_keys
sudo chown `whoami` $HOME/.ssh/authorized_keys
Testing your SSH
ssh localhost
Say yes when prompted to accept the host key.
It should open an SSH connection to localhost without asking for a password.
exit
This closes the SSH session.
Java 8 is required to run Hadoop 3.2.
Let's download and install the JDK:
sudo apt-get update
sudo apt-get install openjdk-8-jdk
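You can verify the installation with the following; the exact version string will vary by machine:
java -version
javac -version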
Now comes Hadoop :)
Download the Hadoop 3.2.1 tarball to your computer and extract it to a directory of your choice; we will refer to that directory as HADOOP_HOME.
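A minimal sketch of downloading and extracting it from the Apache archive (the mirror URL and target directory below are examples; adjust them to your environment):
cd /home/jj/dev/softwares
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
tar -xzf hadoop-3.2.1.tar.gz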
Export the following environment variables, changing the paths according to your environment.
export HADOOP_HOME="/home/jj/dev/softwares/hadoop-3.2.1"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
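Note that these exports only last for the current shell session. To make them permanent, append the same export lines to your shell profile (assuming bash) and reload it:
source ~/.bashrc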
We need to modify or create the following property files in the $HADOOP_CONF_DIR (etc/hadoop under HADOOP_HOME) directory.
Edit core-site.xml with the following contents:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost</value>
</property>
</configuration>
Edit hdfs-site.xml with the following contents:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/name</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table. If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy. </description>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/data</value>
<description>Determines where on the local filesystem a DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
The paths
file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/name AND
file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/data
are folders on your computer that provide space for the NameNode metadata (name/edit files) and the DataNode blocks.
The paths should be specified as URIs.
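You can create these folders up front; a sketch assuming the example paths above (substitute your own):
mkdir -p /home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/name
mkdir -p /home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/data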
Edit mapred-site.xml inside etc/hadoop with the following contents:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/system</value>
<final>true</final>
</property>
<property>
<name>mapred.local.dir</name>
<value>file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/local</value>
<final>true</final>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
The paths
file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/system AND
file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/local
are folders on your computer that provide space for the MapReduce system and local data.
The paths should be specified as URIs.
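These folders can be created the same way (again, substitute your own paths):
mkdir -p /home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/system
mkdir -p /home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/local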
Edit yarn-site.xml with the following contents:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
Inside the etc/hadoop directory, edit (or create) hadoop-env.sh and add the following to it:
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
Change the path above so that JAVA_HOME points to wherever the JDK is installed on your machine.
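If you are not sure where the JDK lives, one way to find out (a suggestion; the sed expression strips the trailing /bin/javac from the resolved path):
readlink -f "$(which javac)" | sed 's:/bin/javac::'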
Save it; now we are ready to format.
Format the NameNode:
hdfs namenode -format
Say yes if prompted and let the format complete.
Time to start the HDFS daemons:
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
Start the YARN daemons:
$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
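In Hadoop 3 these per-daemon scripts still work but print deprecation warnings; the equivalent form using the hdfs and yarn launchers is:
hdfs --daemon start namenode
hdfs --daemon start datanode
yarn --daemon start resourcemanager
yarn --daemon start nodemanager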
Time to check whether the daemons have started. Enter the command:
jps
2539 NameNode
2744 NodeManager
3075 Jps
3030 DataNode
2691 ResourceManager
Time to open the web UIs:
Open http://localhost:8088 to see the ResourceManager page.
Open http://localhost:9870 to see the NameNode page.
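As a quick smoke test, you can run a simple HDFS operation (the directory name here is just an example):
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -ls /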
Done :)
Happy Hadooping :)