Hadoop 3 install tutorial single node

This post explains how to install Hadoop 3 on a single machine. It assumes you have a Linux-based system available; I am doing this on Ubuntu.

Before you begin, create a separate user named hadoop on the system and run all of the following steps as that user.

This document covers the steps to:
1) Configure SSH
2) Install JDK
3) Install Hadoop

Update your repository

sudo apt-get update

You can copy the commands directly from this post and run them on your system.

Hadoop requires that the systems in a cluster can talk to each other freely. Hadoop uses SSH to authenticate these connections.

Let's download and configure SSH

sudo apt-get install openssh-server openssh-client
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
sudo chmod go-w $HOME $HOME/.ssh
sudo chmod 600 $HOME/.ssh/authorized_keys
sudo chown `whoami` $HOME/.ssh/authorized_keys
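
If ssh localhost still asks for a password, the usual suspect is permissions: with its default settings sshd ignores authorized_keys when the directory or file is too open. Below is a minimal sketch of a check, not part of the original steps. The function name check_ssh_perms is my own, and it assumes a stat that supports -c (the default on Ubuntu).

```shell
# Hypothetical helper: report whether a .ssh directory looks acceptable to sshd.
# Assumes `stat -c` is available (GNU coreutils, the default on Ubuntu).
check_ssh_perms() {
    local ssh_dir="$1"
    local keys="$ssh_dir/authorized_keys"
    if [ ! -f "$keys" ]; then
        echo "missing: $keys"
        return 1
    fi
    # %a prints the octal mode, e.g. 700. A 2, 3, 6 or 7 in the group or
    # other position means the write bit is set there.
    case "$(stat -c '%a' "$ssh_dir")" in
        *[2367]?|*[2367])
            echo "too open: $ssh_dir"
            return 1
            ;;
    esac
    if [ "$(stat -c '%a' "$keys")" != "600" ]; then
        echo "expected mode 600: $keys"
        return 1
    fi
    echo "ok"
}
```

Call it as check_ssh_perms "$HOME/.ssh"; it prints ok when the directory and key file match what the chmod commands above set up.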

Testing your SSH

ssh localhost

Answer yes when asked to confirm the host key

It should open an SSH connection without asking for a password

exit

This closes the SSH session

Hadoop 3.2.1 requires Java 8 (1.8)

Let's download and install the JDK

sudo apt-get update
sudo apt-get install openjdk-8-jdk
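
If the machine already has another JDK installed, it is worth confirming which Java version is actually on the PATH. Here is a small sketch for that, not part of the original steps; the helper name java_major is my own.

```shell
# Hypothetical helper: turn a Java version string into its major version.
# Old-style strings start with "1." (1.8.0_252), newer ones do not (11.0.7).
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 ;;
    esac
}

# Only runs when a java binary is on the PATH.
if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr; the version is the quoted token on line 1.
    version=$(java -version 2>&1 | head -n 1 | cut -d'"' -f2)
    echo "Java major version: $(java_major "$version")"
fi
```

For this tutorial the reported major version should be 8.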

Now comes Hadoop :)

Download the Hadoop 3.2.1 tarball to your computer and extract it to a directory of your choice; this directory is your HADOOP_HOME.

Export the following environment variables, changing the paths to match your environment.

export HADOOP_HOME="/home/jj/dev/softwares/hadoop-3.2.1"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
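
A typo in one of these exports tends to show up much later as a confusing error, so here is a quick sketch that lists any variable that is still unset in the current shell. It is not part of the original steps, and the function name missing_vars is my own.

```shell
# Hypothetical helper: print the name of every given variable that is unset or empty.
missing_vars() {
    local name value
    for name in "$@"; do
        # Indirect lookup via eval so this also works in plain sh.
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}

# Prints nothing when all of the variables are set.
missing_vars HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME HADOOP_CONF_DIR
```

To make the exports permanent, add them to the ~/.bashrc of the user that runs Hadoop.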

We need to modify or create the following configuration files in the etc/hadoop directory

Edit core-site.xml with the following contents

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost</value>
</property>
</configuration>


Edit hdfs-site.xml with the following contents

<configuration>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node
      should store the name table.  If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy. </description>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/data</value>
    <description>Determines where on the local filesystem an DFS data node
       should store its blocks.  If this is a comma-delimited
       list of directories, then data will be stored in all named
       directories, typically on different devices.
       Directories that do not exist are ignored.
    </description>
    <final>true</final>
  </property>
<property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>



The paths
file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/name and
file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/hdfs/dfs/data
are directories on your computer where the NameNode stores its metadata (the name table and edit logs) and the DataNode stores its blocks.
Each path must be specified as a URI.
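
Since the description above says that data directories which do not exist are ignored, I find it safer to create them up front. This is a minimal sketch rather than a required step; the function name make_hdfs_dirs is my own, and the hdfs/dfs/name and hdfs/dfs/data layout simply mirrors the values used in this post.

```shell
# Hypothetical helper: create the NameNode and DataNode directories under a base
# directory and print the file: URIs to paste into hdfs-site.xml.
make_hdfs_dirs() {
    local base="$1" sub
    for sub in name data; do
        mkdir -p "$base/hdfs/dfs/$sub" || return 1
        echo "file:$base/hdfs/dfs/$sub"
    done
}
```

For the layout in this post you would call make_hdfs_dirs "$HADOOP_HOME/hadoop_space".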

Edit mapred-site.xml inside etc/hadoop with the following contents

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
    <name>mapred.system.dir</name>
    <value>file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/system</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/local</value>
    <final>true</final>
  </property>
      <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>


The paths
file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/system and
file:/home/jj/dev/softwares/hadoop-3.2.1/hadoop_space/mapred/local
are directories on your computer where MapReduce stores its system and local working files.
Each path must be specified as a URI.

Edit yarn-site.xml with the following contents

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>


Inside the etc/hadoop directory, open hadoop-env.sh (create it if it does not exist) and add the following line to it

export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"

Change the path above to the location of the JDK on your machine.

Save it. Now we are ready to format.
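
If you are not sure where the JDK lives, one way to find out is to follow the javac symlink and strip the trailing bin/javac. This is a sketch under that assumption, not part of the original steps; the helper name java_home_from is my own.

```shell
# Hypothetical helper: strip the trailing /bin/<tool> from a resolved JDK binary path.
java_home_from() {
    dirname "$(dirname "$1")"
}

# Only runs when javac is on the PATH.
if command -v javac >/dev/null 2>&1; then
    # readlink -f follows the whole symlink chain behind /usr/bin/javac.
    java_home_from "$(readlink -f "$(command -v javac)")"
fi
```

I use javac rather than java here because on a Java 8 JDK the java binary can resolve to a path under jre/bin, which would give a directory one level too deep.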

Format the namenode
hdfs namenode -format
If it asks you to confirm, say Y and let it complete the format

Time to start the HDFS daemons
hdfs --daemon start namenode
hdfs --daemon start datanode

Start the YARN daemons
yarn --daemon start resourcemanager
yarn --daemon start nodemanager

Time to check if the daemons have started
Enter the command
jps

2539 NameNode
2744 NodeManager
3075 Jps
3030 DataNode
2691 ResourceManager
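
Rather than eyeballing the list, the check can be scripted. Below is a minimal sketch that reads jps output and prints any of the four daemons that is missing; the function name missing_daemons is my own.

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon that does not appear in it.
missing_daemons() {
    local output daemon
    output=$(cat)
    for daemon in NameNode DataNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the whole second field so that
        # e.g. SecondaryNameNode is not mistaken for NameNode.
        echo "$output" \
            | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
            || echo "$daemon"
    done
}
```

Run it as jps | missing_daemons. No output means all four daemons are up; if a name is printed, the log files under $HADOOP_HOME/logs are the first place I would look.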

Time to launch the UI

Open http://localhost:8088 to see the ResourceManager page
Open http://localhost:9870 to see the NameNode page

Done :)
Happy Hadooping :)

Maven package org.apache.commons.httpclient.methods does not exist

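Apache HttpClient exists as two different artifacts: the newer HttpComponents client (org.apache.httpcomponents:httpclient) and the older Commons HttpClient (commons-httpclient:commons-httpclient). The package org.apache.commons.httpclient.methods exists only in the older one.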

https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient

https://mvnrepository.com/artifact/commons-httpclient/commons-httpclient

Source:

https://stackoverflow.com/questions/10986661/apache-httpclient-does-not-exist-error

Solution:

Use the older version of the client

<dependency>
<groupId>commons-httpclient</groupId>
<artifactId>commons-httpclient</artifactId>
<version>3.1</version>
</dependency>
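
To see which of the two artifacts a project declares directly, a plain grep over the pom.xml is often enough. This is only a sketch: the function name httpclient_flavors is my own, and it looks at the pom file itself, so it will not see transitive dependencies.

```shell
# Hypothetical helper: report which HttpClient artifacts a pom.xml mentions.
# Only inspects the file itself, not transitive dependencies.
httpclient_flavors() {
    local pom="$1"
    if grep -q '<groupId>commons-httpclient</groupId>' "$pom"; then
        echo "commons-httpclient 3.x (package org.apache.commons.httpclient)"
    fi
    if grep -q '<groupId>org\.apache\.httpcomponents</groupId>' "$pom"; then
        echo "httpcomponents 4.x (package org.apache.http)"
    fi
}
```

Call it as httpclient_flavors pom.xml. If only the httpcomponents line shows up, imports from org.apache.commons.httpclient cannot resolve.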



Ubuntu 18.04 customizations

Ubuntu comes with lots of good options to configure the system.

A few of the things I like are mentioned below.

Enable GNOME Shell extensions and Windows-like themes

https://www.howtogeek.com/353819/how-to-make-ubuntu-look-more-like-windows/


sudo apt install gnome-shell-extensions gnome-shell-extension-dash-to-panel
sudo apt install gnome-tweaks adwaita-icon-theme-full

Install a few good extensions

https://itsfoss.com/things-to-do-after-installing-ubuntu-18-04/

To manage GNOME Shell extensions from the browser, enable both the browser extension and the host extension.

Once you do that, you will see a toggle button for each GNOME extension right inside the browser window.

You can also configure extensions in the Tweaks application.
 
 

Gradle Could not create service of type ScriptPluginFactory

Error

Could not create service of type ScriptPluginFactory using BuildScopeServices.createScriptPluginFactory().


Detailed exception

[jj@184fc3b978cc bigtop]$ ./gradlew clean

FAILURE: Build failed with an exception.

* What went wrong:
Could not create service of type ScriptPluginFactory using BuildScopeServices.createScriptPluginFactory().
> Could not create service of type CrossBuildFileHashCache using BuildSessionScopeServices.createCrossBuildFileHashCache().

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

BUILD FAILED in 0s

Solution

The directory containing the code was owned by root:root.

Change the ownership back to your user and the build should work.
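
To confirm that ownership really is the problem before changing anything, you can list what your user does not own. A minimal sketch; the function name not_owned_by_me is my own.

```shell
# Hypothetical helper: list every path under a directory that the current user
# does not own. find's -user test accepts a numeric user ID.
not_owned_by_me() {
    find "$1" ! -user "$(id -u)"
}
```

Running not_owned_by_me . from the project root should print nothing once the fix is in place. The fix itself, assuming you have sudo rights, is sudo chown -R "$(id -un):$(id -gn)" . from the same directory.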

Clevo P570WM Ubuntu freeze problem

TL;DR

To get Ubuntu running on a Clevo laptop, use kernel version 5.6.15 or later. Here are the steps to upgrade the kernel.

Seven years ago, in 2013, I bought a P570WM laptop from Metabox. My one big mistake was investing money in something that was brand new on the market. My goal was to get a good Ubuntu laptop, so I decided to go with a Clevo-based machine.

Below are the specs. They might not look impressive now, but considering they are 7 years old, they were really good back then.

  • Screen type: 17.3" FHD 1920x1080 LED/LCD 
  • Graphics: Nvidia GTX 780M 4GB GDDR5 video graphics
  • Processor: i7-3970X 6-Core 3.5GHz - 4.0GHz 15MB Cache
  • RAM memory: 32GB DDR3 1600MHz RAM
  • Primary drive: 1TB 7200 rpm primary hard drive

I was very happy when the laptop arrived, but my nightmare began when I installed Ubuntu on it and it froze immediately on boot.

Unfortunately, Metabox was of no help. They said they don't support Ubuntu, and I was left alone with a $5.5K machine that was a massive waste of money and of no use.

Fast forward to 2020: the laptop was collecting dust in my cupboard, so I gave it another shot, and I am glad it worked. I am using Ubuntu 18.04 with kernel 5.6.15 and no custom drivers for the Nvidia card.

The posts that gave me hope to keep going are listed below
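
To check whether the kernel you are running is already new enough, here is a small sketch that compares uname -r against 5.6.15. The function name version_at_least is my own, and it relies on sort -V from GNU coreutils.

```shell
# Hypothetical helper: succeed when version $2 is at least version $1.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_at_least() {
    local wanted="$1" actual="$2"
    [ "$(printf '%s\n%s\n' "$wanted" "$actual" | sort -V | head -n 1)" = "$wanted" ]
}

# uname -r looks like "5.6.15-050615-generic"; keep only the part before the first dash.
running=$(uname -r | cut -d- -f1)
if version_at_least 5.6.15 "$running"; then
    echo "kernel $running: at or above 5.6.15"
else
    echo "kernel $running: older than 5.6.15"
fi
```

A plain string comparison would get this wrong (5.10 sorts before 5.6 as text), which is why the sketch uses a version sort.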

https://edgcert.com/2019/06/03/ubuntu-on-clevo/
https://forum.manjaro.org/t/freezes-when-probing-system-on-clevo-n850hk1/60346
https://askubuntu.com/questions/1068161/clevo-n850el-crashes-freezes-ubuntu-18-04-1-frequently

Related kernel bug
https://bugzilla.kernel.org/show_bug.cgi?id=109051