In my previous post I gave an overview of the folder structure of Hadoop 0.20.2.
In this post the files and their purposes are discussed in a bit more detail.
A similar post exists for the Hadoop 0.23 version, if you are interested.
hadoop-config.sh
This is sourced by every hadoop command script and does two things mainly. First, if we specify the --config parameter it sets HADOOP_CONF_DIR to that directory; second, if we pass a parameter like --hosts it decides whether to use the masters or slaves file.
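A simplified sketch of that logic (paraphrased, not the exact 0.20.2 source):
# parse --config: point HADOOP_CONF_DIR at the given directory
if [ "--config" = "$1" ]; then
  shift
  HADOOP_CONF_DIR=$1
  shift
fi
# parse --hosts: pick which host list file (e.g. masters or slaves) to use
if [ "--hosts" = "$1" ]; then
  shift
  export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$1"
  shift
fi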
hadoop
This is the main hadoop command script. If we just type the command with no arguments at the prompt:
# hadoop
it prints the list of available commands along with a short description of each.
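The output looks roughly like this (abbreviated):
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  fs                   run a generic filesystem user client
  jobtracker           run the MapReduce job tracker node
  tasktracker          run a MapReduce task tracker node
  ...
Most commands print help when invoked w/o parameters.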
When we type one of the commands from the help above, this script maps it to a Java class and runs that class.
e.g. hadoop namenode -format
if [ "$COMMAND" = "namenode" ] ; then
CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'
HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"
So when we call namenode, the script launches the NameNode class mentioned above and passes the remaining arguments to it. The other hadoop commands work the same way.
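At the end of the script the selected class is launched in a JVM; the final line looks roughly like this:
# run the selected class with the remaining command-line arguments
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"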
This script also sources
conf/hadoop-env.sh, where various Hadoop-specific environment variables and configuration options are set. The details can be read in the blog post on the conf directory.
hadoop-env.sh can be used to pass daemon-specific environment variables.
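For example, to give extra JVM options only to the NameNode, something like the following could be added to conf/hadoop-env.sh (the heap size here is just an illustrative value; bin/hadoop appends HADOOP_NAMENODE_OPTS as shown earlier):
# applies only to the namenode daemon
export HADOOP_NAMENODE_OPTS="-Xmx2g $HADOOP_NAMENODE_OPTS"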
hadoop-daemon.sh
Runs a Hadoop command as a daemon.
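Its usage is roughly:
# hadoop-daemon.sh [--config <conf-dir>] (start|stop) <hadoop-command> <args...>
bin/hadoop-daemon.sh --config conf start namenode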
hadoop-daemons.sh
Runs a Hadoop command on all slave hosts. It invokes slaves.sh, which in turn runs hadoop-daemon.sh on each host with the particular command passed.
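Its core is essentially one line, roughly:
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"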
slaves.sh
Runs a shell command on all slave hosts.
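Internally it reads the host list and runs the given command over ssh on each slave in parallel, roughly like this:
# one ssh per slave, output prefixed with the host name
for slave in `cat "$HOSTLIST"`; do
  ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" 2>&1 | sed "s/^/$slave: /" &
done
wait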
start-dfs.sh
Starts the Hadoop DFS daemons. Optionally we can also tell it to upgrade or roll back the DFS state. Run this on the master node.
The implementation is:
# start dfs daemons
# start namenode after datanodes, to minimize time namenode is up w/o data
# note: datanodes will log connection errors until namenode starts
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode $nameStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenodenameStartOpt and dataStartOpt are decided on parameters we pass
start-mapred.sh
Starts the Hadoop MapReduce daemons. Run this on the master node.
# start mapred daemons
# start jobtracker first to minimize connection errors at startup
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker
start-all.sh
Starts all Hadoop daemons. Run this on the master node.
It calls start-dfs.sh and start-mapred.sh internally.
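Its body is essentially just:
"$bin"/start-dfs.sh --config $HADOOP_CONF_DIR
"$bin"/start-mapred.sh --config $HADOOP_CONF_DIR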
stop-dfs.sh
Stops the Hadoop DFS daemons. Run this on the master node.
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR stop namenode
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR stop datanode
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters stop secondarynamenode
stop-mapred.sh
Stops the Hadoop MapReduce daemons. Run this on the master node.
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR stop jobtracker
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR stop tasktracker
stop-all.sh
Stops all Hadoop daemons. Run this on the master node.
"$bin"/stop-mapred.sh --config $HADOOP_CONF_DIR
"$bin"/stop-dfs.sh --config $HADOOP_CONF_DIR
start-balancer.sh
Starts the balancer daemon on a particular node to balance data blocks across the cluster.
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start balancer $@
stop-balancer.sh
# Stop balancer daemon.
# Run this on the machine where the balancer is running.
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR stop balancer
rcc.sh
The Hadoop record compiler.
The implementation for this is in the class:
CLASS='org.apache.hadoop.record.compiler.generated.Rcc'
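Rcc compiles Hadoop record DDL files into Java or C++ serialization code; a typical invocation looks roughly like this (the .jr file name here is just an example):
bin/rcc.sh --language java mydata.jr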
Please share your views and comments below.
Thank You.