This post explains the directory structure of Hadoop 0.20.2. Along with the directory layout, it describes what the files in each directory are used for.
A separate post for each directory explains the purpose of its files in more detail.
A similar post is available for the 0.23 version of Hadoop.
bin
This is the main folder, where shell scripts such as start-all.sh and stop-all.sh are kept. The main hadoop command (itself a shell script) is also in this folder.
For more detail, see the post on the purpose and use of the files in the Hadoop 0.20.2 bin folder.
c++
Contains the C++ libraries for Hadoop Pipes and related tools, built for 32-bit and 64-bit x86 architectures. Hadoop Pipes allows C++ code to use Hadoop MapReduce.
conf
This is the directory where all configuration files are stored. Here you specify the set of slaves in your cluster and the kind of cluster configuration, e.g. standalone, pseudo-distributed, or fully distributed.
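As a rough illustration of how the files in conf relate to the cluster mode, here is a minimal Python sketch. It assumes the 0.20.x file names core-site.xml, mapred-site.xml and slaves, and the properties fs.default.name and mapred.job.tracker. The function names and the classification rule are my own simplification and are not part of Hadoop.

```python
import os
import xml.etree.ElementTree as ET


def read_properties(path):
    """Return {name: value} from a Hadoop-style *-site.xml file (empty if missing)."""
    if not os.path.exists(path):
        return {}
    root = ET.parse(path).getroot()  # root is the <configuration> element
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def read_slaves(path):
    """Return the host names listed in the slaves file, skipping blanks and comments."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        lines = [line.strip() for line in f]
    return [h for h in lines if h and not h.startswith("#")]


def guess_mode(conf_dir):
    """Hypothetical helper: guess the cluster mode from the files in conf_dir."""
    core = read_properties(os.path.join(conf_dir, "core-site.xml"))
    mapred = read_properties(os.path.join(conf_dir, "mapred-site.xml"))
    # Assumed defaults when nothing is configured: local file system, local job runner.
    fs = core.get("fs.default.name", "file:///")
    tracker = mapred.get("mapred.job.tracker", "local")
    if fs.startswith("file:") and tracker == "local":
        return "standalone"
    slaves = read_slaves(os.path.join(conf_dir, "slaves"))
    # Simplification: only local slaves (or none listed) counts as pseudo-distributed.
    if all(h in ("localhost", "127.0.0.1") for h in slaves):
        return "pseudo-distributed"
    return "fully distributed"
```

Calling guess_mode on your own conf directory gives only a hint; the real behaviour depends on everything else set in those files.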
contrib
A set of useful community-developed add-ons, e.g. thriftfs, hdfsproxy, etc.
docs
Documentation and API docs for the Hadoop distribution.
ivy
Ivy and POM definitions for the Hadoop code, used when building it from source.
lib
The set of third-party libraries (jars) used by Hadoop.
librecordio
Native C++ library for Hadoop's Record I/O.
logs
Hadoop daemon log files. The logs of the various daemons running in Hadoop are stored here. This is the place to look when you hit an error in Hadoop.
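When hunting for an error, it can help to pull the ERROR and FATAL lines out of every daemon log at once. The following is a small Python sketch of that idea. It assumes the daemon logs match the file name pattern hadoop-*.log and use log4j-style level names; find_errors is a hypothetical helper of mine, not a Hadoop tool.

```python
import glob
import os
import re

# Assumption: daemon logs use log4j-style levels such as INFO, WARN, ERROR, FATAL.
LEVELS = re.compile(r"\b(ERROR|FATAL)\b")


def find_errors(log_dir, pattern="hadoop-*.log"):
    """Return (file name, line number, line) for each ERROR/FATAL line in log_dir."""
    hits = []
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if LEVELS.search(line):
                    hits.append((os.path.basename(path), lineno, line.rstrip("\n")))
    return hits
```

For example, find_errors("/path/to/hadoop/logs") returns a list you can print or filter by daemon name; the path here is only a placeholder.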
src
Source code for the Hadoop distribution. There is also a build.xml file that you can use to build the whole of Hadoop.
webapps
webapps contains the files behind the web front ends of the Hadoop daemons, such as the NameNode at localhost:50070 and the JobTracker at localhost:50030. These pages give a complete picture of what the daemons are doing at the moment, such as which job is in progress. The servlets for each web app are defined in its web.xml, and they run on top of the Jetty server.
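To make the port numbers concrete, here is a minimal Python sketch that builds the web UI addresses and checks whether something is listening on them. It assumes the default ports 50070 (NameNode) and 50030 (JobTracker); your cluster may override these in its configuration, and the helper names are hypothetical.

```python
import socket

# Assumed default web UI ports for Hadoop 0.20.x; a cluster can override them.
DEFAULT_WEB_PORTS = {
    "namenode": 50070,
    "jobtracker": 50030,
}


def web_ui_url(daemon, host="localhost"):
    """Build the web front end URL for a daemon, using the assumed default port."""
    return "http://%s:%d/" % (host, DEFAULT_WEB_PORTS[daemon])


def is_listening(daemon, host="localhost", timeout=2.0):
    """Return True if a TCP connection to the daemon's web port succeeds."""
    try:
        conn = socket.create_connection((host, DEFAULT_WEB_PORTS[daemon]), timeout)
    except OSError:
        return False
    conn.close()
    return True
```

If is_listening("namenode") returns False on a machine where Hadoop should be running, the logs directory is the next place to look.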