The instructions below can be used to install the RHadoop packages rmr2 and rhdfs on a Hadoop cluster. I have just a single-node cluster, but it really doesn't matter: the same instructions apply if you have more machines.
Install R on the machine by following the instructions at
http://jugnu-life.blogspot.com.au/2013/02/install-r-on-ubuntu_24.html
Let's start installing RHadoop.
The RHadoop packages are available at
https://github.com/RevolutionAnalytics
I have cloned the git repo for each package, since this makes upgrades easy. So you have two choices here:
1) Download the tar.gz files for rmr2 and rhdfs (easy)
2) Clone the git repo for each of them (just as easy :))
Let's go with the first one.
Download rmr2, rhdfs and quickcheck from the following locations:
https://github.com/RevolutionAnalytics/rhdfs/blob/master/build/rhdfs_1.0.5.tar.gz
https://github.com/RevolutionAnalytics/rmr2/blob/master/build/rmr2_2.1.0.tar.gz
https://github.com/RevolutionAnalytics/quickcheck/blob/master/build/quickcheck_1.0.tar.gz
The links might have changed by the time you read this, so pardon me and look up the latest ones.
I assume you have already installed R on your machine. R must be installed on every node in your cluster, and the RHadoop packages must also be installed on every node.
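On a multi-node cluster, copying and installing each tarball by hand gets tedious. The sketch below loops over a list of nodes with scp and ssh. It assumes passwordless SSH and passwordless sudo on every node, and that Hadoop lives at the same path on every node so one HADOOP_CMD value (set up in the next step) works everywhere. The function name install_on_nodes, the RUN dry-run switch and any node names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): copy one package tarball to each node and
# install it there. Assumes passwordless SSH and sudo on every node, and the
# same Hadoop path on all nodes. Set RUN=echo to print commands (dry run).
RUN="${RUN:-}"

install_on_nodes() {
  local tarball="$1"; shift
  local pkg node
  pkg="$(basename "$tarball")"
  for node in "$@"; do
    # Copy the source tarball into the remote user's home directory.
    $RUN scp "$tarball" "$node:" || return 1
    # Install system-wide on the node; rhdfs needs HADOOP_CMD at install time.
    $RUN ssh "$node" "sudo HADOOP_CMD=${HADOOP_CMD:?HADOOP_CMD is not set} R CMD INSTALL $pkg" || return 1
  done
}
```

For example, `install_on_nodes rhdfs_1.0.5.tar.gz node1 node2` would copy rhdfs to both nodes and install it on each. Call it once per tarball, in the same order you would use on a single machine.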
Export the required environment variables
Change the paths below to match your Hadoop installation.
sudo gedit /etc/environment
Add the following lines:
# Variable added for RHadoop Install
HADOOP_CMD=/home/jj/software/hadoop-1.0.4/bin/hadoop
HADOOP_CONF=/home/jj/software/hadoop-1.0.4/conf
HADOOP_STREAMING=/home/jj/software/hadoop-1.0.4/contrib/streaming/hadoop-streaming-1.0.4.jar
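/etc/environment is read at login, so log out and back in before the new variables show up in your shell. As a rough sanity check, the hypothetical check_rhadoop_env function below tests that each variable points at something that exists: an executable for HADOOP_CMD, a directory for HADOOP_CONF and a file for HADOOP_STREAMING.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): check that the RHadoop variables from
# /etc/environment point at real paths. Prints a message for each problem
# and returns non-zero if any check fails; silent success means all good.
check_rhadoop_env() {
  local status=0
  # HADOOP_CMD should be the executable hadoop launcher script.
  if [ ! -x "${HADOOP_CMD:-}" ]; then
    echo "HADOOP_CMD is unset or not executable: '${HADOOP_CMD:-}'" >&2
    status=1
  fi
  # HADOOP_CONF should be the Hadoop configuration directory.
  if [ ! -d "${HADOOP_CONF:-}" ]; then
    echo "HADOOP_CONF is unset or not a directory: '${HADOOP_CONF:-}'" >&2
    status=1
  fi
  # HADOOP_STREAMING should be the streaming jar file.
  if [ ! -f "${HADOOP_STREAMING:-}" ]; then
    echo "HADOOP_STREAMING is unset or not a file: '${HADOOP_STREAMING:-}'" >&2
    status=1
  fi
  return "$status"
}
```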
Tell R about Java
R is sometimes unable to figure out the Java settings on its own, so let's tell it explicitly.
In the command below, change the JDK path /home/jj/software/java/jdk1.6.0_43 (and its bin directory) to match your Java location:
$ sudo R CMD javareconf JAVA=/home/jj/software/java/jdk1.6.0_43/bin/java JAVA_HOME=/home/jj/software/java/jdk1.6.0_43 JAVAC=/home/jj/software/java/jdk1.6.0_43/bin/javac JAR=/home/jj/software/java/jdk1.6.0_43/bin/jar JAVAH=/home/jj/software/java/jdk1.6.0_43/bin/javah
Updating Java configuration in /usr/lib/R
Done.
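Typing the JDK path five times is error-prone. Here is a small sketch that derives all five javareconf variables from one JDK directory. The function name javareconf_args is hypothetical; the variable names are the ones used in the command above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the R CMD javareconf arguments,
# one per line, derived from a single JDK directory.
javareconf_args() {
  local jdk="$1"
  printf '%s\n' \
    "JAVA=$jdk/bin/java" \
    "JAVA_HOME=$jdk" \
    "JAVAC=$jdk/bin/javac" \
    "JAR=$jdk/bin/jar" \
    "JAVAH=$jdk/bin/javah"
}

# Usage: edit the JDK path once, then pass the generated arguments along.
#   JDK_DIR=/home/jj/software/java/jdk1.6.0_43
#   mapfile -t args < <(javareconf_args "$JDK_DIR")
#   sudo R CMD javareconf "${args[@]}"
```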
Check that the cluster is up and happy
hadoop fs -ls /
jj@jj-VirtualBox:~/software/R/RHadoop$ hadoop fs -ls /
Warning: $HADOOP_HOME is deprecated.
Found 3 items
drwxr-xr-x - jj supergroup 0 2013-03-29 09:59 /hbase
drwxr-xr-x - jj supergroup 0 2013-03-09 21:57 /home
drwxr-xr-x - jj supergroup 0 2013-03-09 17:35 /user
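To script this check, for example at the top of an install script, you can rely on the exit status of hadoop fs -ls, which should be non-zero when HDFS cannot be reached. A minimal sketch, assuming HADOOP_CMD is set as in /etc/environment above and falling back to hadoop on the PATH otherwise; hdfs_is_up is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed only if `hadoop fs -ls /` succeeds.
# Uses $HADOOP_CMD if set, otherwise `hadoop` from the PATH.
hdfs_is_up() {
  local hadoop="${HADOOP_CMD:-hadoop}"
  # Discard the listing (and the deprecation warning); keep only the status.
  if "$hadoop" fs -ls / >/dev/null 2>&1; then
    echo "HDFS is reachable"
  else
    echo "HDFS is NOT reachable; start the cluster first" >&2
    return 1
  fi
}
```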
Install rJava
Start R with
$ sudo R --save
This starts R as root, so packages can be installed into the system-wide library, and --save tells R to save the workspace when you quit.
> install.packages('rJava')
R will ask you to choose a CRAN mirror; select one near you and let the installation finish.
When it's done, verify that it's there :)
> library()
It will show something like
Packages in library ‘/usr/local/lib/R/site-library’:
rJava Low-level R to Java interface
Packages in library ‘/usr/lib/R/library’:
Quit R
> q()
All set
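Every installed R package has a DESCRIPTION file in its own folder inside a library directory, so a quick check is also possible from the shell. The sketch below uses the hypothetical function r_pkg_installed, with the directories from the library() output above as examples. It only confirms the files are present, not that the package loads, so library(rJava) inside R remains the real test.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): look for <lib>/<pkg>/DESCRIPTION in each
# given library directory; report the first match or fail.
r_pkg_installed() {
  local pkg="$1"; shift
  local lib
  for lib in "$@"; do
    if [ -f "$lib/$pkg/DESCRIPTION" ]; then
      echo "$pkg found in $lib"
      return 0
    fi
  done
  echo "$pkg not found" >&2
  return 1
}

# Usage with the library paths shown by library() above:
#   r_pkg_installed rJava /usr/local/lib/R/site-library /usr/lib/R/library
```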
Install rhdfs now
Go to the location where you downloaded the tar.gz files and run the following command:
jj@jj-VirtualBox:~/software/R/RHadoop/rhdfs/build$ ls
rhdfs_1.0.5.tar.gz
$ sudo HADOOP_CMD=/home/jj/software/hadoop-1.0.4/bin/hadoop R CMD INSTALL rhdfs_1.0.5.tar.gz
Start R and check:
> library('rhdfs')
Loading required package: rJava
HADOOP_CMD=/home/jj/software/hadoop-1.0.4/bin/hadoop
Be sure to run hdfs.init()
> hdfs.init()
> hdfs.ls('/')
permission owner group size modtime file
1 drwxr-xr-x jj supergroup 0 2013-03-29 09:59 /hbase
2 drwxr-xr-x jj supergroup 0 2013-03-09 21:57 /home
3 drwxr-xr-x jj supergroup 0 2013-03-09 17:35 /user
>
We can now see the HDFS files from R, so rhdfs is all done.
Install rmr2
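$ sudo apt-get install -y pdfjam
Then, in R (started with sudo R as before), install the packages rmr2 needs:
> install.packages(c('RJSONIO', 'itertools', 'digest', 'functional', 'stringr', 'plyr'))
Download the Rcpp and reshape2 source packages from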
http://cran.r-project.org/web/packages/reshape2/index.html
http://cran.r-project.org/web/packages/Rcpp/index.html
sudo R CMD INSTALL Rcpp_0.10.3.tar.gz
sudo R CMD INSTALL reshape2_1.2.2.tar.gz
sudo R CMD INSTALL quickcheck_1.0.tar.gz
sudo R CMD INSTALL rmr2_2.1.0.tar.gz
Done
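The four R CMD INSTALL commands above run in order because each later package may rely on the earlier ones. A sketch that installs the tarballs in the given order and stops at the first failure; install_in_order and the R_BIN override are hypothetical, and the file names are the versions downloaded in this post.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): run `R CMD INSTALL` on each tarball in the
# given order, stopping at the first missing file or failed install.
# R_BIN overrides the R executable (e.g. R_BIN=echo for a dry run).
install_in_order() {
  local r_bin="${R_BIN:-R}"
  local tarball
  for tarball in "$@"; do
    if [ ! -f "$tarball" ]; then
      echo "missing $tarball" >&2
      return 1
    fi
    # Stop here so later packages don't fail on a missing dependency.
    "$r_bin" CMD INSTALL "$tarball" || return 1
  done
}

# Usage: define the function in a root shell (e.g. after `sudo -s`) and run
# it from the directory holding the tarballs:
#   install_in_order Rcpp_0.10.3.tar.gz reshape2_1.2.2.tar.gz \
#     quickcheck_1.0.tar.gz rmr2_2.1.0.tar.gz
```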
More reading
http://bighadoop.wordpress.com/2013/02/25/r-and-hadoop-data-analysis-rhadoop/