Using R with Hadoop for Big Data

The following packages help you work with big data from R.

1. rmr2
2. rhdfs
3. HadoopStreamingR
4. Rhipe
5. h2o
6. SparkR

The links to documentation and tutorials for each of them are below.

Several of these packages, notably rmr2 and HadoopStreamingR, build on Hadoop Streaming to run work on a cluster instead of a single R node; others, such as h2o and SparkR, run on their own cluster engines. If you are new to Hadoop, read the basics of Hadoop Streaming at https://hadoop.apache.org/docs/stable2/hadoop-streaming/HadoopStreaming.html and then this short tutorial on writing streaming jobs in Python: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
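To make the Hadoop Streaming idea concrete, here is a minimal word-count sketch in Python, the language used by the tutorial linked above. Hadoop Streaming runs any executable that reads records from standard input and writes tab-separated key/value records to standard output. The function names (mapper, reducer, run_locally) and the sample input are illustrative assumptions, not taken from any of the packages listed here, and the sort step only imitates the shuffle that Hadoop performs between the map and reduce phases.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each word; records must arrive grouped by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Imitate map -> sort (shuffle) -> reduce on a single machine."""
    return list(reducer(sorted(mapper(lines))))


# Hypothetical sample input, only for trying the sketch without a cluster.
sample = ["R and Hadoop", "Hadoop Streaming and R"]
for record in run_locally(sample):
    print(record)
```

On a real cluster the mapper and the reducer would typically be two separate scripts that read from standard input, handed to the Hadoop Streaming jar through its -mapper and -reducer options, with Hadoop doing the sorting in between. The R packages built on Hadoop Streaming follow the same pattern with R code in place of the Python functions.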

Package name, followed by useful tutorials and readings:

rmr2
Web link: https://github.com/RevolutionAnalytics/rmr2/blob/master/docs/tutorial.md
Book: R in a Nutshell, 2nd edition (Chapter 26), http://shop.oreilly.com/product/0636920022008.do

rhdfs
Wiki: https://github.com/RevolutionAnalytics/RHadoop/wiki/user%3Erhdfs%3EHome

HadoopStreamingR
CRAN package documentation (published on CRAN as HadoopStreaming): https://cran.r-project.org/web/packages/HadoopStreaming/HadoopStreaming.pdf

Rhipe
Web link: http://tessera.io/docs-RHIPE/#install-and-push

h2o
Documentation on using h2o from R: http://h2o-release.s3.amazonaws.com/h2o/rel-slater/1/docs-website/h2o-docs/index.html#%E2%80%A6%20From%20R
R h2o package documentation (~140 pages): http://h2o-release.s3.amazonaws.com/h2o/rel-slater/1/docs-website/h2o-r/h2o_package.pdf

SparkR
API and documentation: https://spark.apache.org/docs/latest/api/R/index.html
