Import and Export of Data to HDFS

There are various tools and methods to import data into HDFS. Depending on the type of data and where it is located, you can use one of the following tools.

Import from Databases

Sqoop
http://sqoop.apache.org/
This tool can import data from various relational databases. Custom connectors are also available for faster import and export of data.
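
As a quick sketch (the connection URL, credentials, table name, and target directory below are all placeholders), a typical Sqoop import looks like this:

$ sqoop import \
    --connect jdbc:mysql://dbhost/salesdb \
    --username dbuser -P \
    --table orders \
    --target-dir /data/orders

Sqoop turns this into a MapReduce job that pulls the table contents in parallel and writes them under /data/orders on HDFS.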

Import of file-based loads

Chukwa
http://incubator.apache.org/chukwa/
Chukwa is a data collection system for monitoring large distributed systems, built on top of HDFS and MapReduce.

Scribe
https://github.com/facebook/scribe
Scribe collects log data to one central place, from where it can then be pushed to HDFS.

Flume
http://flume.apache.org/
Flume has various source and sink classes which can be used to push files to HDFS.
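
A minimal Flume agent configuration sketch, assuming an agent named a1 that watches a spool directory and delivers the files to HDFS (the directory and HDFS path are placeholders):

a1.sources = src1
a1.channels = ch1
a1.sinks = snk1

a1.sources.src1.type = spooldir
a1.sources.src1.spoolDir = /var/log/incoming
a1.sources.src1.channels = ch1

a1.sinks.snk1.type = hdfs
a1.sinks.snk1.hdfs.path = hdfs://namenode:8020/data/logs
a1.sinks.snk1.hdfs.fileType = DataStream
a1.sinks.snk1.channel = ch1

a1.channels.ch1.type = memory

You would then start the agent with something like:

$ flume-ng agent -n a1 -c conf -f flume.conf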

HDFS File Slurper
https://github.com/alexholmes/hdfs-file-slurper
A basic utility for copying files between local disk and HDFS.

Regular tools
Use an automation tool like cron or AutoSys to push files to HDFS at a given location:

# hadoop fs -copyFromLocal <localsrc> <hdfsdest>
# hadoop fs -copyToLocal <hdfssrc> <localdest>
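
For example, a crontab entry along these lines (the paths are placeholders) could push staged files into HDFS at the top of every hour:

0 * * * * hadoop fs -copyFromLocal /var/staging/* /data/incoming/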

Use Oozie
Use an Oozie SSH action to log in to the machine and then execute the above copy commands, as sketched below.
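
A sketch of such an SSH action inside an Oozie workflow (the host, file, and directory names are placeholders):

<action name="copy-to-hdfs">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <host>user@edgenode</host>
        <command>hadoop</command>
        <args>fs</args>
        <args>-copyFromLocal</args>
        <args>/var/staging/data.txt</args>
        <args>/data/incoming/</args>
    </ssh>
    <ok to="end"/>
    <error to="fail"/>
</action>

Oozie runs the command over SSH on the given host, so the Hadoop client must be installed there and passwordless SSH must be set up for the Oozie user.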

Import and export between HBase and HDFS

Use the HBase Export utility class
http://hbase.apache.org/book/ops_mgt.html#export

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> 
 
Import to HBase
 
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
 
The HBase Export utility writes its output as SequenceFiles, so you may need to convert the data into the format you actually need.

Alternatively, you can use MapReduce to read from HBase and write to HDFS directly in plain text or any other format.
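
A minimal sketch of such a job, assuming a table named mytable with a column family cf and qualifier col (all placeholders). It scans the table with a TableMapper and writes tab-separated text to HDFS:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class HBaseToText {

    // Mapper that turns each HBase row into one line of text
    static class ExportMapper extends TableMapper<Text, Text> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            // Row key becomes the output key; one cell becomes the output value
            byte[] cell = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"));
            context.write(new Text(Bytes.toString(row.get())),
                          new Text(cell == null ? "" : Bytes.toString(cell)));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-to-text");
        job.setJarByClass(HBaseToText.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // larger scanner caching for MR jobs
        scan.setCacheBlocks(false);  // don't pollute the region server block cache

        // Read from HBase via the table input format
        TableMapReduceUtil.initTableMapperJob(
                "mytable", scan, ExportMapper.class, Text.class, Text.class, job);

        // Map-only job; write plain text (key TAB value) to HDFS
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path("/data/hbase-export"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Running it produces one line per HBase row under /data/hbase-export, which any tool that reads plain text can then consume.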
 
 
