There are various tools and methods for importing data into HDFS. Depending on the type of data and where it is located, you can use the following tools.
Import from Database
Sqoop
http://sqoop.apache.org/
This tool can import data from various databases. Custom connectors are also available for faster import and export of data.
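As a rough illustration of a Sqoop import, the sketch below wraps the sqoop import command in a small shell function. The JDBC URL, user name, table, target directory and password file are all placeholders supplied by the caller and are not taken from any real setup. The options used (--connect, --username, --password-file, --table, --target-dir, --num-mappers) are standard Sqoop 1 import options, and the mapper count of 4 is only an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: import one database table into HDFS with Sqoop 1.
# Every argument value is a placeholder supplied by the caller.
set -eu

import_table() {
  jdbc_url="$1"       # e.g. jdbc:mysql://dbhost/sales (hypothetical)
  db_user="$2"        # database user name
  table="$3"          # table to import
  target_dir="$4"     # HDFS output directory
  password_file="$5"  # file that holds the database password

  sqoop import \
    --connect "$jdbc_url" \
    --username "$db_user" \
    --password-file "$password_file" \
    --table "$table" \
    --target-dir "$target_dir" \
    --num-mappers 4   # number of parallel map tasks (assumed value)
}
```

A call might look like this, with hypothetical values: import_table jdbc:mysql://dbhost/sales etl_user orders /data/sales/orders /user/etl/.db-password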
Import of file-based loads
Chukwa
http://incubator.apache.org/chukwa/
Scribe
https://github.com/facebook/scribe
Collects data in one place and then pushes it to HDFS
Flume
http://flume.apache.org/
Has various source and sink classes which can be used to push files to HDFS
HDFS File Slurper
https://github.com/alexholmes/hdfs-file-slurper
A basic tool for importing and exporting files
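To make the Flume option above more concrete, here is a minimal sketch that generates an agent configuration with a spooling directory source, a memory channel and an HDFS sink. The agent name (agent1), the component names (src1, ch1, sink1) and all paths are hypothetical. A memory channel is assumed only for brevity; it can lose events if the agent stops.

```shell
#!/usr/bin/env bash
# Sketch only: generate a Flume agent config that moves files from a local
# spooling directory into HDFS. Agent and component names are hypothetical.
set -eu

write_flume_conf() {
  conf_file="$1"   # where to write the agent configuration
  spool_dir="$2"   # local directory watched by the source
  hdfs_path="$3"   # HDFS directory the sink writes to

  cat > "$conf_file" <<EOF
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = $spool_dir
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = $hdfs_path
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.channel = ch1
EOF
}

# Run only when called with three arguments, e.g. from the command line.
if [ "$#" -eq 3 ]; then
  write_flume_conf "$1" "$2" "$3"
fi
```

With the file written as spool-to-hdfs.conf (a hypothetical name), the agent could then be started with something like: flume-ng agent --conf conf --conf-file spool-to-hdfs.conf --name agent1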
Regular tools
Use an automation tool such as cron or AutoSys to push files to a location in HDFS
# hadoop fs -copyFromLocal src dest
# hadoop fs -copyToLocal src dest
Use Oozie
Use the Oozie SSH action to log in to a machine and then execute the two copy commands above
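For the cron approach, a wrapper script along the following lines could be scheduled. It is only a sketch: the function name, the directories and the day-based folder layout are assumptions, and it relies on hadoop fs -mkdir -p, which assumes Hadoop 2 or later. Note that -copyFromLocal fails if the file already exists in HDFS, so with set -e the script stops at the first such file.

```shell
#!/usr/bin/env bash
# Sketch only: push every regular file from a local directory into a
# dated HDFS directory. Paths and layout are hypothetical.
set -eu

push_to_hdfs() {
  src_dir="$1"                       # local directory to read from
  dest_dir="$2/$(date +%Y-%m-%d)"    # HDFS directory, one folder per day

  # Create the target directory (and parents) if it does not exist yet.
  hadoop fs -mkdir -p "$dest_dir"

  for f in "$src_dir"/*; do
    [ -f "$f" ] || continue          # skip subdirectories and empty globs
    hadoop fs -copyFromLocal "$f" "$dest_dir/"
  done
}

# Run only when called with two arguments, e.g. from cron.
if [ "$#" -eq 2 ]; then
  push_to_hdfs "$1" "$2"
fi
```

A matching crontab entry, with a hypothetical script path, might be: 0 * * * * /opt/etl/push_to_hdfs.sh /var/spool/incoming /data/incoming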
Import and export between HBase and HDFS
Use the HBase Export utility class
http://hbase.apache.org/book/ops_mgt.html#export
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir>
Import to HBase
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
The HBase Export utility writes data in SequenceFile format, so you need to convert it if you want another format.
Alternatively, you can use MapReduce to read from HBase and write to HDFS in plain text or any other format.