Sqoop is a tool for importing and exporting data between an RDBMS and HDFS.
It can be downloaded from the Apache website. At the time of writing, Sqoop is an Apache incubator project, but it is expected to become a full project in the near future.
Sqoop is a client tool, so you do not need to install it on every node of the cluster. The best practice is to install it only on a client machine (or an edge node of the cluster). If you are worried about traffic between the machine where you install Sqoop and the database, note that the data transfer happens directly between the cluster and the database.
Installation steps
Installing Sqoop for development purposes is fairly simple:
Download the latest Sqoop binary release
Extract it into a folder of your choice
Set SQOOP_HOME and add Sqoop's bin directory to PATH so that the sqoop commands can be run directly
For example, I extracted Sqoop into the following directory, and my environment variables look like this:
export SQOOP_HOME="/home/hadoop/software/sqoop-1.4.3"
export PATH=$PATH:$SQOOP_HOME/bin
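If you want these settings in every new shell, one option is to put them in a startup file such as ~/.bashrc. The sketch below is an assumption about how you might do that, not part of Sqoop itself. It reuses the directory from the example above and only appends to PATH when the entry is missing, so sourcing the file more than once does not keep growing PATH.

```shell
# Install location from the example above; adjust to your own extract directory.
export SQOOP_HOME="/home/hadoop/software/sqoop-1.4.3"

# Append Sqoop's bin directory to PATH only if it is not already present,
# so sourcing this file repeatedly does not add duplicate entries.
case ":$PATH:" in
  *":$SQOOP_HOME/bin:"*) ;;
  *) export PATH="$PATH:$SQOOP_HOME/bin" ;;
esac
```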
Sqoop can connect to various types of databases.
For example, it can talk to MySQL, Oracle, and PostgreSQL databases. It connects to them over JDBC, so it needs the JDBC driver for each database you want to use.
The JDBC driver JAR for each database can be downloaded from the web. For example, the MySQL JAR is available at the link below
Download the MySQL Connector/J JAR and store it in the lib directory inside the Sqoop home folder.
That's it.
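The copy step can be wrapped in a small shell function. This is only a sketch: `install_jdbc_driver` is a hypothetical helper name, not a Sqoop command, and it assumes SQOOP_HOME is already set as shown earlier.

```shell
# Hypothetical helper (not part of Sqoop): copy a JDBC driver JAR
# into the lib directory under SQOOP_HOME.
install_jdbc_driver() {
  jar="$1"
  # Stop with an error message if SQOOP_HOME has not been set.
  lib_dir="${SQOOP_HOME:?SQOOP_HOME is not set}/lib"

  if [ ! -f "$jar" ]; then
    echo "driver jar not found: $jar" >&2
    return 1
  fi

  mkdir -p "$lib_dir"
  cp "$jar" "$lib_dir/"
  echo "installed $(basename "$jar") into $lib_dir"
}

# Example call; the path is a placeholder for wherever you saved the JAR:
# install_jdbc_driver /path/to/mysql-connector-java.jar
```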
Just test your installation by typing
$ sqoop help
You should see the list of Sqoop commands along with a short description of each.
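If you want a slightly stronger check than the help output, something like the sketch below may be useful. `check_sqoop` is a hypothetical helper name. It only confirms that the sqoop launcher is on PATH and then runs `sqoop version`. The commented `list-databases` line shows one way to confirm that the JDBC driver is picked up; the MySQL host and user name in it are assumptions, so replace them with your own.

```shell
# Hypothetical helper: confirm the sqoop launcher is reachable, then print its version.
check_sqoop() {
  if ! command -v sqoop >/dev/null 2>&1; then
    echo "sqoop not found on PATH; check SQOOP_HOME and PATH" >&2
    return 1
  fi
  sqoop version
}

# Example usage:
# check_sqoop
#
# With the MySQL driver in place, a connectivity test could look like this
# (assumed: a MySQL server on localhost and a user named root;
#  -P prompts for the password instead of taking it on the command line):
# sqoop list-databases --connect jdbc:mysql://localhost/ --username root -P
```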
Happy sqooping :)
Thanks for the post. Helpful
Should Sqoop be installed on the development machine or on a Hadoop node?
How can I run it under Windows?
Sqoop is client software, so there is no need to run it on a Hadoop node.
You can install it on any machine and configure it to send data to the cluster.
I can't make Sqoop work on Windows.
Could you please provide steps on how to set it up on Windows? Thanks!