HBase transfer to another cluster using distcp

We had to take a full copy of HBase from one cluster to another.

We decided to take the brute-force approach of copying the data with distcp.
This is not the recommended way, but we chose it because we were short on time and knew that it works and is quick.

Source: clusterA

Destination: clusterB


Steps

This assumes that you don't have any tables on the destination side. If you do, back them up first.

Stop HBase on both sides. This ensures that all data still held in memory is flushed to HDFS before the copy starts.

Start the distcp to copy data

Commands executed on clusterA side

# Create directory on destination side

sudo -u hdfs hadoop fs -mkdir hdfs://clusterB/hbase_copy_20130320

# Start distcp job
sudo -u hdfs hadoop distcp -update /hbase hdfs://clusterB/hbase_copy_20130320
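
The two commands above can be wrapped in a small script so that the dated target directory is defined in one place. This is only a sketch, not what we actually ran: the function names `run` and `copy_hbase_root`, the `DRY_RUN` switch and the variable names are my own assumptions. Only the `hadoop fs -mkdir` and `hadoop distcp -update` calls come from the steps above.

```shell
#!/usr/bin/env bash
# Sketch only: variable names, the DRY_RUN switch and the function names
# are illustrative and not part of Hadoop or HBase.

SRC_DIR="${SRC_DIR:-/hbase}"
DEST_NN="${DEST_NN:-hdfs://clusterB}"
COPY_DIR="${COPY_DIR:-/hbase_copy_20130320}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

copy_hbase_root() {
  # Create the target directory on the destination, then copy into it.
  run sudo -u hdfs hadoop fs -mkdir "${DEST_NN}${COPY_DIR}" || return 1
  run sudo -u hdfs hadoop distcp -update "$SRC_DIR" "${DEST_NN}${COPY_DIR}"
}
```

Calling `copy_hbase_root` with the default `DRY_RUN=1` only prints the two commands, which is a cheap way to check the paths before setting `DRY_RUN=0` and running the real copy.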

Verify that the data size matches on both sides

# On clusterA
hadoop fs -du -h /hbase

# On clusterB
hadoop fs -du -h /hbase_copy_20130320
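
Comparing the sizes by eye is error prone, so the check can also be scripted. The sketch below is an assumption on my part, not something we ran: the function names `du_bytes`, `sizes_match` and `verify_copy` are made up for illustration. It assumes it runs on a host that can read both clusters (as the distcp job on clusterA can) and that the first column of `hadoop fs -du -s` is the size in bytes.

```shell
#!/usr/bin/env bash
# Sketch only: the function names are illustrative, not part of Hadoop.

# Extract the byte count (first field) from `hadoop fs -du -s <path>` output.
du_bytes() {
  awk 'NR == 1 { print $1 }'
}

# Succeed only when both byte counts are non-empty and equal.
sizes_match() {
  [ -n "$1" ] && [ -n "$2" ] && [ "$1" = "$2" ]
}

# Compare the source HBase root with the copy on the destination.
# Assumes the current user can read both paths.
verify_copy() {
  src_bytes="$(hadoop fs -du -s /hbase | du_bytes)"
  dst_bytes="$(hadoop fs -du -s hdfs://clusterB/hbase_copy_20130320 | du_bytes)"
  if sizes_match "$src_bytes" "$dst_bytes"; then
    echo "OK: both sides hold $src_bytes bytes"
  else
    echo "MISMATCH: source=$src_bytes destination=$dst_bytes" >&2
    return 1
  fi
}
```

`verify_copy` returns a non-zero status on a mismatch, so it can be used to stop a larger script before the HBase root is swapped.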

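Commands executed on the destination side (clusterB)

# Give the copied files to the hbase user
sudo -u hdfs hadoop fs -chown -R hbase:hbase /hbase_copy_20130320
# Move the existing HBase root aside as a backup
sudo -u hdfs hadoop fs -mv /hbase /hbase_clusterB_backup
# Move the copied data into place as the new HBase root
sudo -u hdfs hadoop fs -mv /hbase_copy_20130320 /hbase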
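
Since a wrong move here can cost you the destination data, the swap on clusterB can be guarded with a few existence checks. This is a sketch under my own assumptions: `swap_hbase_root` and its variables are hypothetical names, and it assumes the intent is to move the old `/hbase` aside and then move `/hbase_copy_20130320` into its place while HBase is still stopped. It relies only on `hadoop fs -test` and `hadoop fs -mv`.

```shell
#!/usr/bin/env bash
# Sketch only: swap_hbase_root and its variables are illustrative names.

COPY_DIR="/hbase_copy_20130320"
BACKUP_DIR="/hbase_clusterB_backup"

swap_hbase_root() {
  # Refuse to continue if the copied directory is missing.
  if ! sudo -u hdfs hadoop fs -test -d "$COPY_DIR"; then
    echo "Copy $COPY_DIR not found, aborting" >&2
    return 1
  fi
  # Refuse to overwrite an earlier backup.
  if sudo -u hdfs hadoop fs -test -e "$BACKUP_DIR"; then
    echo "Backup $BACKUP_DIR already exists, aborting" >&2
    return 1
  fi
  # Move the old root aside only if there is one, then promote the copy.
  if sudo -u hdfs hadoop fs -test -d /hbase; then
    sudo -u hdfs hadoop fs -mv /hbase "$BACKUP_DIR" || return 1
  fi
  sudo -u hdfs hadoop fs -mv "$COPY_DIR" /hbase
}
```

The function stops before touching anything if the copy is missing or a backup already exists, so a second accidental run does not destroy the first backup.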

Run the offline meta repair to rebuild the meta table from the copied region data

sudo -u hbase hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -base hdfs://clusterB_NamenodeService/hbase

This will take some time. Once it is done, restart HBase on the destination side and let region balancing happen.

You can verify the data by listing the tables from the HBase shell

hbase shell
list

Lastly, the recommended approach is to use snapshots and CopyTable, but we did not use them this time. I will write another post on using snapshots and CopyTable.

Thanks for reading
