We had to take a full copy of HBase from one cluster to another.
We decided to take the brute-force approach of copying the data with DistCp.
Although it's not the recommended approach, we took it because time was very short and we knew it works and is very quick.
This assumes that you don't have any tables on the destination side. If you do, you need to back them up first.
Stop HBase on both sides. A clean shutdown ensures that all data held in memory (the memstores) is flushed to HDFS before the copy starts.
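How you stop HBase depends on the installation (for a tarball install, bin/stop-hbase.sh on the master; packaged installs have their own service scripts). Before starting the copy, it is worth confirming on each node that the daemons are really gone. A minimal sketch, assuming the JDK's jps tool is on the PATH; `hbase_daemons_running` and `assert_hbase_stopped` are hypothetical helper names, not part of HBase:

```shell
# Hypothetical helper: succeeds if jps lists an HBase master or region server.
# HMaster and HRegionServer are the JVM main-class names jps prints for them.
hbase_daemons_running() {
  jps | grep -E '[[:space:]](HMaster|HRegionServer)$'
}

# Hypothetical helper: returns 0 only if no HBase daemon is running here.
assert_hbase_stopped() {
  # Without jps we cannot tell, so refuse rather than report "stopped".
  if ! command -v jps >/dev/null 2>&1; then
    echo "jps not found; cannot check for HBase daemons" >&2
    return 2
  fi
  if hbase_daemons_running >/dev/null; then
    echo "HBase daemons still running on this node; stop HBase before copying" >&2
    return 1
  fi
  echo "no HBase daemons found on this node"
}
```

Run assert_hbase_stopped on every master and region server host of both clusters before launching DistCp.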
Start DistCp to copy the data.
Commands executed on the clusterA (source) side:
# Create the target directory on the destination side
sudo -u hdfs hadoop fs -mkdir hdfs://clusterB/hbase_copy_20130320
# Start the DistCp job
sudo -u hdfs hadoop distcp -update /hbase hdfs://clusterB/hbase_copy_20130320
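The two commands above can be wrapped in a small helper that derives the dated directory name (hbase_copy_YYYYMMDD, the same pattern used here) from today's date. This is a sketch: `copy_hbase_to` and `hdfs_as_hdfs` are hypothetical names, and it assumes it runs on clusterA as a user allowed to sudo to the hdfs account.

```shell
# Hypothetical wrapper: run a hadoop subcommand as the hdfs superuser.
hdfs_as_hdfs() { sudo -u hdfs hadoop "$@"; }

# Hypothetical helper: copy_hbase_to <destination-namenode>
# Creates hdfs://<destination-namenode>/hbase_copy_YYYYMMDD and runs DistCp
# from the local /hbase into it. Run on the source cluster with HBase stopped.
copy_hbase_to() {
  dest_dir="hdfs://$1/hbase_copy_$(date +%Y%m%d)"
  # Stop here if mkdir reports an error.
  hdfs_as_hdfs fs -mkdir "$dest_dir" || return 1
  # -update copies the contents of /hbase into dest_dir, not /hbase itself.
  hdfs_as_hdfs distcp -update /hbase "$dest_dir"
}
```

Usage: copy_hbase_to clusterB. The -update detail matters later: because DistCp places the contents of /hbase directly under the target directory, the copy can simply be renamed to /hbase on the destination.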
Verify that the data size matches on both sides. Run the first command on clusterA and the second on clusterB:
hadoop fs -du -h /hbase
hadoop fs -du -h /hbase_copy_20130320
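Because -h rounds sizes to a human-readable unit, two directories can look identical while differing by a few megabytes. A stricter check compares exact byte counts. A sketch, run from clusterA so both paths are reachable; `hdfs_bytes` and `compare_sizes` are hypothetical names, and it assumes hadoop fs -du -s prints the byte count in its first column:

```shell
# Hypothetical helper: print the total size in bytes of an HDFS path.
# `hadoop fs -du -s` puts the byte count in the first column; newer Hadoop
# releases add a second column (space consumed including replicas).
hdfs_bytes() {
  hadoop fs -du -s "$1" | awk 'NR == 1 { print $1 }'
}

# Hypothetical helper: compare_sizes <source-path> <copy-path>
# Returns 0 only if both sizes were read and are identical.
compare_sizes() {
  src=$(hdfs_bytes "$1")
  dst=$(hdfs_bytes "$2")
  if [ -z "$src" ] || [ "$src" != "$dst" ]; then
    echo "size mismatch: $1=${src:-unknown} $2=${dst:-unknown}" >&2
    return 1
  fi
  echo "sizes match: $src bytes"
}
```

Usage: compare_sizes /hbase hdfs://clusterB/hbase_copy_20130320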
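Commands executed on the destination side (clusterB), with HBase still stopped. These back up the existing /hbase and move the copy into its place: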
sudo -u hdfs hadoop fs -chown -R hbase:hbase /hbase_copy_20130320
sudo -u hdfs hadoop fs -mv /hbase /hbase_clusterB_backup
sudo -u hdfs hadoop fs -mv /hbase_copy_20130320 /hbase
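The chown and the two moves can be guarded so that nothing happens to the existing /hbase unless the copy is actually present. A sketch under these assumptions: it runs on clusterB with HBase stopped, the hdfs account is the HDFS superuser, and `swap_in_copy` and `hdfs_fs` are hypothetical names. It also refuses to run if the backup name already exists, because hadoop fs -mv into an existing directory moves /hbase inside it instead of renaming it.

```shell
# Hypothetical wrapper: every HDFS shell call runs as the hdfs superuser.
hdfs_fs() { sudo -u hdfs hadoop fs "$@"; }

# Hypothetical helper: swap_in_copy <copy-dir> <backup-dir>
# Aborts before touching /hbase if the copy is missing or the backup
# name is already taken.
swap_in_copy() {
  copy=$1
  backup=$2
  if ! hdfs_fs -test -d "$copy"; then
    echo "copy $copy not found" >&2
    return 1
  fi
  if hdfs_fs -test -e "$backup"; then
    echo "backup target $backup already exists" >&2
    return 1
  fi
  # HBase must own its files before it is restarted.
  hdfs_fs -chown -R hbase:hbase "$copy" || return 1
  # Keep the old data, if any, under the backup name.
  if hdfs_fs -test -d /hbase; then
    hdfs_fs -mv /hbase "$backup" || return 1
  fi
  hdfs_fs -mv "$copy" /hbase
}
```

Usage: swap_in_copy /hbase_copy_20130320 /hbase_clusterB_backup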
Run the offline meta repair, which rebuilds the META table from the region data now in HDFS:
sudo -u hbase hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -base hdfs://clusterB_NamenodeService/hbase
This takes some time. Once it's done, restart HBase on the destination side and let region balancing happen.
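After the restart, hbase hbck gives a quick read on whether every region is assigned and consistent with the META table. A sketch, assuming an HBase release of this era whose hbck report contains a line starting with "Status: OK" (or "Status: INCONSISTENT"); `hbck_report` and `check_hbck` are hypothetical names:

```shell
# Hypothetical wrapper: run hbck as the hbase user.
hbck_report() { sudo -u hbase hbase hbck; }

# Hypothetical helper: returns 0 only if hbck reports a clean status.
check_hbck() {
  report=$(hbck_report 2>&1)
  if printf '%s\n' "$report" | grep -q '^Status: OK'; then
    echo "hbck: OK"
    return 0
  fi
  # Show the summary lines so the operator can see what hbck complained about.
  printf '%s\n' "$report" | grep -E '^Status:|inconsistencies detected' >&2
  return 1
}
```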
You can verify the data by
Lastly, the recommended approach is to use snapshots and CopyTable, but we did not use them today. I will write another post on using snapshots and CopyTable.
Thanks for reading.