I have two HDFS setup and want to copy (not migrate or move) some tables from HDFS1 to HDFS2. How to copy data from one HDFS to another HDFS? Is it possible via Sqoop or other command line?
DistCp (distributed copy) is a tool used for copying data between clusters. It uses MapReduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list.
Usage: $ hadoop distcp <src> <dst>
example: $ hadoop distcp hdfs://nn1:8020/file1 hdfs://nn2:8020/file2
file1
from nn1
is copied to nn2
with filename file2
Distcp is the best tool as of now. Sqoop is used to copy data from relational database to HDFS and vice versa, but not between HDFS to HDFS.
More info:
There are two versions available - runtime performance in distcp2
is more compared to distcp