How to copy data from one HDFS to another HDFS?

sharp picture sharp · Aug 6, 2015 · Viewed 99.5k times · Source

I have two HDFS setup and want to copy (not migrate or move) some tables from HDFS1 to HDFS2. How to copy data from one HDFS to another HDFS? Is it possible via Sqoop or other command line?

Answer

Avinash Reddy picture Avinash Reddy · Aug 7, 2015

DistCp (distributed copy) is a tool used for copying data between clusters. It uses MapReduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list.

Usage: $ hadoop distcp <src> <dst>

example: $ hadoop distcp hdfs://nn1:8020/file1 hdfs://nn2:8020/file2

file1 from nn1 is copied to nn2 with filename file2

Distcp is the best tool as of now. Sqoop is used to copy data from relational database to HDFS and vice versa, but not between HDFS to HDFS.

More info:

There are two versions available - runtime performance in distcp2 is more compared to distcp