merge output files after reduce phase

Shahryar · Apr 18, 2011 · Viewed 74.4k times

In MapReduce, each reduce task writes its output to a file named part-r-nnnnn, where nnnnn is the partition ID associated with that reduce task. Does MapReduce merge these files? If so, how?

Answer

diliop · Apr 21, 2011

Instead of merging the files yourself, you can delegate the entire merge of the reduce output files to a single command:

hadoop fs -getmerge /output/dir/on/hdfs/ /desired/local/output/file.txt

Note: This combines the HDFS files into a single file on the local filesystem. Make sure you have enough local disk space before running it.
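If the part files have already been copied to a local directory, the merge itself comes down to concatenating them in partition order. The following is only a minimal sketch of that idea, not what Hadoop runs internally: `merge_parts` is a hypothetical helper name, and it assumes the files follow the part-r-nnnnn naming from the question.

```shell
#!/bin/sh
# merge_parts is a hypothetical helper for illustration; it is not part of Hadoop.
# Usage: merge_parts <local_dir_with_part_files> <merged_output_file>
merge_parts() {
    src_dir=$1
    out_file=$2
    # Start from an empty output file so a rerun does not append twice.
    : > "$out_file" || return 1
    # The shell expands the glob in sorted order, so part-r-00000 is
    # written before part-r-00001, and so on.
    for part in "$src_dir"/part-r-*; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -f "$part" ] || continue
        cat "$part" >> "$out_file" || return 1
    done
}
```

For example, `merge_parts ./job-output ./merged.txt` would write the contents of every part-r-* file under the assumed directory ./job-output into ./merged.txt. Marker files such as _SUCCESS are ignored because they do not match the pattern.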