How to overwrite/reuse the existing output path for Hadoop jobs again and again

yogesh · Oct 10, 2011

I want to overwrite/reuse the existing output directory when I run my Hadoop job daily. The output directory stores the summarized results of each day's run. If I specify the same output directory again, the job fails with the error "output directory already exists".

How can I bypass this validation?

Answer

Thomas Jungblut · Oct 10, 2011

What about deleting the directory before you run the job?

You can do this via shell:

hadoop fs -rmr /path/to/your/output/

(On Hadoop 2.x and later, -rmr is deprecated; use hadoop fs -rm -r /path/to/your/output/ instead.)

or via the Java API:

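import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// the Configuration must point at your namenode (fs.defaultFS, or fs.default.name on Hadoop 1.x)
FileSystem fs = FileSystem.get(new Configuration());
// true means the directory is deleted recursively, including its contents
fs.delete(new Path("/path/to/your/output"), true);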
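If you run the job in Hadoop's local mode, the output path is an ordinary local directory, and the same delete-before-run step can be sketched with only the JDK's java.nio.file API. This is an illustrative stand-in, not Hadoop's API: the class ClearOutputDir and the method deleteRecursively are hypothetical names, and on HDFS you would still use fs.delete(path, true) as above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper: clears a local output directory before a job run.
public class ClearOutputDir {

    /**
     * Recursively deletes dir if it exists, roughly like fs.delete(path, true).
     * Returns true if the directory existed and was deleted, false if it was absent.
     */
    static boolean deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false; // nothing to delete; the job can create the directory itself
        }
        // Reverse natural order puts children before their parent directories,
        // so every directory is already empty by the time it is deleted.
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> {
                     try {
                         Files.delete(p);
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a previous day's output: a part file plus a _SUCCESS marker.
        Path out = Files.createTempDirectory("job-output");
        Files.write(out.resolve("part-00000"), "summary\n".getBytes());
        Files.createFile(out.resolve("_SUCCESS"));

        System.out.println(deleteRecursively(out)); // true
        System.out.println(Files.exists(out));      // false
        System.out.println(deleteRecursively(out)); // false: already gone
    }
}
```

Deleting children before parents matters because Files.delete refuses to remove a non-empty directory (it throws DirectoryNotEmptyException).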