How to overwrite/reuse the existing output path for Hadoop jobs again and again

hadoop rewrite fileoutputstream

yogesh · Oct 10, 2011

I want to overwrite/reuse the existing output directory when I run my Hadoop job daily. The output directory stores the summarized results of each day's job run. If I specify the same output directory, the job fails with the error "output directory already exists".

How can I bypass this validation?

Answer

What about deleting the directory before you run the job?

You can do this via the shell (on Hadoop 2 and later, -rmr is deprecated in favor of hadoop fs -rm -r):

hadoop fs -rmr /path/to/your/output/

or via the Java API:

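import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// new Configuration() reads core-site.xml from the classpath, which must name your NameNode (fs.default.name / fs.defaultFS)
FileSystem fs = FileSystem.get(new Configuration());
// true = delete the directory and everything under it recursively
fs.delete(new Path("/path/to/your/output"), true);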
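The Hadoop snippet above needs the Hadoop jars and a reachable NameNode to run. As a self-contained sketch of what the recursive flag does, here is a local-filesystem analog that uses only java.nio.file. The class OutputDirReset and its deleteRecursively method are hypothetical names for illustration, not Hadoop APIs; on HDFS you would still call FileSystem.delete.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only: a local-filesystem analog of
// Hadoop's fs.delete(path, true). It is not part of the Hadoop API.
public class OutputDirReset {

    // Deletes dir and everything under it.
    // Returns false if dir did not exist, true if it was removed.
    public static boolean deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parents, so every
            // directory is already empty when it is deleted.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the output left behind by yesterday's run.
        Path out = Files.createTempDirectory("job-output");
        Files.writeString(out.resolve("part-00000"), "yesterday\n");
        Path logs = Files.createDirectories(out.resolve("logs"));
        Files.writeString(logs.resolve("history"), "old\n");

        // Clear the directory before "today's run" writes to the same path.
        boolean removed = deleteRecursively(out);
        System.out.println("removed=" + removed + ", exists=" + Files.exists(out));
        // prints: removed=true, exists=false
    }
}
```

Sorting the walked paths in reverse order is what lets a plain per-entry delete succeed: a directory's path always sorts before its descendants, so once reversed, every child is removed before its parent. Like fs.delete(path, true), this discards the previous day's output entirely.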
