I want to overwrite/reuse the existing output directory when I run my Hadoop job daily. Actually the output directory will store summarized output of each day's job run results. If I specify the same output directory it gives the error "output directory already exists".
How to bypass this validation?
What about deleting the directory before you run the job?
You can do this via shell:
hadoop fs -rmr /path/to/your/output/
or via the Java API:
// configuration should contain reference to your namenode
FileSystem fs = FileSystem.get(new Configuration());
// true stands for recursively deleting the folder you gave
fs.delete(new Path("/path/to/your/output"), true);