I copied some files from one directory to another using
hadoop distcp -Dmapreduce.job.queuename=adhoc /user/comverse/data/$CURRENT_DATE_NO_DASH_*/*rcr.gz /apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin04/file_rcr/
I stopped the script before it finished, and the dst directory was left with a lot of .distcp.tmp.attempt files
as well as files that had finished copying.
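To check how many of the leftovers are distcp temp files, something like this should work (just a diagnostic sketch, using the distcp destination path from the command above and grepping for the .distcp.tmp.attempt prefix mentioned earlier):
hadoop fs -ls /apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin04/file_rcr/ | grep 'distcp.tmp.attempt' | wc -l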
Now I want to clean the dst directory. After running
hadoop fs -rm -skipTrash /apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin04/file_mta/*
most of the files were deleted, but some remained (at least that's what HUE shows). The strange thing is, every time I run hadoop fs -rm -skipTrash,
the number of remaining files reported by HUE changes, sometimes up and sometimes down.
I tried
hadoop fs -ls /apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin04/file_mta/
and saw that some of the files that should have been deleted were still there. Then I ran
hadoop fs -rm -skipTrash /apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin04/file_mta/*
a dozen more times, and there were always more files to delete (there still are). What is happening?
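One way to diagnose this is to watch the file count from the command line instead of relying on HUE; if the count keeps rising between runs, something is still writing into the directory:
hadoop fs -count /apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin04/file_mta/
(hadoop fs -count prints the directory count, file count, content size and path; run it a couple of times a minute apart and compare the file count.)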
ALSO
Each time I refresh the page in HUE, the number of files grows. HALP.
EDIT
It seems that stopping distcp in the command line doesn't actually kill the underlying job, which kept running and creating files. That was the reason.
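DistCp runs as a MapReduce job on the cluster, so killing the shell command only stops the local client; the job itself keeps running and keeps creating files. A sketch of how to find and kill it (the job/application IDs below are placeholders):
mapred job -list
mapred job -kill job_1234567890123_0001
or, on YARN:
yarn application -list
yarn application -kill application_1234567890123_0001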
You could use the -R flag, which removes all files recursively from your HDFS location:
hadoop fs -rm -R -skipTrash /apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin04/file_mta/*
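Note that this only helps once the distcp job itself has been killed, otherwise new files will keep appearing. If you just want to wipe the directory completely and start over, a sketch (assuming you are allowed to recreate the directory and nothing depends on its current ownership or permissions):
hadoop fs -rm -r -skipTrash /apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin04/file_mta
hadoop fs -mkdir /apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin04/file_mta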