I've got a job running at the command-line prompt on my server for two days now:
find data/ -name "filepattern-*2009*" -exec tar uf 2009.tar {} \;
It is taking forever, and then some. Yes, there are millions of files in the target directory. (Each file is a measly 8 bytes in a well hashed directory structure.) But just running...
find data/ -name "filepattern-*2009*" -print > filesOfInterest.txt
...takes only two hours or so. At the rate my job is running, it won't be finished for a couple of weeks. That seems unreasonable. Is there a more efficient way to do this? Maybe with a more complicated bash script?
A secondary question is "why is my current approach so slow?"
One option is to use cpio to generate a tar-format archive:
$ find data/ -name "filepattern-*2009*" | cpio -ov --format=ustar > 2009.tar
cpio works natively with a list of filenames from stdin, rather than a top-level directory, which makes it an ideal tool for this situation. It also sidesteps your main bottleneck: `-exec ... \;` spawns a new tar process for every single file, and each `tar uf` invocation has to read through the existing archive before appending, so the cost grows with the size of the archive on every one of those millions of calls. A single cpio process streaming the whole file list does the work in one pass.
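As a sanity check, here is a minimal sketch of the same pipeline run against a small throwaway directory tree (the file names are made up for the demo). Because `--format=ustar` is a tar-compatible format, the resulting archive can be listed and extracted with plain tar:

```shell
# Build a tiny scratch tree mimicking the hashed layout (names are hypothetical).
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p data/a data/b
printf '8 bytes\n' > data/a/filepattern-x2009y
printf '8 bytes\n' > data/b/filepattern-z2009w
printf 'skip me\n' > data/a/filepattern-x2008y

# Quote the pattern so the shell passes it to find unexpanded.
# cpio reads the file list from stdin and writes a ustar archive to stdout.
find data/ -name "filepattern-*2009*" | cpio -o --format=ustar > 2009.tar

# The result is a plain tar archive; ordinary tar can read it.
tar tf 2009.tar
```

Only the two 2009 files end up in the archive; the 2008 file is filtered out by the find pattern, exactly as in the full-size job.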