I have a process that will initially generate 3-4 million PDF files and then continue producing them at a rate of 80K/day. They'll be pretty small (around 50 KB each), but what I'm worried about is how to manage the total mass of files I'm generating so they stay easy to look up. Some details:
- I'll have some other steps to run once a file has been generated, and there will be a few servers participating, so I'll need to watch for files as they're generated.
- Once generated, the files will be available through a lookup process I've written. Essentially, I'll need to pull them based on an order number, which is unique per file (there's a rough sketch of what I mean just after this list).
- At any time, an existing order number may be resubmitted, and the generated file will need to overwrite the original copy.
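To make the lookup/overwrite part concrete, here's a minimal sketch of what I have in mind. The NAS path and the flat order-number-to-filename mapping are placeholders, not settled decisions:

```csharp
using System.IO;

static class PdfStore
{
    // Hypothetical NAS share; the real location isn't decided yet.
    const string Root = @"\\nas\pdfs";

    // One file per order number, all in a single flat folder (the part I'm unsure about).
    static string PathForOrder(string orderNumber)
    {
        return Path.Combine(Root, orderNumber + ".pdf");
    }

    // Retrieval: I already know the filename, so no directory listing is needed.
    public static byte[] Fetch(string orderNumber)
    {
        return File.ReadAllBytes(PathForOrder(orderNumber));
    }

    // Resubmission: WriteAllBytes replaces an existing file, which is the overwrite I need.
    public static void Save(string orderNumber, byte[] pdf)
    {
        File.WriteAllBytes(PathForOrder(orderNumber), pdf);
    }
}
```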
Originally, I had planned to write these files all to a single directory on a NAS, but I realize this might not be a good idea, since there will be millions of them and Windows might not handle lookups in a million-file directory very gracefully. I'm looking for some advice:
- Is a single folder okay? The files will never be listed - they'll only be retrieved via System.IO.File calls with a filename I've already determined (as in the sketch above).
- If I do use a folder, can I watch for new files with a System.IO.FileSystemWatcher (sketch below), or will it become sluggish with that many files in the directory?
- Should they be stored as BLOBs in a SQL Server database instead (second sketch below)? Since I'll need to retrieve them by a reference value, maybe that makes more sense.
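For the watching question, this is roughly the FileSystemWatcher usage I'm picturing (again, the share path is a placeholder):

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Watch the (hypothetical) NAS share for new or overwritten PDFs.
        using (var watcher = new FileSystemWatcher(@"\\nas\pdfs", "*.pdf"))
        {
            watcher.NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite;
            watcher.Created += (s, e) => Console.WriteLine("New file: " + e.FullPath);
            watcher.Changed += (s, e) => Console.WriteLine("Overwritten: " + e.FullPath);
            watcher.EnableRaisingEvents = true;

            Console.ReadLine(); // keep the process alive while watching
        }
    }
}
```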
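And for the database option, here's a sketch of what I imagine the BLOB approach would look like, assuming a made-up OrderPdf table keyed on OrderNumber with a varbinary(max) column (the connection string and all the names are placeholders):

```csharp
using System.Data.SqlClient;

static class PdfTable
{
    // Placeholder connection string.
    const string ConnStr = "Server=...;Database=Orders;Integrated Security=true";

    // Upsert, so a resubmitted order number replaces the stored copy.
    public static void Save(string orderNumber, byte[] pdf)
    {
        using (var conn = new SqlConnection(ConnStr))
        using (var cmd = new SqlCommand(
            @"MERGE OrderPdf AS t
              USING (SELECT @n AS OrderNumber) AS s ON t.OrderNumber = s.OrderNumber
              WHEN MATCHED THEN UPDATE SET Pdf = @p
              WHEN NOT MATCHED THEN INSERT (OrderNumber, Pdf) VALUES (@n, @p);", conn))
        {
            cmd.Parameters.AddWithValue("@n", orderNumber);
            cmd.Parameters.AddWithValue("@p", pdf);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }

    // Lookup by the same reference value.
    public static byte[] Fetch(string orderNumber)
    {
        using (var conn = new SqlConnection(ConnStr))
        using (var cmd = new SqlCommand(
            "SELECT Pdf FROM OrderPdf WHERE OrderNumber = @n", conn))
        {
            cmd.Parameters.AddWithValue("@n", orderNumber);
            conn.Open();
            return (byte[])cmd.ExecuteScalar();
        }
    }
}
```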
Thank you for your thoughts!