I'm trying to create a directory and copy a file (a PDF) inside a Parallel.ForEach loop.
Below is a simple example:
private static void CreateFolderAndCopyFile(int index)
{
const string sourcePdfPath = "c:\\testdata\\test.pdf";
const string rootPath = "c:\\testdata";
string folderDirName = string.Format("Data{0}", string.Format("{0:00000000}", index));
string folderDirPath = rootPath + @"\" + folderDirName;
Directory.CreateDirectory(folderDirPath);
string desPdfPath = folderDirPath + @"\" + "test.pdf";
File.Copy(sourcePdfPath, desPdfPath, true);
}
The method above creates a new folder and copies the PDF file into it. It produces this directory tree:
TESTDATA
  -Data00000000
    -test.pdf
  -Data00000001
    -test.pdf
  ....
  -Data0000000N
    -test.pdf
I tried calling the CreateFolderAndCopyFile method in a Parallel.ForEach loop.
private static void Func<T>(IEnumerable<T> docs)
{
int index = 0;
Parallel.ForEach(docs, doc =>
{
CreateFolderAndCopyFile(index);
index++;
});
}
When I run this code it finishes with the following error:
The process cannot access the file 'c:\testdata\Data00001102\test.pdf' because it is being used by another process.
But first it created 1111 new folders and copied test.pdf about 1111 times before I got this error.
What caused this behaviour and how can it be resolved?
EDITED:
The code above was a toy sample; sorry for the hard-coded strings. Conclusion: the parallel method is slow.
Tomorrow I will try some methods from "How to write super-fast file-streaming code in C#?",
especially: http://designingefficientsoftware.wordpress.com/2011/03/03/efficient-file-io-from-csharp/
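You are not synchronizing access to index, and that means you have a race on it. Two threads can read the same value of index before either one increments it, so both try to write the same destination file at the same time. That's why you get the error. For illustrative purposes, you can avoid the race and keep this particular design by using Interlocked.Increment.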
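private static void Func<T>(IEnumerable<T> docs)
{
    // Start at -1 so that the first Interlocked.Increment returns 0.
    int index = -1;
    Parallel.ForEach(
        docs, doc =>
        {
            // Atomically increment and read the counter; note the required ref.
            int nextIndex = Interlocked.Increment(ref index);
            CreateFolderAndCopyFile(nextIndex);
        }
    );
}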
However, as others suggest, the alternative overload of Parallel.ForEach that provides a loop index is clearly a cleaner solution to this particular problem.
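As a minimal sketch of that overload: Parallel.ForEach accepts an Action<TSource, ParallelLoopState, long>, whose third argument is the zero-based index of the element. The class name IndexedForEachSketch and the helper ForEachWithIndex are hypothetical names made up for this illustration; the per-index work is passed in as a delegate so the snippet stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class IndexedForEachSketch
{
    // Hypothetical helper: calls `action` once per element of `docs`,
    // passing that element's zero-based position in the sequence.
    public static void ForEachWithIndex<T>(IEnumerable<T> docs, Action<int> action)
    {
        // The three-parameter lambda selects the overload taking
        // Action<TSource, ParallelLoopState, long>. The index `i` is supplied
        // by the framework, so no shared counter is needed.
        Parallel.ForEach(docs, (doc, loopState, i) => action((int)i));
    }
}
```

With the method from the question, the call would look like IndexedForEachSketch.ForEachWithIndex(docs, CreateFolderAndCopyFile). Each index is handed out exactly once, although the order in which the indices are processed is not guaranteed.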
But when you get it working, you will find that copying files is I/O-bound rather than processor-bound, and I predict that the parallel code will be slower than the serial code.
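One way to check that prediction on your own hardware is to time both variants. The following is only a rough sketch: CopyTimingSketch, TimeSerial and TimeParallel are hypothetical names, and the result depends heavily on the disk, the file-system cache and the file size, so treat a single run as indicative at best.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class CopyTimingSketch
{
    // Runs work(0) .. work(count - 1) one after another and returns the elapsed time.
    public static TimeSpan TimeSerial(int count, Action<int> work)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            work(i);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Runs the same calls through Parallel.For, which hands each index to
    // exactly one invocation, and returns the elapsed time.
    public static TimeSpan TimeParallel(int count, Action<int> work)
    {
        var stopwatch = Stopwatch.StartNew();
        Parallel.For(0, count, work);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

For example, compare CopyTimingSketch.TimeSerial(n, CreateFolderAndCopyFile) with CopyTimingSketch.TimeParallel(n, CreateFolderAndCopyFile), writing to a fresh root folder for each run so that the second run does not benefit from folders that already exist.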