Parallel.ForEach issues - Read Write On A File - File Is In Use Error

SilverLight · Sep 24, 2012

I am really confused about Parallel.ForEach. How does it work?
The code below fails with a "file is in use" error:

Parallel.ForEach(list_lines_acc, (line_acc, list_lines_acc_state) =>
{
     FileStream file = 
         new FileStream(GPLfilePath, FileMode.Open, FileAccess.ReadWrite);
     StreamReader reader = new StreamReader(file);
     var processed = string.Empty;
     Ok_ip_port = string.Empty;
     while (!reader.EndOfStream)
     {
         if (string.IsNullOrEmpty(Ok_ip_port))
         {
             Ok_ip_port = reader.ReadLine();
         }
         else
         {
             string currentLine = reader.ReadLine();
             processed += currentLine + Environment.NewLine;
         }
     }
     StreamWriter writer = new StreamWriter(file);
     writer.Write(processed);

     reader.Close();
     writer.Close();
     file.Close();
});  

Would you please show me how I can fix that? This code is just an example.

I want to work with string arrays & Lists inside Parallel.ForEach, but there is always a problem for adding or editing those collections. Can you please provide an example? I am using Visual Studio 2010 + .NET Framework 4.0

Answer

Reed Copsey · Sep 24, 2012

In your code, as written, each thread is using the same file, and effectively trying to append to it. Even if this could work, you would have a bad race condition (as the threads would be trying to append to the same file simultaneously).

The error you're seeing is purely because you're using the same file in each loop iteration. When a second iteration tries to open the file, the open fails with an IOException because another iteration still has the file open.

Also, you're never using your loop variable (line_acc), so there is really no need for a loop here at all. This could be written without the Parallel.ForEach, and you would get the same result, with no issues.
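
To illustrate the point about dropping the loop, here is a minimal sketch. It assumes the intent of the posted code is to take the first line off the file and write the remaining lines back, which the question does not state outright. The names GplFile and TakeFirstLine are made up for the example.

```csharp
using System;
using System.IO;

static class GplFile
{
    // Returns the first line of the file and rewrites the file without it.
    // Returns null if the file is empty.
    public static string TakeFirstLine(string path)
    {
        // ReadAllLines opens the file, reads it, and closes it again.
        string[] lines = File.ReadAllLines(path);
        if (lines.Length == 0)
        {
            return null;
        }

        // Copy everything after the first line.
        string[] rest = new string[lines.Length - 1];
        Array.Copy(lines, 1, rest, 0, rest.Length);

        // WriteAllLines replaces the file contents and closes the file.
        File.WriteAllLines(path, rest);
        return lines[0];
    }
}
```

Because the file is opened and closed by one thread at a time, the "file is in use" error cannot come from this code itself. If several threads need to call it, the call would still have to be serialized, for example with a lock.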

That being said - if this is example code, you'll find that loops that are bound purely by file I/O tend not to parallelize well. The drive itself becomes the limiting factor, so code that only reads from and writes to a file will often run slower in parallel than it would sequentially.
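
One common way to structure this kind of work, sketched here under assumptions, is to keep the file I/O sequential and parallelize only the per-line processing in memory. Transform is a hypothetical stand-in for whatever CPU-bound work each line needs; LineProcessor and ProcessFile are likewise illustrative names.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class LineProcessor
{
    // Hypothetical stand-in for CPU-bound per-line work.
    static string Transform(string line)
    {
        return line.Trim().ToUpperInvariant();
    }

    public static void ProcessFile(string inputPath, string outputPath)
    {
        // 1. Read sequentially: one reader, so no sharing conflicts.
        string[] lines = File.ReadAllLines(inputPath);

        // 2. Process in parallel: each iteration writes only to its own slot,
        //    so no locking is needed and the original order is kept.
        string[] results = new string[lines.Length];
        Parallel.For(0, lines.Length, i =>
        {
            results[i] = Transform(lines[i]);
        });

        // 3. Write sequentially: one writer, one pass over the file.
        File.WriteAllLines(outputPath, results);
    }
}
```

Whether this is faster than a plain loop depends on how expensive the per-line work is. For trivial work like the Trim call above, the sequential version will likely win.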

"I want to work with string arrays & Lists inside Parallel.ForEach, but there is always a problem for adding or editing those collections"

The code you're showing "as an example" is doing none of this, so it's difficult to see where your issue might be occurring. You can write to an array or List<T> by index, but you can't add to a list in a parallel loop without extra synchronization (such as a lock), as List<T> is not thread-safe for writes. If you are trying to read from and write to collections, you might consider looking at the System.Collections.Concurrent namespace, which contains thread-safe collections you can safely use in Parallel.ForEach loops.
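
As a rough sketch of the two approaches mentioned above, the following collects one value per input line, first with a List<T> guarded by a lock and then with a ConcurrentBag<T>. The class and method names are invented for the example, and computing line.Length is only a placeholder for real work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class CollectionExamples
{
    // Option 1: a plain List<T>, with every Add protected by a lock.
    public static List<int> CollectWithLock(IEnumerable<string> lines)
    {
        List<int> lengths = new List<int>();
        object gate = new object();

        Parallel.ForEach(lines, line =>
        {
            int length = line.Length;   // do the real work outside the lock

            lock (gate)                 // only one thread may Add at a time
            {
                lengths.Add(length);
            }
        });

        return lengths;
    }

    // Option 2: a thread-safe collection, so no explicit lock is needed.
    public static ConcurrentBag<int> CollectWithBag(IEnumerable<string> lines)
    {
        ConcurrentBag<int> lengths = new ConcurrentBag<int>();
        Parallel.ForEach(lines, line => lengths.Add(line.Length));
        return lengths;
    }
}
```

In both cases the items end up in whatever order the threads happened to finish, not in input order. If order matters, writing into a pre-sized array by index is usually the simpler choice.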