How to correctly write to a file using Parallel.ForEach?

justindao · Feb 12, 2016 · Viewed 11.9k times

I have a task that reads a large file line by line, does some logic with each line, and returns a string I need to write to a file. The order of the output does not matter. However, when I try the code below, it stops or gets really slow after reading 15-20k lines of the file.

public static Object FileLock = new Object();
...
Parallel.ForEach(System.IO.File.ReadLines(inputFile), (line, _, lineNumber) =>
{
    var output = MyComplexMethodReturnsAString(line);
    lock (FileLock)
    {
        using (var file = System.IO.File.AppendText(outputFile))
        {
            file.WriteLine(output);
        }
    }
});

Why does my program slow down after running for a while? Is there a more correct way to perform this task?

Answer

Jeff Mercado · Feb 12, 2016

You've essentially serialized your query: every thread contends for the lock, and the output file is reopened and closed for every single line written. Instead, calculate what needs to be written in parallel, then write the results out as they arrive at the end.

var processedLines = File.ReadLines(inputFile).AsParallel()
    .Select(l => MyComplexMethodReturnsAString(l));
File.AppendAllLines(outputFile, processedLines);
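
Note that AsParallel() does not guarantee the results come out in input order; since you said order doesn't matter, that's fine here (otherwise you would add .AsOrdered() to the query).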

If you need to flush the data as it arrives, open the stream once and enable auto-flushing (or flush manually):

var processedLines = File.ReadLines(inputFile).AsParallel()
    .Select(l => MyComplexMethodReturnsAString(l));
using (var output = File.AppendText(outputFile))
{
    output.AutoFlush = true;
    foreach (var processedLine in processedLines)
        output.WriteLine(processedLine);
}
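
For the manual-flush variant, a minimal sketch could batch writes and flush every so often instead of on each line; the batch size of 1000 below is just an illustrative choice, not something required by the API:

var processedLines = File.ReadLines(inputFile).AsParallel()
    .Select(l => MyComplexMethodReturnsAString(l));
using (var output = File.AppendText(outputFile))
{
    int count = 0;
    foreach (var processedLine in processedLines)
    {
        output.WriteLine(processedLine);
        // flush periodically rather than after every write
        if (++count % 1000 == 0)
            output.Flush();
    }
    // disposing the StreamWriter at the end of the using block flushes whatever remains
}

Either way, the key point is the same: the output file is opened exactly once, and only one thread writes to it.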