Efficient way to write a lot of lines to a text file

lintmouse · Jun 26, 2013 · Viewed 19.4k times

I started off doing something as follows:

using (TextWriter textWriter = new StreamWriter(filePath, append))
{
    foreach (MyClassA myClassA in myClassAs)
    {
        textWriter.WriteLine(myIO.GetCharArray(myClassA));

        if (myClassA.MyClassBs != null)
            myClassA.MyClassBs.ToList()
                .ForEach(myClassB =>
                    textWriter.WriteLine(myIO.GetCharArray((myClassB)));

        if (myClassA.MyClassCs != null)
            myClassA.MyClassCs.ToList()
                .ForEach(myClassC =>
                    textWriter.WriteLine(myIO.GetCharArray(myClassC)));
    }
}

This seemed pretty slow (~35 seconds for 35,000 lines).

Then I tried to follow the example here to create a buffer, with the following code, but it didn't gain me anything. I was still seeing times around 35 seconds. Is there an error in how I implemented the buffer?

using (TextWriter textWriter = new StreamWriter(filePath, append))
{
    char[] newLineChars = Environment.NewLine.ToCharArray();
    int recordLineSize = RECORD_SIZE + newLineChars.Length;
    // Buffer 500 lines at a time.
    int bufferSize = 500 * recordLineSize;
    char[] buffer = new char[bufferSize];
    int bufferIndex = 0;

    foreach (MyClassA myClassA in myClassAs)
    {
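        // Concat (not Union) keeps duplicate records; ?? guards against null child collections.
        IEnumerable<IMyClass> myClasses =
            new List<IMyClass> { myClassA }
                .Concat(myClassA.MyClassBs ?? Enumerable.Empty<IMyClass>())
                .Concat(myClassA.MyClassCs ?? Enumerable.Empty<IMyClass>());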

        foreach (IMyClass myClass in myClasses)
        {
            Array.Copy(myIO.GetCharArray(myClass).Concat(newLineChars).ToArray(),
                0, buffer, bufferIndex, recordLineSize);

            bufferIndex += recordLineSize;

            if (bufferIndex >= bufferSize)
            {
                textWriter.Write(buffer);

                bufferIndex = 0;
            }
        }
    }

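    // Write only the filled part; the rest of the buffer still holds the previous chunk.
    if (bufferIndex > 0)
        textWriter.Write(buffer, 0, bufferIndex);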
}

Is there a better way to accomplish this?

Answer

Jim Mischel · Jun 27, 2013

I strongly suspect that the majority of your time is not spent in the I/O. There's no way that it should take 35 seconds to write 35,000 lines, unless those lines are really long.

Most likely, the majority of time is spent in the GetCharArray method, whatever that does.
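Rather than guessing, you can time the formatting step and the write step separately. The following is a minimal sketch under stated assumptions: `FormatRecord` is a hypothetical stand-in for the question's `myIO.GetCharArray`, and `WriteProfiler`/`Measure` are names made up for this example.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class WriteProfiler
{
    // Hypothetical stand-in for myIO.GetCharArray; swap in the real call to profile it.
    public static char[] FormatRecord(int i) => i.ToString("D10").ToCharArray();

    // Accumulates formatting time and writing time in two separate stopwatches.
    public static (TimeSpan format, TimeSpan write) Measure(string path, int lineCount)
    {
        var formatTimer = new Stopwatch();
        var writeTimer = new Stopwatch();

        using (var writer = new StreamWriter(path, append: false))
        {
            for (int i = 0; i < lineCount; i++)
            {
                formatTimer.Start();           // Start resumes; elapsed time accumulates
                char[] line = FormatRecord(i);
                formatTimer.Stop();

                writeTimer.Start();
                writer.WriteLine(line);
                writeTimer.Stop();
            }
        } // Dispose flushes the last buffered chunk; that flush is not timed here.

        return (formatTimer.Elapsed, writeTimer.Elapsed);
    }
}
```

If the `format` total dwarfs the `write` total on your real data, the bottleneck is the formatting, not the I/O. The stopwatch calls add a little per-line overhead, so treat the numbers as relative rather than exact.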

A few suggestions:

If you really think I/O is the problem, increase the stream's buffer size. Call the StreamWriter constructor that lets you specify a buffer size. For example,

using (TextWriter textWriter = new StreamWriter(filePath, append, Encoding.UTF8, 65536))

That'll perform better than the default 4K buffer size. Going higher than 64K for the buffer size is not generally useful, and can actually decrease performance.
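A minimal sketch of the larger-buffer suggestion, assuming the lines are already formatted as strings. It uses the `StreamWriter(string path, bool append, Encoding encoding, int bufferSize)` overload; `BufferedLineWriter` and `WriteLines` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class BufferedLineWriter
{
    // 64K buffer, the size suggested above.
    private const int BufferSize = 65536;

    // Writes each line through a StreamWriter created with an explicit buffer size.
    public static void WriteLines(string path, bool append, IEnumerable<string> lines)
    {
        using (var writer = new StreamWriter(path, append, Encoding.UTF8, BufferSize))
        {
            foreach (string line in lines)
            {
                writer.WriteLine(line);
            }
        } // Disposing the writer flushes whatever is still buffered.
    }
}
```

Note that `Encoding.UTF8` writes a byte-order mark at the start of a new file; pass `new UTF8Encoding(false)` instead if the consumer of the file does not expect one.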

Don't pre-buffer lines or append to a StringBuilder. That might give you small performance increases, but at a huge cost in complexity. The small performance boost isn't worth the maintenance nightmare.

Take advantage of foreach. You have this code:

if (myClassA.MyClassBs != null)
    myClassA.MyClassBs.ToList()
        .ForEach(myClassB =>
            textWriter.WriteLine(myIO.GetCharArray((myClassB)));

That has to create a concrete list from whatever collection MyClassBs is, and then enumerate that list. Why not just enumerate the collection directly:

if (myClassA.MyClassBs != null)
{
    foreach (var myClassB in myClassA.MyClassBs)
    {
        textWriter.WriteLine(myIO.GetCharArray(myClassB));
    }
}

That will save you the memory required by the ToList call, and the time it takes to enumerate the collection when creating the list.
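Since the same null-check-and-foreach pattern appears for both MyClassBs and MyClassCs, it can be factored into one generic helper. This is a sketch; `RecordWriter` and `WriteAll` are hypothetical names, and the `format` delegate stands in for `myIO.GetCharArray`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class RecordWriter
{
    // Writes one line per item by enumerating the collection directly,
    // so no intermediate List<T> is allocated (unlike ToList().ForEach).
    // A null collection is treated as empty, matching the question's null checks.
    public static void WriteAll<T>(TextWriter writer, IEnumerable<T> items, Func<T, char[]> format)
    {
        if (items == null)
        {
            return;
        }

        foreach (T item in items)
        {
            writer.WriteLine(format(item));
        }
    }
}
```

Inside the question's outer loop, each null-checked block would then become a single call such as `RecordWriter.WriteAll(textWriter, myClassA.MyClassBs, b => myIO.GetCharArray(b));`, assuming `GetCharArray` accepts the element type.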

All that said, it's almost certain that your GetCharArray method is the thing that's taking all the time. If you really want to speed up your program, look there. Trying to optimize writing to the StreamWriter is a waste of time. You're not going to get significant performance increases there.