Optimizing Aggregate for String Concatenation

Daniel Earwicker picture Daniel Earwicker · Dec 10, 2008 · Viewed 21.7k times · Source

Update - for those of a facetious frame of mind, you can assume that Aggregate still produces the normal result whatever function is passed to it, including in the case being optimized.

I wrote this program to build a long string of integers from 0 to 19999 separate by commas.

using System;
using System.Linq;
using System.Diagnostics;

namespace ConsoleApplication5
{
    class Program
    {
        static void Main(string[] args)
        {
            const int size = 20000;

            Stopwatch stopwatch = new Stopwatch();

            stopwatch.Start();
            Enumerable.Range(0, size).Select(n => n.ToString()).Aggregate((a, b) => a + ", " + b);
            stopwatch.Stop();

            Console.WriteLine(stopwatch.ElapsedMilliseconds + "ms");
        }
    }
}

When I run it, it says:

5116ms

Over five seconds, terrible. Of course it's because the whole string is being copied each time around the loop.

But what if make one very small change indicated by the comment?

using System;
using System.Linq;
using System.Diagnostics;

namespace ConsoleApplication5
{
    using MakeAggregateGoFaster;  // <---- inserted this

    class Program
    {
        static void Main(string[] args)
        {
            const int size = 20000;

            Stopwatch stopwatch = new Stopwatch();

            stopwatch.Start();
            Enumerable.Range(0, size).Select(n => n.ToString()).Aggregate((a, b) => a + ", " + b);
            stopwatch.Stop();

            Console.WriteLine(stopwatch.ElapsedMilliseconds + "ms");
        }
    }
}

Now when I run it, it says:

42ms

Over 100x faster.

Question

What's in the MakeAggregateGoFaster namespace?

Update 2: Wrote up my answer here.

Answer

davesem picture davesem · Jan 14, 2009

Why not use one of the other forms of Aggregate?

Enumerable.Range(0, size ).Aggregate(new StringBuilder(),
        (a, b) => a.Append(", " + b.ToString()),
        (a) => a.Remove(0,2).ToString());

You can specify any type for your seed, perform whatever formatting or custom calls are needed in the first lambda function and then customize the output type in the second lambda function. The built in features already provide the flexibility you need. My runs went from 1444ms to 6ms.