Average function without overflow exception

Ron Klein picture Ron Klein · May 24, 2010 · Viewed 8.9k times · Source

.NET Framework 3.5.
I'm trying to calculate the average of some pretty large numbers.
For instance:

using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var items = new long[]
                        {
                            long.MaxValue - 100, 
                            long.MaxValue - 200, 
                            long.MaxValue - 300
                        };
        try
        {
            var avg = items.Average();
            Console.WriteLine(avg);
        }
        catch (OverflowException ex)
        {
            Console.WriteLine("can't calculate that!");
        }
        Console.ReadLine();
    }
}

Obviously, the mathematical result is 9223372036854775607 (long.MaxValue - 200), but I get an exception there. This is because the implementation (on my machine) to the Average extension method, as inspected by .NET Reflector is:

public static double Average(this IEnumerable<long> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    long num = 0L;
    long num2 = 0L;
    foreach (long num3 in source)
    {
        num += num3;
        num2 += 1L;
    }
    if (num2 <= 0L)
    {
        throw Error.NoElements();
    }
    return (((double) num) / ((double) num2));
}

I know I can use a BigInt library (yes, I know that it is included in .NET Framework 4.0, but I'm tied to 3.5).

But I still wonder if there's a pretty straight forward implementation of calculating the average of integers without an external library. Do you happen to know about such implementation?

Thanks!!


UPDATE:

The previous example, of three large integers, was just an example to illustrate the overflow issue. The question is about calculating an average of any set of numbers which might sum to a large number that exceeds the type's max value. Sorry about this confusion. I also changed the question's title to avoid additional confusion.

Thanks all!!

Answer

Craig Gidney picture Craig Gidney · May 24, 2010

This answer used to suggest storing the quotient and remainder (mod count) separately. That solution is less space-efficient and more code-complex.

In order to accurately compute the average, you must keep track of the total. There is no way around this, unless you're willing to sacrifice accuracy. You can try to store the total in fancy ways, but ultimately you must be tracking it if the algorithm is correct.

For single-pass algorithms, this is easy to prove. Suppose you can't reconstruct the total of all preceding items, given the algorithm's entire state after processing those items. But wait, we can simulate the algorithm then receiving a series of 0 items until we finish off the sequence. Then we can multiply the result by the count and get the total. Contradiction. Therefore a single-pass algorithm must be tracking the total in some sense.

Therefore the simplest correct algorithm will just sum up the items and divide by the count. All you have to do is pick an integer type with enough space to store the total. Using a BigInteger guarantees no issues, so I suggest using that.

var total = BigInteger.Zero
var count = 0
for i in values
    count += 1
    total += i
return total / (double)count //warning: possible loss of accuracy, maybe return a Rational instead?