Efficient way to compute geometric mean of many numbers

Question 1

Efficient way to compute geometric mean of many numbers

c++ c algorithm numerical underflow

Walter · Nov 14, 2013 · Viewed 7.8k times · Source

Answer

Answer

The "split exponent and mantissa" solution:

double geometric_mean(std::vector<double> const & data)
{
    double m = 1.0;
    long long ex = 0;
    double invN = 1.0 / data.size();

    for (double x : data)
    {
        int i;
        double f1 = std::frexp(x,&i);
        m*=f1;
        ex+=i;
    }

    return std::pow( std::numeric_limits<double>::radix,ex * invN) * std::pow(m,invN);
}

~~If you are concerned that ex might overflow you can define it as a double instead of a long long, and multiply by invN at every step, but you might lose a lot of precision with this approach.~~

EDIT For large inputs, we can split the computation in several buckets:

double geometric_mean(std::vector<double> const & data)
{
    long long ex = 0;
    auto do_bucket = [&data,&ex](int first,int last) -> double
    {
        double ans = 1.0;
        for ( ;first != last;++first)
        {
            int i;
            ans *= std::frexp(data[first],&i);
            ex+=i;
        }
        return ans;
    };

    const int bucket_size = -std::log2( std::numeric_limits<double>::min() );
    std::size_t buckets = data.size() / bucket_size;

    double invN = 1.0 / data.size();
    double m = 1.0;

    for (std::size_t i = 0;i < buckets;++i)
        m *= std::pow( do_bucket(i * bucket_size,(i+1) * bucket_size),invN );

    m*= std::pow( do_bucket( buckets * bucket_size, data.size() ),invN );

    return std::pow( std::numeric_limits<double>::radix,ex * invN ) * m;
}

Question 2

I need to compute the geometric mean of a large set of numbers, whose values are not a priori limited. The naive way would be

double geometric_mean(std::vector<double> const&data) // failure
{
  auto product = 1.0;
  for(auto x:data) product *= x;
  return std::pow(product,1.0/data.size());
}

However, this may well fail because of underflow or overflow in the accumulated product (note: long double doesn't really avoid this problem). So, the next option is to sum-up the logarithms:

double geometric_mean(std::vector<double> const&data)
{
  auto sumlog = 0.0;
  for(auto x:data) sum_log += std::log(x);
  return std::exp(sum_log/data.size());
}

This works, but calls std::log() for every element, which is potentially slow. Can I avoid that? For example by keeping track of (the equivalent of) the exponent and the mantissa of the accumulated product separately?

Efficient way to compute geometric mean of many numbers

Answer

Related questions