Using the gaussian probability density function in C++

shn picture shn · Jun 1, 2012 · Viewed 25.3k times · Source

First, is this the correct C++ representation of the pdf gaussian function ?

float pdf_gaussian = ( 1 / ( s * sqrt(2*M_PI) ) ) * exp( -0.5 * pow( (x-m)/s, 2.0 ) );

Second, does it make sense of we do something like this ?

if(pdf_gaussian < uniform_random())
   do something
else
   do other thing

EDIT: An example of what exactly are you trying to achieve:

Say I have a data called Y1. Then a new data called Xi arrive. I want to see if I should associated Xi to Y1 or if I should keep Xi as a new data data that will be called Y2. This is based on the distance between the new data Xi and the existing data Y1. If Xi is "far" from Y1 then Xi will not be associated to Y1, otherwise if it is "not far", it will be associated to Y1. Now I want to model this "far" or "not far" using a gaussian probability based on the mean and stdeviation of distances between Y and the data that where already associated to Y in the past.

Answer

Alexandre C. picture Alexandre C. · Jun 1, 2012

Technically,

float pdf_gaussian = ( 1 / ( s * sqrt(2*M_PI) ) ) * exp( -0.5 * pow( (x-m)/s, 2.0 ) );

is not incorrect, but can be improved.

First, 1 / sqrt(2 Pi) can be precomputed, and using pow with integers is not a good idea: it may use exp(2 * log x) or a routine specialized for floating point exponents instead of simply x * x.

Example better code:

float normal_pdf(float x, float m, float s)
{
    static const float inv_sqrt_2pi = 0.3989422804014327;
    float a = (x - m) / s;

    return inv_sqrt_2pi / s * std::exp(-0.5f * a * a);
}

You may want to make this a template instead of using float:

template <typename T>
T normal_pdf(T x, T m, T s)
{
    static const T inv_sqrt_2pi = 0.3989422804014327;
    T a = (x - m) / s;

    return inv_sqrt_2pi / s * std::exp(-T(0.5) * a * a);
}

this allows you to use normal_pdf on double arguments also (it is not that much more generic though). There are caveats with the last code, namely that you have to beware not using it with integers (there are workarounds, but this makes the routine more verbose).