First, is this the correct C++ representation of the Gaussian PDF?
float pdf_gaussian = ( 1 / ( s * sqrt(2*M_PI) ) ) * exp( -0.5 * pow( (x-m)/s, 2.0 ) );
Second, does it make sense if we do something like this?
if (pdf_gaussian < uniform_random())
    do something
else
    do other thing
EDIT: An example of what exactly I am trying to achieve:
Say I have a data item called Y1. Then a new data item called Xi arrives. I want to decide whether to associate Xi with Y1 or to keep Xi as a new data item, which will be called Y2. The decision is based on the distance between the new data Xi and the existing data Y1. If Xi is "far" from Y1, then Xi will not be associated with Y1; otherwise, if it is "not far", it will be associated with Y1. I want to model this "far" or "not far" using a Gaussian probability based on the mean and standard deviation of the distances between Y and the data that were already associated with Y in the past.
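To make the setup above concrete, here is a minimal sketch of how the mean and standard deviation of the past distances might be computed. The names DistanceStats and distance_stats are hypothetical, and the sketch assumes the distances are already stored in a std::vector<double> with at least two entries and that the sample (n - 1) standard deviation is the one wanted.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the two statistics mentioned above.
struct DistanceStats {
    double mean;
    double stddev;
};

// Assumption: `distances` holds the distances between Y and the data
// previously associated with it, and has at least two elements.
DistanceStats distance_stats(const std::vector<double>& distances)
{
    const std::size_t n = distances.size();

    double sum = 0.0;
    for (double d : distances) sum += d;
    const double mean = sum / n;

    double sq_sum = 0.0;
    for (double d : distances) sq_sum += (d - mean) * (d - mean);

    // Sample standard deviation (divides by n - 1); this is an assumption,
    // the population form would divide by n instead.
    DistanceStats stats = { mean, std::sqrt(sq_sum / (n - 1)) };
    return stats;
}
```

The resulting mean and stddev would then play the roles of m and s in the formula above, with x being the distance between Xi and Y1.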
Technically,
float pdf_gaussian = ( 1 / ( s * sqrt(2*M_PI) ) ) * exp( -0.5 * pow( (x-m)/s, 2.0 ) );
is not incorrect, but can be improved.
First, 1 / sqrt(2 Pi) can be precomputed, and using pow with integer exponents is not a good idea: it may use exp(2 * log x) or a routine specialized for floating point exponents instead of simply x * x.
Example better code:
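#include <cmath>

// Density of the normal distribution with mean m and standard deviation s at x.
float normal_pdf(float x, float m, float s)
{
    static const float inv_sqrt_2pi = 0.3989422804014327f; // 1 / sqrt(2 * pi)
    float a = (x - m) / s;
    return inv_sqrt_2pi / s * std::exp(-0.5f * a * a);
}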
You may want to make this a template instead of using float:
#include <cmath>

template <typename T>
T normal_pdf(T x, T m, T s)
{
    static const T inv_sqrt_2pi = 0.3989422804014327; // 1 / sqrt(2 * pi)
    T a = (x - m) / s;
    return inv_sqrt_2pi / s * std::exp(-T(0.5) * a * a);
}
This allows you to use normal_pdf on double arguments as well (it is not that much more generic, though). There is a caveat with this last version: be careful not to call it with integer arguments, because with T = int the constant and the division are both truncated (there are workarounds, but they make the routine more verbose).
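As an illustration of what such a workaround could look like, here is one possible sketch. The name normal_pdf_promoted is hypothetical; it assumes C++11 and uses std::common_type to pick a floating point type that is at least double, so integer (and float) arguments are promoted before any arithmetic happens.

```cpp
#include <cmath>
#include <type_traits>

// Hypothetical variant of normal_pdf that tolerates integer or mixed arguments.
template <typename X, typename M, typename S>
typename std::common_type<X, M, S, double>::type
normal_pdf_promoted(X x, M m, S s)
{
    // Adding double to the common_type list promotes int and float to double,
    // while long double arguments keep long double precision.
    typedef typename std::common_type<X, M, S, double>::type T;

    static const T inv_sqrt_2pi = T(0.3989422804014327L); // 1 / sqrt(2 * pi)
    const T a = (T(x) - T(m)) / T(s);                     // no integer division
    return inv_sqrt_2pi / T(s) * std::exp(-T(0.5) * a * a);
}
```

With this version, a call such as normal_pdf_promoted(1, 0, 2) is computed in double rather than int. The price is the extra template machinery, which is the verbosity mentioned above.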