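Given a mean and standard deviation defining a normal distribution, how would you calculate the following probabilities in pure Python (i.e. no NumPy/SciPy or other packages outside the standard library)?

1. The probability that a random variable is less than x.
2. The probability that a random variable is greater than x.
3. The probability that a random variable lies between y and x (y < x).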
I've found some libraries, like Pgnumerics, that provide functions for calculating these, but the underlying math is unclear to me.
Edit: To show this isn't homework, here is my working code for Python <= 2.6, though I'm not sure it handles the boundary conditions correctly.
```python
from math import exp, pi, sqrt


def erfcc(x):
    """
    Complementary error function (polynomial approximation).
    """
    z = abs(x)
    t = 1. / (1. + 0.5*z)
    r = t * exp(-z*z-1.26551223+t*(1.00002368+t*(.37409196+
        t*(.09678418+t*(-.18628806+t*(.27886807+
        t*(-1.13520398+t*(1.48851587+t*(-.82215223+
        t*.17087277)))))))))
    if x >= 0.:
        return r
    else:
        return 2. - r


def normcdf(x, mu, sigma):
    # P(X <= x) = 0.5 * erfc(-(x - mu) / (sigma * sqrt(2)))
    t = x - mu
    y = 0.5*erfcc(-t/(sigma*sqrt(2.0)))
    if y > 1.0:  # guard against approximation overshoot
        y = 1.0
    return y


def normpdf(x, mu, sigma):
    u = (x-mu)/abs(sigma)
    y = (1/(sqrt(2*pi)*abs(sigma)))*exp(-u*u/2)
    return y


def normdist(x, mu, sigma, f):
    # f=True -> cumulative probability, f=False -> probability density
    if f:
        y = normcdf(x, mu, sigma)
    else:
        y = normpdf(x, mu, sigma)
    return y


def normrange(x1, x2, mu, sigma):
    """
    Calculates probability of random variable falling between two points.
    """
    # Always use the CDF: a difference of densities is not a probability.
    p1 = normcdf(x1, mu, sigma)
    p2 = normcdf(x2, mu, sigma)
    return abs(p1-p2)
```
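One way to probe the boundary behaviour is to compare the polynomial approximation against `math.erfc`, which the standard library provides from Python 2.7 onward. The sketch below re-declares the question's `erfcc` so it is self-contained and measures the largest absolute error over a grid; the helper `max_abs_error` and the grid bounds are illustrative assumptions, not part of the original code.

```python
import math


def erfcc(x):
    """Same polynomial approximation of erfc as in the question."""
    z = abs(x)
    t = 1.0 / (1.0 + 0.5 * z)
    r = t * math.exp(-z * z - 1.26551223 + t * (1.00002368 + t * (0.37409196 +
        t * (0.09678418 + t * (-0.18628806 + t * (0.27886807 +
        t * (-1.13520398 + t * (1.48851587 + t * (-0.82215223 +
        t * 0.17087277)))))))))
    return r if x >= 0.0 else 2.0 - r


def max_abs_error(lo=-10.0, hi=10.0, steps=2001):
    """Hypothetical helper: largest |erfcc(x) - math.erfc(x)| on an even grid."""
    worst = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        worst = max(worst, abs(erfcc(x) - math.erfc(x)))
    return worst


print(max_abs_error())
```

Because `r` is always positive, `0.5 * erfcc(...)` stays inside (0, 1] for this approximation, so the `y > 1.0` clamp in `normcdf` acts only as a safety net.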
All these are very similar: if you can compute #1 using a function `cdf(x)`, then the solution to #2 is simply `1 - cdf(x)`, and for #3 it's `cdf(x) - cdf(y)`.
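The reduction of the three cases (below x, above x, between y and x) to a single CDF can be sketched as follows. This assumes Python 2.7+ or 3.2+ for `math.erf`; the function names `cdf`, `prob_less`, `prob_greater` and `prob_between` are hypothetical, chosen only for illustration.

```python
import math


def cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ Normal(mu, sigma**2), using math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_less(x, mu=0.0, sigma=1.0):
    """Case 1: P(X < x), which equals P(X <= x) for a continuous variable."""
    return cdf(x, mu, sigma)


def prob_greater(x, mu=0.0, sigma=1.0):
    """Case 2: P(X > x) = 1 - cdf(x)."""
    return 1.0 - cdf(x, mu, sigma)


def prob_between(y, x, mu=0.0, sigma=1.0):
    """Case 3: P(y < X < x) = cdf(x) - cdf(y); assumes y <= x."""
    return cdf(x, mu, sigma) - cdf(y, mu, sigma)


print(prob_less(0.0))
print(prob_between(-1.96, 1.96))
```

For a standard normal, `prob_between(-1.96, 1.96)` should come out close to the familiar 0.95.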
Since version 2.7, Python has included the Gauss error function as `math.erf`, so you can calculate the CDF of the normal distribution using the equation from the article you linked to:
```python
import math

x, mean, standard_dev = 1.0, 0.0, 1.0  # example values
print(0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * standard_dev**2))))
```
where `mean` is the mean and `standard_dev` is the standard deviation.
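On Python 3.8 or later, the standard library also ships `statistics.NormalDist`, which wraps the same CDF and the density. A minimal sketch, assuming Python 3.8+; the parameters 100 and 15 are arbitrary example values.

```python
from statistics import NormalDist  # available from Python 3.8

dist = NormalDist(mu=100.0, sigma=15.0)  # example parameters

p_below = dist.cdf(115.0)                     # P(X < 115)
p_above = 1.0 - dist.cdf(115.0)               # P(X > 115)
p_between = dist.cdf(115.0) - dist.cdf(85.0)  # P(85 < X < 115)
density = dist.pdf(100.0)                     # probability density at the mean

print(p_below, p_above, p_between, density)
```

Here 85 and 115 are one standard deviation either side of the mean, so `p_between` should be about 0.6827.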
Some notes, since what you asked seemed relatively straightforward given the information in the article:

- The CDF of a random variable X gives the probability that X <= x; call it `cdf(x)`.
- Then `1 - cdf(x)` is the probability that X >= x. For a continuous random variable P(X = x) = 0, so >= and > give the same probability, and this is also the probability that X > x.
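Computing `1 - cdf(x)` literally loses precision far in the upper tail, because `cdf(x)` rounds to 1.0 in double precision and the subtraction cancels every significant digit. A sketch, assuming Python 2.7+ or 3.2+ where `math.erfc` exists; `upper_tail` is a hypothetical name, and the same identity P(X > x) = 0.5 * erfc((x - mu) / (sigma * sqrt(2))) is used directly.

```python
import math


def upper_tail(x, mu=0.0, sigma=1.0):
    """P(X > x) (= P(X >= x)) for X ~ Normal(mu, sigma**2), via math.erfc.

    Avoids forming 1 - cdf(x), which underflows to 0.0 once cdf(x)
    rounds to exactly 1.0.
    """
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


# Ten standard deviations above the mean: the naive form collapses to zero.
naive = 1.0 - 0.5 * (1.0 + math.erf(10.0 / math.sqrt(2.0)))
print(naive)
print(upper_tail(10.0))
```

For moderate x the two forms agree to within rounding; the difference only matters in the far tail.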