Can anyone tell me why we always use the Gaussian distribution in machine learning?

laotao · Sep 27, 2012

For example, why do we always assume that the data, or the error in a signal, follows a Gaussian distribution?

Answer

Chris Taylor · Sep 27, 2012

The answer you'll get from mathematically minded people is "because of the central limit theorem". This expresses the idea that when you take a bunch of random numbers from almost any distribution* and add them together, you will get something approximately normally distributed. The more numbers you add together, the more normally distributed it gets.

I can demonstrate this in Matlab/Octave. If I generate 1000 random numbers uniformly between 1 and 10 and plot a histogram, I get something like this:

[Figure: histogram of the 1000 single random numbers -- roughly flat]

If, instead of generating a single random number, I generate 12 of them and add them together, then repeat this 1000 times and plot a histogram, I get something like this:

[Figure: histogram of the 1000 sums of 12 random numbers, with a normal curve overlaid -- a close match]

I've plotted a normal distribution with the same mean and variance over the top, so you can get an idea of how close the match is. You can see the code I used to generate these plots at this gist.
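
In case that gist is hard to track down, here is a minimal Octave/Matlab sketch of the same experiment. The details (continuous uniform draws, 30 bins) are my own choices and may differ from the original code:

    % 1000 random numbers between 1 and 10 (continuous uniform here)
    n = 1000;
    x = 1 + 9 * rand(n, 1);
    subplot(2, 1, 1);
    hist(x, 30);                        % roughly flat
    title('1000 single draws');

    % 12 such numbers added together, repeated 1000 times
    s = sum(1 + 9 * rand(12, n), 1)';
    subplot(2, 1, 2);
    [counts, centres] = hist(s, 30);
    bar(centres, counts);
    hold on;

    % normal pdf with the same mean and variance, scaled to the histogram
    mu = mean(s);  sigma = std(s);
    t = linspace(min(s), max(s), 200);
    pdf_t = exp(-(t - mu).^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi));
    binwidth = centres(2) - centres(1);
    plot(t, n * binwidth * pdf_t, 'r', 'linewidth', 2);
    title('1000 sums of 12 draws, with matching normal');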

In a typical machine learning problem you will have errors from many different sources (e.g. measurement error, data entry error, classification error, data corruption...) and it's not completely unreasonable to think that the combined effect of all of these errors is approximately normal (although of course, you should always check!)
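
One cheap way to do that check is a quantile-quantile plot of your errors against a fitted normal. A rough sketch, using centred sums of uniforms as stand-in "errors" (swap in your own residuals); erfinv is used so no statistics toolbox is required:

    % stand-in "errors": sums of 12 uniforms, centred (replace with your residuals)
    err = sum(rand(12, 1000), 1)' - 6;
    n = numel(err);
    mu = mean(err);  sigma = std(err);

    % theoretical normal quantiles via the inverse error function
    p = ((1:n)' - 0.5) / n;
    q = mu + sigma * sqrt(2) * erfinv(2 * p - 1);

    plot(q, sort(err), 'o');            % points near the red line => roughly normal
    hold on;
    plot(q, q, 'r-', 'linewidth', 2);
    xlabel('theoretical normal quantiles');
    ylabel('sorted errors');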

More pragmatic answers to the question include:

  • Because it makes the math simpler. The probability density function for the normal distribution is an exponential of a quadratic. Taking the logarithm (as you often do, because you want to maximize the log likelihood) gives you a quadratic. Differentiating this (to find the maximum) gives you a set of linear equations, which are easy to solve analytically (see the short sketch after this list).

  • It's simple - the entire distribution is described by two numbers, the mean and variance.

  • It's familiar to most people who will be reading your code/paper/report.
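
To make the first bullet concrete, here is a small sketch, assuming for simplicity data drawn with randn and a known variance. The log-likelihood is quadratic in the mean, so setting its derivative to zero gives a linear equation whose solution is just the sample mean, and a numerical optimiser agrees:

    % data from a normal with unknown mean (5) and known sigma (2)
    x = 5 + 2 * randn(1000, 1);

    % negative log-likelihood in mu, up to constants: a quadratic
    nll = @(mu) sum((x - mu).^2) / (2 * 2^2);

    % d/dmu = -sum(x - mu)/sigma^2 = 0 is linear in mu,
    % so the maximiser has a closed form: the sample mean
    mu_closed  = mean(x);
    mu_numeric = fminsearch(nll, 0);

    disp([mu_closed, mu_numeric])   % the two agree to optimiser tolerance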

It's generally a good starting point. If you find that your distributional assumptions are giving you poor performance, then maybe you can try a different distribution. But you should probably look at other ways to improve the model's performance first.

*Technical point: the distribution needs to have finite variance.