How to plot a probability mass function in Python

kmosley · Oct 21, 2013

How can I create a histogram that shows the probability distribution of an array of numbers x ranging from 0 to 1? I expect each bar's height to be <= 1, and the heights of all bars to sum to 1.

For example, if x=[.2, .2, .8] then I would expect a graph showing 2 bars, one at .2 with height .67 and one at .8 with height .33.

I've tried:

matplotlib.pyplot.hist(x, bins=50, density=True)

which gives me a histogram with bars that go above 1. I'm not saying that's wrong, since that is what the density parameter (named normed in older Matplotlib releases) is documented to do: it scales the bars so that their total area is 1. But that doesn't show the probabilities.

I've also tried:

counts, bins = numpy.histogram(x, bins=50, density=True)
bins = bins[:-1] + (bins[1] - bins[0])/2
matplotlib.pyplot.bar(bins, counts, 1.0/50)

which also gives me bars whose y values sum to greater than 1.
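
A minimal sketch of why these heights sum to more than 1, assuming only NumPy: with density=True, np.histogram returns a probability density, so it is each height multiplied by its bin width that sums to 1, not the heights themselves. The variable names here are illustrative and the data is the three-value example from above.

```python
import numpy as np

x = [0.2, 0.2, 0.8]  # example data from the question
density, edges = np.histogram(x, bins=50, density=True)
widths = np.diff(edges)  # width of each bin (all equal here)

# The heights are densities, so their plain sum is not constrained to 1.
height_sum = density.sum()

# Height times bin width is the probability of landing in that bin,
# and those probabilities do sum to 1.
probs = density * widths
area_sum = probs.sum()

print(height_sum, area_sum)
```

Because the bins here are much narrower than 1, the densities have to be much larger than 1 for the areas to add up to 1.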

Answer

kmosley · Oct 23, 2013

I think my original terminology was off. I have an array of continuous values in [0, 1) that I want to discretize and use to plot a probability mass function. I thought this might be common enough to warrant a single method to do it.

Here's the code:

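import random
import numpy as np
import matplotlib.pyplot as plt

x = [random.random() for _ in range(1000)]  # sample values in [0, 1)
num_bins = 50
counts, edges = np.histogram(x, bins=num_bins)  # raw counts per bin
width = edges[1] - edges[0]  # all bins have the same width
centers = edges[:-1] + width / 2  # bin midpoints
probs = counts / counts.sum()  # fraction of samples in each bin
print(probs.sum())  # 1.0, up to floating-point rounding
plt.bar(centers, probs, width)
plt.show()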
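
As a possible alternative to dividing the counts afterwards, the same normalization can be sketched with the weights argument of np.histogram: giving every sample a weight of 1/N makes each bin hold the fraction of samples that fall in it. This is a sketch, not the method used above; the helper name pmf_histogram and the default range of (0.0, 1.0) are assumptions made for illustration.

```python
import numpy as np

def pmf_histogram(x, num_bins=50, value_range=(0.0, 1.0)):
    """Hypothetical helper: return (bin_centers, probs, bin_width).

    probs sums to 1 as long as every value of x lies inside value_range;
    values outside the range are ignored by np.histogram.
    """
    x = np.asarray(x, dtype=float)
    # Each sample contributes 1/N, so a bin's total is its share of the samples.
    weights = np.full(x.shape, 1.0 / x.size)
    probs, edges = np.histogram(x, bins=num_bins, range=value_range, weights=weights)
    width = edges[1] - edges[0]
    centers = edges[:-1] + width / 2
    return centers, probs, width

# The example from the question: two bars, with heights 2/3 and 1/3.
centers, probs, width = pmf_histogram([0.2, 0.2, 0.8])
print(probs[probs > 0])
```

The result can be drawn with plt.bar(centers, probs, width). plt.hist also accepts a weights argument, so passing the same 1/N weights to it should produce the same bar heights in a single call.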