How to normalize a histogram in python?

user40 · Mar 7, 2014 · Viewed 83.2k times

I'm trying to plot a normed histogram, but instead of getting 1 as the maximum value on the y axis, I'm getting different numbers.

For the array k=(1,4,3,1):

import numpy as np

def plotGraph():
    import matplotlib.pyplot as plt

    k = (1, 4, 3, 1)
    plt.hist(k, normed=1)  # normed is the older name; newer Matplotlib versions use density
    plt.xticks(np.arange(10))  # 10 ticks on x axis
    plt.show()

plotGraph()

I get this histogram, which doesn't look normalized.


For a different array, k=(3,3,3,3):

import numpy as np

def plotGraph():
    import matplotlib.pyplot as plt

    k = (3, 3, 3, 3)
    plt.hist(k, normed=1)  # normed is the older name; newer Matplotlib versions use density
    plt.xticks(np.arange(10))  # 10 ticks on x axis
    plt.show()

plotGraph()

I get this histogram, whose maximum y-value is 10.


For different k I get a different maximum y value, even though I set normed=1 or normed=True.

Why does the normalization (if it works) change based on the data, and how can I make the maximum y value equal to 1?

UPDATE:

I am trying to implement Carsten König's answer from "plotting histograms whose bar heights sum to 1 in matplotlib" and am getting a very weird result:

import numpy as np

def plotGraph():
    import matplotlib.pyplot as plt

    k = (1, 4, 3, 1)
    weights = np.ones_like(k)/len(k)
    plt.hist(k, weights=weights)
    plt.xticks(np.arange(10))  # 10 ticks on x axis
    plt.show()

plotGraph()

Result:


What am I doing wrong?

Thanks

Answer

CT Zhu · Mar 7, 2014

In a normalized histogram it is not the bar heights that sum to one, but the area under the bars (height times bin width):

In [44]:

import numpy as np
import matplotlib.pyplot as plt
k = (3, 3, 3, 3)
x, bins, p = plt.hist(k, density=True)  # used to be normed=True in older versions
plt.xticks(np.arange(10))  # 10 ticks on x axis
plt.show()

In [45]:

print(bins)
[ 2.5  2.6  2.7  2.8  2.9  3.   3.1  3.2  3.3  3.4  3.5]

In this example the bin width is 0.1 and all four values fall into a single bin of height 10, so the area under the histogram is 0.1*10 = 1.
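The area claim can be checked numerically without plotting. The following is a minimal sketch that assumes np.histogram with bins=10 produces the same bins and heights as the plt.hist call above; the names heights, edges, widths and area are only illustrative.

```python
import numpy as np

k = (3, 3, 3, 3)

# density=True rescales the counts so that sum(height * bin_width) is 1
heights, edges = np.histogram(k, bins=10, density=True)

# Width of each bin, computed from consecutive bin edges
widths = np.diff(edges)

# Total area under the histogram bars
area = float(np.sum(heights * widths))

print(heights.max())  # tallest bar, about 10 for this data
print(area)           # total area, about 1
```

The same check with k = (1, 4, 3, 1) should also give an area of about 1, although the tallest bar has a different height because the bin width is different.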

To make the bar heights sum to 1 instead, add the following before plt.show():

for item in p:
    item.set_height(item.get_height()/sum(x))
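An alternative to rescaling the bars afterwards is the weights approach from the question's update. There, np.ones_like(k)/len(k) builds the weights from a tuple of integers; under Python 2 (which the print statement above suggests was in use) that division is likely carried out as integer division, leaving every weight at 0. The sketch below assumes that explanation and simply forces float weights; it uses np.histogram to check the result, and the names weights, heights and total are only illustrative.

```python
import numpy as np

k = (1, 4, 3, 1)

# Float weights: each observation contributes 1/len(k) to its bin
weights = np.ones(len(k), dtype=float) / len(k)

# With weights, each bin height is the sum of the weights that fall into it
heights, edges = np.histogram(k, bins=10, weights=weights)

# Sum of all bar heights
total = float(heights.sum())

print(heights)
print(total)  # about 1
```

Passing the same array to the plot, as in plt.hist(k, weights=weights), should then draw bars whose heights sum to 1.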
