Plotting a histogram from pre-counted data in Matplotlib

Josh Rosen picture Josh Rosen · Oct 6, 2013 · Viewed 23.6k times · Source

I'd like to use Matplotlib to plot a histogram over data that's been pre-counted. For example, say I have the raw data

data = [1, 2, 2, 3, 4, 5, 5, 5, 5, 6, 10]

Given this data, I can use

pylab.hist(data, bins=[...])

to plot a histogram.

In my case, the data has been pre-counted and is represented as a dictionary:

counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}

Ideally, I'd like to pass this pre-counted data to a histogram function that lets me control the bin widths, plot range, etc, as if I had passed it the raw data. As a workaround, I'm expanding my counts into the raw data:

data = list(chain.from_iterable(repeat(value, count)
            for (value, count) in counted_data.iteritems()))

This is inefficient when counted_data contains counts for millions of data points.

Is there an easier way to use Matplotlib to produce a histogram from my pre-counted data?

Alternatively, if it's easiest to just bar-plot data that's been pre-binned, is there a convenience method to "roll-up" my per-item counts into binned counts?

Answer

tacaswell picture tacaswell · Oct 6, 2013

You can use the weights keyword argument to np.histgram (which plt.hist calls underneath)

val, weight = zip(*[(k, v) for k,v in counted_data.items()])
plt.hist(val, weights=weight)

Assuming you only have integers as the keys, you can also use bar directly:

min_bin = np.min(counted_data.keys())
max_bin = np.max(counted_data.keys())

bins = np.arange(min_bin, max_bin + 1)
vals = np.zeros(max_bin - min_bin + 1)

for k,v in counted_data.items():
    vals[k - min_bin] = v

plt.bar(bins, vals, ...)

where ... is what ever arguments you want to pass to bar (doc)

If you want to re-bin your data see Histogram with separate list denoting frequency