Plotting quantiles, median and spread using scipy and matplotlib

Sahil M · Aug 19, 2013

I am new to matplotlib, and I want to create a plot with the following information:

  1. A line joining the medians of around 200 variable-length vectors (the input).
  2. A line joining the corresponding quantiles of these vectors.
  3. Lines joining the corresponding spread (the largest and smallest values).

So basically, it's somewhat like a continuous box plot.

Thanks!

Answer

Viktor Kerkez · Aug 19, 2013

Using just scipy and matplotlib (you tagged only those libraries in your question) is a little bit verbose, but here's how you would do it (I'm doing it only for the quantiles):

import numpy as np
from scipy.stats import mstats
import matplotlib.pyplot as plt

# Create 10 columns with 100 rows of random data
rd = np.random.randn(100, 10)
# Calculate the quantiles column-wise; mquantiles defaults to
# prob=[0.25, 0.5, 0.75], so the result has shape (3, 10)
quantiles = mstats.mquantiles(rd, axis=0)
# Plot one line per quantile, with one point per column
labels = ['25%', '50%', '75%']
for label, q in zip(labels, quantiles):
    plt.plot(q, label=label)
plt.legend()
plt.show()

Which gives you:

[Figure: the 25%, 50% and 75% quantile lines plotted across the 10 columns]
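The snippet above assumes every vector has the same length, and it only covers the quantiles. For the roughly 200 variable-length vectors in the question, here is a minimal sketch using only NumPy and matplotlib that also draws the median line and the min/max spread, shading the bands with fill_between to get the "continuous box plot" look. The helper names `summary_stats` and `plot_continuous_boxplot` and the random input are made up for illustration. It uses np.percentile, whose default interpolation differs slightly from the mquantiles defaults, so the quartiles may not match the scipy version exactly.

```python
import numpy as np
import matplotlib.pyplot as plt


def summary_stats(vectors):
    """Hypothetical helper: min, 25th percentile, median, 75th percentile, max.

    `vectors` is any sequence of 1-D arrays, which may have different lengths.
    Returns an array of shape (5, len(vectors)), one row per statistic.
    """
    return np.array([np.percentile(v, [0, 25, 50, 75, 100]) for v in vectors]).T


def plot_continuous_boxplot(vectors):
    """Hypothetical helper: draw the bands and the median line on new axes."""
    lo, q1, med, q3, hi = summary_stats(vectors)
    x = np.arange(len(vectors))
    fig, ax = plt.subplots()
    # Outer band: the spread (smallest to largest value of each vector)
    ax.fill_between(x, lo, hi, alpha=0.2, label="min-max")
    # Inner band: the interquartile range (25% to 75%)
    ax.fill_between(x, q1, q3, alpha=0.4, label="25%-75%")
    # Line joining the medians
    ax.plot(x, med, color="black", label="median")
    ax.set_xlabel("vector index")
    ax.legend()
    return fig, ax


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative input: 200 vectors with random lengths between 20 and 499
    vectors = [rng.normal(size=n) for n in rng.integers(20, 500, size=200)]
    plot_continuous_boxplot(vectors)
    plt.show()
```

Keeping the statistics in a separate function makes it easy to swap np.percentile for another quantile definition without touching the plotting code.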

Now let me try to convince you to use the pandas library :)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Create random data
rd = pd.DataFrame(np.random.randn(100, 10))
# Calculate all the desired values, one row per column of rd
# ('median' and '50%' are the same statistic, so their lines overlap)
df = pd.DataFrame({'mean': rd.mean(), 'median': rd.median(),
                   '25%': rd.quantile(0.25), '50%': rd.quantile(0.5),
                   '75%': rd.quantile(0.75)})
# And plot it
df.plot()
plt.show()

You'll get:

[Figure: mean, median, 25%, 50% and 75% lines plotted across the 10 columns]
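Both DataFrames above have equal-length columns. If your vectors have different lengths, one way to handle it (a sketch, assuming the input is a plain list of 1-D arrays; the names `vectors`, `ragged` and `stats` are illustrative) is to build the frame from a dict of Series, which pads the shorter columns with NaN. quantile() ignores NaN, so each column's statistics come only from its own values, and the 0 and 1 quantiles give the smallest and largest points.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Illustrative input: 200 vectors of different lengths
vectors = [rng.normal(size=n) for n in rng.integers(20, 500, size=200)]

# A dict of Series aligns on the index, so shorter vectors are padded with NaN
ragged = pd.DataFrame({i: pd.Series(v) for i, v in enumerate(vectors)})

# quantile() skips NaN; the 0 and 1 quantiles are each column's min and max
stats = ragged.quantile([0, 0.25, 0.5, 0.75, 1]).T
stats.columns = ["min", "25%", "median", "75%", "max"]
stats.plot()
plt.show()
```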

Or you can get all the stats in just one line:

rd.describe().T.drop('count', axis=1).plot()

[Figure: mean, std, min, 25%, 50%, 75% and max lines plotted across the 10 columns]

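Note: I dropped the count since it's not part of the "five-number summary" (it would also be a flat line at 100 that dwarfs the other values). The mean and std that describe() reports aren't part of it either.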
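If you want exactly the five numbers (min, 25%, 50%, 75%, max), you can select those rows of describe() before transposing instead of only dropping count. A small sketch on fresh random data, with `five` as an illustrative name:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rd = pd.DataFrame(np.random.randn(100, 10))
# describe() rows are count, mean, std, min, 25%, 50%, 75%, max;
# keep only the five-number summary, then put one statistic per column
five = rd.describe().loc[["min", "25%", "50%", "75%", "max"]].T
five.plot()
plt.show()
```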