Time-series boxplot in pandas

Fred S · Oct 22, 2014 · Viewed 14.7k times

How can I create a boxplot for a pandas time-series where I have a box for each day?

Sample dataset of hourly data where one box should consist of 24 values:

import numpy as np
import pandas as pd

n = 480  # 20 days of hourly values
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01",
                                   periods=n,
                                   freq="H"))
ts.plot()

I am aware that I could make an extra column for the day, but I would like to have proper x-axis labeling and x-limit functionality (like in ts.plot()), so being able to work with the datetime index would be great.

There is a similar question for R/ggplot2 here, if it helps to clarify what I want.

Answer

Rutger Kassies · Oct 22, 2014

If it's an option for you, I would recommend using Seaborn, which is a wrapper for Matplotlib. You could do it yourself by looping over the groups from your time series, but that's much more work.

import pandas as pd
import numpy as np
import seaborn
import matplotlib.pyplot as plt

n = 480
ts = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))

fig, ax = plt.subplots(figsize=(12,5))
# Recent seaborn versions require x and y as keyword arguments
seaborn.boxplot(x=ts.index.dayofyear, y=ts, ax=ax)

Which gives a figure with one box per day, labeled by day of year.
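For reference, the do-it-yourself route mentioned above (looping over the groups) could look roughly like the sketch below. It is not from the original answer. It assumes Matplotlib 3.1 or later (for the `manage_ticks` argument), groups by calendar day with `ts.index.normalize()`, and places each box at the Matplotlib date number of its day, so the x-axis remains a real date axis. The names `grouped`, `days`, `values` and `positions` are only illustrative.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n = 480
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

# One array of values per calendar day; normalize() sets each timestamp to midnight
grouped = ts.groupby(ts.index.normalize())
days = [day for day, _ in grouped]
values = [group.to_numpy() for _, group in grouped]

# Matplotlib date numbers are measured in days; +0.5 centers each box at noon
positions = mdates.date2num(pd.DatetimeIndex(days).to_pydatetime()) + 0.5

fig, ax = plt.subplots(figsize=(12, 5))
# manage_ticks=False keeps boxplot from replacing the ticks with 1..N labels
ax.boxplot(values, positions=positions, widths=0.6, manage_ticks=False)
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
fig.autofmt_xdate()

# x-limits are expressed in the same date-number units as the positions
ax.set_xlim(positions[0] - 1, positions[-1] + 1)
```

Since the boxes sit at date positions, date locators and formatters apply as they would on a line plot, which is the part the question was asking for.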

Note that I'm passing the day of year as the grouper to seaborn. If your data spans multiple years this won't work, because the same day of year from different years would end up in the same box. You could then consider something like:

ts.index.to_series().apply(lambda x: x.strftime('%Y%m%d'))
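To illustrate what this grouper produces, here is a small sketch on a made-up series that crosses a year boundary (the dates and the variable names are assumptions for the example only). It also shows `DatetimeIndex.strftime`, which should give the same labels without the `apply` call.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4 days of hourly values spanning New Year
n = 96
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2013-12-30", periods=n, freq="h"))

# Grouper from the answer: one 'YYYYMMDD' label per timestamp
labels = ts.index.to_series().apply(lambda x: x.strftime('%Y%m%d'))

# Vectorized equivalent on the index itself
labels_vectorized = ts.index.strftime('%Y%m%d')

# Each label collects the 24 hourly values of one calendar day
counts = ts.groupby(labels.to_numpy()).size()
print(counts)
```

Because the year comes first in the label, the groups sort chronologically, and days from different years no longer collide the way day-of-year values do.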

Edit: for 3-hourly boxes you could use this as a grouper, but it only works if no minutes or smaller units are defined:

import datetime

[(dt - datetime.timedelta(hours=int(dt.hour % 3))).strftime('%Y%m%d%H') for dt in ts.index]
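As a quick check of the 3-hourly grouper, the sketch below runs it on two days of hourly data and compares it with `DatetimeIndex.floor`. The `floor` variant is not part of the original answer. It is an assumed alternative that truncates each timestamp to the start of its 3-hour bin.

```python
import datetime

import numpy as np
import pandas as pd

n = 48
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

# Grouper from the answer: subtract the hour remainder, then format as a label
manual = [(dt - datetime.timedelta(hours=int(dt.hour % 3))).strftime('%Y%m%d%H')
          for dt in ts.index]

# Assumed alternative: floor each timestamp to a multiple of 3 hours
floored = ts.index.floor("3h").strftime('%Y%m%d%H')

print(manual[:4])  # ['2014020100', '2014020100', '2014020100', '2014020103']
print(manual == list(floored))  # True
```

On hourly data both give the same labels, with three values per group. Since `floor` also discards minutes and seconds, it may be the simpler choice when the index has a finer resolution.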