Boxplot with pandas groupby multiindex, for specified sublevels from multiindex

pbreach · Aug 28, 2013

I have a DataFrame containing time series data, with a multi-level index (MultiIndex) on the columns. Here is a sample of what the data looks like; it is stored in CSV format. Loading the data is not an issue here.

[Image: sample of the data]

What I want to do is create a boxplot with this data grouped according to the categories in a specific level of the MultiIndex. For example, if I were to group by 'SPECIES', I would have the groups 'aq', 'gr', 'mix' and 'sed', and a box for each group at a specific time in the time series.

I've tried this:

grouped = data['2013-08-17'].groupby(axis=1, level='SPECIES')
grouped.boxplot()

but it gives me a boxplot (a flat line) for each point in the group rather than one box for the grouped set. Is there an easy way to do this? I don't have any problems with the grouping itself, as I can aggregate the groups any way I want, but I can't get them to boxplot.

Answer

schlump · Nov 1, 2016

This code:

data['2013-08-17'].boxplot(by='SPECIES')

will not work, because boxplot is a DataFrame method and not a Series method.

In pandas > 0.18.1, however, the DataFrame boxplot function has a column argument, which specifies the column (or columns) the data is taken from.

So

data.boxplot(column='2013-08-17',by='SPECIES')

should return the desired result.
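
For data shaped like the question's (timestamps on the rows, a MultiIndex on the columns), one possible route is to reshape a single timestamp into long form first, so that 'SPECIES' becomes an ordinary column. The following is only a minimal sketch under stated assumptions: the frame itself, the second level name 'SITE', the names row and long_df, and the random values are all hypothetical stand-ins, not the original poster's data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: dates on the rows and a
# two-level column MultiIndex ('SPECIES' plus an assumed 'SITE' level).
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_product(
    [["aq", "gr", "mix", "sed"], ["s1", "s2", "s3"]],
    names=["SPECIES", "SITE"],
)
index = pd.date_range("2013-08-15", periods=5, freq="D")
data = pd.DataFrame(rng.normal(size=(5, 12)), index=index, columns=columns)

# Select one timestamp: the result is a Series indexed by (SPECIES, SITE).
row = data.loc[pd.Timestamp("2013-08-17")]

# Turn the index levels into ordinary columns: SPECIES, SITE, value.
long_df = row.rename("value").reset_index()

# One box per SPECIES group for that timestamp.
fig, ax = plt.subplots(figsize=(10, 8))
long_df.boxplot(column="value", by="SPECIES", ax=ax)
fig.suptitle("")  # clear the automatic "Boxplot grouped by ..." title
```

Here the reshaping is done with reset_index on a single row; other reshaping approaches may suit different layouts better.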

An example with the Iris dataset:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv')
fig, ax = plt.subplots(figsize=(10, 8))
data.boxplot(column=['SepalLength'], by='Name', ax=ax)
# Call suptitle after boxplot, since boxplot(by=...) sets the title itself
plt.suptitle('')

creates:

[Figure: boxplot of the Iris dataset with pandas]

plt.suptitle('')

turns off the automatic "Boxplot grouped by ..." title. The column argument also accepts a list of columns, so

data.boxplot(column=['SepalLength', 'SepalWidth'], by='Name', ax=ax)

also works.
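
As a rough, offline illustration of the multi-column case, the sketch below uses a small synthetic frame instead of the downloaded Iris CSV. The random values and the name df are assumptions made purely for illustration; only the column names mirror the example above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Synthetic stand-in for the Iris data (random values, not the real dataset).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "SepalLength": rng.normal(5.8, 0.8, size=30),
    "SepalWidth": rng.normal(3.0, 0.4, size=30),
    "Name": np.repeat(["Iris-setosa", "Iris-versicolor", "Iris-virginica"], 10),
})

# With a list of columns, pandas draws one subplot per column,
# each grouped by the values in 'Name'.
axes = np.ravel(df.boxplot(column=["SepalLength", "SepalWidth"], by="Name",
                           figsize=(10, 8)))

# Clear the automatic title on the figure that holds the subplots.
axes[0].get_figure().suptitle("")
```

Letting pandas create the figure (rather than passing a single ax) avoids asking it to fit several subplots into one existing axes.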