Ok so I have a dataframe which contains timeseries data with a multi-level index (MultiIndex) on its columns. Here is a sample of what the data looks like; it is in CSV format. Loading the data is not an issue here.
What I want to do is create a boxplot with this data grouped according to the different categories in a specific level of the MultiIndex. For example, if I were to group by 'SPECIES' I would have the groups 'aq', 'gr', 'mix', 'sed', and a box for each group at a specific time in the timeseries.
I've tried this:
grouped = data['2013-08-17'].groupby(axis=1, level='SPECIES')
grouped.boxplot()
but it gives me a boxplot (flat line) for each point in the group rather than for the grouped set. Is there an easy way to do this? I don't have any problems grouping as I can aggregate the groups any which way I want, but I can't get them to boxplot.
This code:
data['2013-08-17'].boxplot(by='SPECIES')
will not work, as boxplot is a method of a DataFrame and not of a Series.
In pandas > 0.18.1, however, the boxplot function has the argument column, which defines which column the data is taken from. So
data.boxplot(column='2013-08-17', by='SPECIES')
should return the desired result.
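For data shaped like the question's (timestamps as rows, a MultiIndex on the columns), the row for one timestamp probably has to be reshaped into long form before column= and by= can be used. The following is only a sketch under assumptions: the level name SITE, the random values, and the variable names are hypothetical stand-ins, since the original sample is not shown.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: rows are timestamps,
# columns carry a two-level MultiIndex (SITE is an assumed level name).
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_product(
    [["s1", "s2", "s3"], ["aq", "gr", "mix", "sed"]],
    names=["SITE", "SPECIES"],
)
index = pd.date_range("2013-08-16", periods=3, freq="D")
data = pd.DataFrame(rng.normal(size=(3, 12)), index=index, columns=columns)

# Selecting one timestamp gives a Series indexed by (SITE, SPECIES).
row = data.loc[pd.Timestamp("2013-08-17")]

# Turn the index levels into ordinary columns: SITE, SPECIES, value.
long = row.rename("value").reset_index()

# One box per SPECIES, built from all values at that timestamp.
fig, ax = plt.subplots(figsize=(10, 8))
long.boxplot(column="value", by="SPECIES", ax=ax)
fig.suptitle("")  # clear the automatic "Boxplot grouped by ..." title
```

The point of the reshaping step is that by= expects SPECIES to be an ordinary column of the frame being plotted, not a level of the column index.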
An example with the Iris dataset:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv')
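fig, ax = plt.subplots(figsize=(10, 8))
# One box per value of 'Name', drawn on the axes created above
data.boxplot(column=['SepalLength'], by='Name', ax=ax)
# Clear the automatic suptitle; call this after boxplot(), which sets it
plt.suptitle('')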
creates a boxplot of SepalLength with one box per Name. The line
plt.suptitle('')
turns off the annoying automatic suptitle ("Boxplot grouped by Name"). And of course the column argument accepts a list of columns, so
data.boxplot(column=['SepalLength', 'SepalWidth'], by='Name', ax=ax)
also works.
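A self-contained variant of the multi-column case, as a sketch: the measurements below are made-up values in the shape of the Iris data, not the real dataset. It omits ax= because, as far as I know, pandas needs one subplot per column and will clear the figure of a single passed axes (with a warning) to create them.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Made-up measurements shaped like the Iris data (illustration only).
df = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 5.0, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "SepalWidth":  [3.5, 3.0, 3.6, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "Name": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# A list of columns gives one subplot per column, each grouped by 'Name'.
axes = df.boxplot(column=["SepalLength", "SepalWidth"], by="Name", figsize=(10, 4))

# Flatten whatever container of axes comes back, then clear the suptitle.
axes_flat = np.ravel(axes)
axes_flat[0].get_figure().suptitle("")
```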