Is there a pandas built-in way to apply two different aggregating functions f1, f2
to the same column df["returns"]
, without having to call agg()
multiple times?
Example dataframe:
import pandas as pd
import datetime as dt
pd.np.random.seed(0)
df = pd.DataFrame({
"date" : [dt.date(2012, x, 1) for x in range(1, 11)],
"returns" : 0.05 * np.random.randn(10),
"dummy" : np.repeat(1, 10)
})
The syntactically wrong, but intuitively right, way to do it would be:
# Assume `f1` and `f2` are defined for aggregating.
df.groupby("dummy").agg({"returns": f1, "returns": f2})
Obviously, Python doesn't allow duplicate keys. Is there any other manner for expressing the input to agg()
? Perhaps a list of tuples [(column, function)]
would work better, to allow multiple functions applied to the same column? But agg()
seems like it only accepts a dictionary.
Is there a workaround for this besides defining an auxiliary function that just applies both of the functions inside of it? (How would this work with aggregation anyway?)
You can simply pass the functions as a list:
In [20]: df.groupby("dummy").agg({"returns": [np.mean, np.sum]})
Out[20]:
mean sum
dummy
1 0.036901 0.369012
or as a dictionary:
In [21]: df.groupby('dummy').agg({'returns':
{'Mean': np.mean, 'Sum': np.sum}})
Out[21]:
returns
Mean Sum
dummy
1 0.036901 0.369012