Here is my code:
import StringIO
from pandas import *
import numpy as np
df = read_csv(StringIO.StringIO('''Col1 Col2 A B
A D 1 6
A E 2 7
B D 3 8
B E 4 9
C D 5 19'''), delimiter='\t')
df['buc1'] = cut(df['A'], bins = [0, 2, 6, 8])
aggFunc = {'A': sum,
'B': np.average
}
After running:
df.groupby(['buc1']).agg(aggFunc)
I get:
A B
buc1
(0, 2] 3 6.5
(2, 6] 12 12.0
(6, 8] NaN NaN
My questions are:
For instance:
buc1 A B
(0, 2] 3 6.5
(2, 6] 12 12
Total 15 9.8
Note that the total row for the second column would be the average, not the sum.
To just drop the na
records, you can use the .dropna()
dataframe method.
df['buc1'] = df['buc1'].astype(object)
result = df.groupby(['buc1']).agg(aggFunc).dropna()
result
A B
buc1
(0, 2] 3 6.5
(2, 6] 12 12.0
As for the marginal totals, in theory this should work:
result_alternative = pd.pivot_table(df,index='buc1',
aggfunc=aggFunc,
values=['A','B'],
margins=True)
result_alternative
A B
buc1
(0, 2] 3 6.5
(2, 6] 12 12.0
All 15 9.8
But it throws an error related to , which I think is a bug that should be fixed so it may work on your machine (although I am running version 0.17)
That same link has a workaround related to the categorical variable--just cast it as an object first, and the above call should be good.
EDIT:
If you want to stay with the groupby semantics, you can add a row total like so:
rowtotal = result.sum()
rowtotal.name = 'rowtotal'
result.append(rowtotal)