pandas groupby aggregate with grand total in the bottom

Question 1

pandas groupby aggregate with grand total in the bottom

pandas group-by aggregate subtotal

Jason · Oct 27, 2015 · Viewed 12.2k times · Source

Answer

Answer

To just drop the na records, you can use the .dropna() dataframe method.

df['buc1'] = df['buc1'].astype(object) 
result = df.groupby(['buc1']).agg(aggFunc).dropna()
result


         A     B
buc1            
(0, 2]   3   6.5
(2, 6]  12  12.0

As for the marginal totals, in theory this should work:

result_alternative = pd.pivot_table(df,index='buc1',
                                    aggfunc=aggFunc,
                                    values=['A','B'],
                                    margins=True)
result_alternative

         A     B
buc1            
(0, 2]   3   6.5
(2, 6]  12  12.0
All     15   9.8

But it throws an error related to , which I think is a bug that should be fixed so it may work on your machine (although I am running version 0.17)

That same link has a workaround related to the categorical variable--just cast it as an object first, and the above call should be good.

EDIT:

If you want to stay with the groupby semantics, you can add a row total like so:

rowtotal = result.sum()
rowtotal.name = 'rowtotal'
result.append(rowtotal)

Question 2

Here is my code:

import StringIO
from pandas import *
import numpy as np

df = read_csv(StringIO.StringIO('''Col1 Col2    A   B
A   D   1   6
A   E   2   7
B   D   3   8
B   E   4   9
C   D   5   19'''), delimiter='\t')


df['buc1'] = cut(df['A'], bins = [0, 2, 6, 8])

aggFunc = {'A': sum,
           'B': np.average
           }

After running:

df.groupby(['buc1']).agg(aggFunc)

I get:

         A     B
buc1            
(0, 2]   3   6.5
(2, 6]  12  12.0
(6, 8] NaN   NaN

My questions are:

How do I get rid of the bottom (6, 8] bucket, given that there are no values there?
How do I add a grand total row at the bottom, as in a pivot table in Excel?

For instance:

buc1    A   B
(0, 2]  3   6.5
(2, 6]  12  12
Total   15  9.8

Note that the total row for the second column would be the average, not the sum.

pandas groupby aggregate with grand total in the bottom

Answer

Related questions