Python Pandas Conditional Sum with Groupby

Question 1

Python Pandas Conditional Sum with Groupby

python pandas pandas-groupby

AllenQ · Jun 24, 2013 · Viewed 43.5k times · Source

Answer

Answer

First groupby the key1 column:

In [11]: g = df.groupby('key1')

and then for each group take the subDataFrame where key2 equals 'one' and sum the data1 column:

In [12]: g.apply(lambda x: x[x['key2'] == 'one']['data1'].sum())
Out[12]:
key1
a       0.093391
b       1.468194
dtype: float64

To explain what's going on let's look at the 'a' group:

In [21]: a = g.get_group('a')

In [22]: a
Out[22]:
      data1     data2 key1 key2
0  0.361601  0.375297    a  one
1  0.069889  0.809772    a  two
4 -0.268210  1.250340    a  one

In [23]: a[a['key2'] == 'one']
Out[23]:
      data1     data2 key1 key2
0  0.361601  0.375297    a  one
4 -0.268210  1.250340    a  one

In [24]: a[a['key2'] == 'one']['data1']
Out[24]:
0    0.361601
4   -0.268210
Name: data1, dtype: float64

In [25]: a[a['key2'] == 'one']['data1'].sum()
Out[25]: 0.093391000000000002

It may be slightly easier/clearer to do this by restricting the dataframe to just those with key2 equals one first:

In [31]: df1 = df[df['key2'] == 'one']

In [32]: df1
Out[32]:
      data1     data2 key1 key2
0  0.361601  0.375297    a  one
2  1.468194  0.272929    b  one
4 -0.268210  1.250340    a  one

In [33]: df1.groupby('key1')['data1'].sum()
Out[33]:
key1
a       0.093391
b       1.468194
Name: data1, dtype: float64

Question 2

Using sample data:

df = pd.DataFrame({'key1' : ['a','a','b','b','a'],
               'key2' : ['one', 'two', 'one', 'two', 'one'],
               'data1' : np.random.randn(5),
               'data2' : np. random.randn(5)})

df

    data1        data2     key1  key2
0    0.361601    0.375297    a   one
1    0.069889    0.809772    a   two
2    1.468194    0.272929    b   one
3   -1.138458    0.865060    b   two
4   -0.268210    1.250340    a   one

I'm trying to figure out how to group the data by key1 and sum only the data1 values where key2 equals 'one'.

Here's what I've tried

def f(d,a,b):
    d.ix[d[a] == b, 'data1'].sum()

df.groupby(['key1']).apply(f, a = 'key2', b = 'one').reset_index()

But this gives me a dataframe with 'None' values

index   key1    0
0       a       None
1       b       None

Any ideas here? I'm looking for the Pandas equivalent of the following SQL:

SELECT Key1, SUM(CASE WHEN Key2 = 'one' then data1 else 0 end)
FROM df
GROUP BY key1

FYI - I've seen conditional sums for pandas aggregate but couldn't transform the answer provided there to work with sums rather than counts.

Thanks in advance

Python Pandas Conditional Sum with Groupby

Answer

Related questions