Apply different functions to different items in group object: Python pandas

kunitomo picture kunitomo · Mar 7, 2013 · Viewed 9.2k times · Source

Suppose I have a dataframe as follows:

In [1]: test_dup_df

Out[1]:
                  exe_price exe_vol flag 
2008-03-13 14:41:07  84.5    200     yes
2008-03-13 14:41:37  85.0    10000   yes
2008-03-13 14:41:38  84.5    69700   yes
2008-03-13 14:41:39  84.5    1200    yes
2008-03-13 14:42:00  84.5    1000    yes
2008-03-13 14:42:08  84.5    300     yes
2008-03-13 14:42:10  84.5    88100   yes
2008-03-13 14:42:10  84.5    11900   yes
2008-03-13 14:42:15  84.5    5000    yes
2008-03-13 14:42:16  84.5    3200    yes 

I want to group a duplicate data at time 14:42:10 and apply different functions to exe_price and exe_vol (e.g., sum the exe_vol and compute volume weighted average of exe_price). I know that I can do

In [2]: grouped = test_dup_df.groupby(level=0)

to group the duplicate indices and then use the first() or last() functions to get either the first or the last row but this is not really what I want.

Is there a way to group and then apply different (written by me) functions to values in different column?

Answer

waitingkuo picture waitingkuo · Mar 7, 2013

Apply your own function:

In [12]: def func(x):
             exe_price = (x['exe_price']*x['exe_vol']).sum() / x['exe_vol'].sum()
             exe_vol = x['exe_vol'].sum()
             flag = True        
             return Series([exe_price, exe_vol, flag], index=['exe_price', 'exe_vol', 'flag'])


In [13]: test_dup_df.groupby(test_dup_df.index).apply(func)
Out[13]:
                    exe_price exe_vol  flag
date_time                                  
2008-03-13 14:41:07      84.5     200  True 
2008-03-13 14:41:37        85   10000  True
2008-03-13 14:41:38      84.5   69700  True
2008-03-13 14:41:39      84.5    1200  True
2008-03-13 14:42:00      84.5    1000  True
2008-03-13 14:42:08      84.5     300  True
2008-03-13 14:42:10     20.71  100000  True
2008-03-13 14:42:15      84.5    5000  True
2008-03-13 14:42:16      84.5    3200  True