I would like to use df.groupby() in combination with apply() to apply a function to each row per group.
I normally use the following code, which usually works (note that this is without groupby()):
df.apply(myFunction, args=(arg1,))
With groupby(), I tried the following:
df.groupby('columnName').apply(myFunction, args=(arg1,))
However, I get the following error:
TypeError: myFunction() got an unexpected keyword argument 'args'
Hence, my question is: how can I use groupby() and apply() with a function that needs arguments?
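pandas.core.groupby.GroupBy.apply does NOT have a named parameter args, but pandas.DataFrame.apply does. GroupBy.apply instead forwards any extra positional and keyword arguments to the function itself, which is why args=(arg1,) reaches myFunction as an unexpected keyword argument.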
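To see why the error message names myFunction rather than apply, here is a minimal pure-Python sketch of that forwarding pattern. Note that toy_group_apply, scaled_max, and the sample groups are hypothetical names made up for illustration; this is not pandas' actual implementation, only a stand-in with a signature of the form apply(func, *args, **kwargs).

```python
def toy_group_apply(groups, func, *args, **kwargs):
    """Hypothetical stand-in for GroupBy.apply: call func once per group,
    forwarding every extra positional and keyword argument unchanged."""
    return {key: func(values, *args, **kwargs) for key, values in groups.items()}


def scaled_max(values, n):
    # Largest value in the group, multiplied by n.
    return max(values) * n


groups = {"x": [1, 5, 2], "y": [4, 3]}

# Extra arguments are handed straight to scaled_max.
by_position = toy_group_apply(groups, scaled_max, 10)
by_keyword = toy_group_apply(groups, scaled_max, n=10)

# args=(10,) is not special here: it is forwarded as a keyword named "args",
# which scaled_max does not accept, so Python raises a TypeError.
try:
    toy_group_apply(groups, scaled_max, args=(10,))
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

In this sketch the positional and keyword calls succeed, while the args=(10,) call fails inside scaled_max, which mirrors the error in the question.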
So try this:
df.groupby('columnName').apply(lambda x: myFunction(x, arg1))
or, as suggested by @Zero, pass the argument positionally:
df.groupby('columnName').apply(myFunction, arg1)
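As a rough, self-contained check that these call styles are interchangeable, the sketch below compares them on a small frame. The function scaled_max and the sample values are assumptions made for illustration, and functools.partial is an additional option not mentioned above. The value columns are selected explicitly so that the result should not depend on how a particular pandas version treats the grouping column inside apply().

```python
from functools import partial

import pandas as pd


def scaled_max(group, n):
    # Column-wise maximum of one group, multiplied by n.
    return group.max() * n


# Small illustrative frame (values chosen for this example only).
df = pd.DataFrame({"a": [0, 0, 3, 4, 3],
                   "b": [3, 3, 0, 2, 4],
                   "c": [1, 4, 4, 3, 1]})

# Select the value columns so the grouping column "a" is not passed to
# scaled_max as data.
grouped = df.groupby("a")[["b", "c"]]

via_lambda = grouped.apply(lambda g: scaled_max(g, 10))   # wrap the call
via_position = grouped.apply(scaled_max, 10)              # extra positional
via_keyword = grouped.apply(scaled_max, n=10)             # extra keyword
via_partial = grouped.apply(partial(scaled_max, n=10))    # pre-bound argument
```

The lambda and partial forms have the advantage of not relying on how apply() forwards arguments, so they also work with methods that forward nothing.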
Demo:
In [82]: df = pd.DataFrame(np.random.randint(5,size=(5,3)), columns=list('abc'))
In [83]: df
Out[83]:
   a  b  c
0  0  3  1
1  0  3  4
2  3  0  4
3  4  2  3
4  3  4  1
In [84]: def f(ser, n):
    ...:     return ser.max() * n
    ...:
In [85]: df.apply(f, args=(10,))
Out[85]:
a    40
b    40
c    40
dtype: int64
When using GroupBy.apply, you can pass either named arguments:
In [86]: df.groupby('a').apply(f, n=10)
Out[86]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30
4 40 20 30
or positional arguments:
In [87]: df.groupby('a').apply(f, 10)
Out[87]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30