I'm using the apply method on a pandas DataFrame object. When my DataFrame has a single column, the applied function appears to be called twice. Why does this happen, and can I stop it?
Code:
import pandas as pd
def mul2(x):
    print('hello')
    return 2 * x
df = pd.DataFrame({'a': [1,2,0.67,1.34]})
df.apply(mul2)
Output:
hello
hello
0 2.00
1 4.00
2 1.34
3 2.68
I'm printing 'hello' from within the function being applied, and I know it's being called twice because 'hello' is printed twice. What's more, if I had two columns, 'hello' would print 3 times. And when I call apply on just the column, 'hello' prints 4 times.
Code:
df.a.apply(mul2)
Output:
hello
hello
hello
hello
0 2.00
1 4.00
2 1.34
3 2.68
Name: a, dtype: float64
This behavior is intended, as an optimization.
See the docs:
In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.
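To see what each call actually receives, here is a minimal sketch that records the argument type instead of printing. The `calls` list and the variable names are just for illustration. It suggests that DataFrame.apply passes the whole column as a Series (once or twice, depending on whether your pandas version still does the extra fast-path call; as far as I know, newer releases dropped it), while Series.apply passes one element at a time, which would explain the 4 prints for a 4-row column.

```python
import pandas as pd

calls = []  # illustrative: records the type name of each argument mul2 receives

def mul2(x):
    calls.append(type(x).__name__)
    return 2 * x

df = pd.DataFrame({'a': [1, 2, 0.67, 1.34]})

# DataFrame.apply: func is called with each column as a whole Series.
result_df = df.apply(mul2)
df_calls = list(calls)

# Series.apply: func is called once per element of the column.
calls.clear()
result_s = df['a'].apply(mul2)
series_calls = list(calls)

print(df_calls)           # one or two 'Series' entries, depending on the pandas version
print(len(series_calls))  # one call per row

# If the function is simple arithmetic, a vectorized expression avoids apply altogether.
vectorized = df * 2
```

If the function has side effects, the safest options are to keep the applied function pure or to use a vectorized expression such as `df * 2`, so the number of calls no longer matters.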