Pandas apply but only for rows where a condition is met

mgoldwasser picture mgoldwasser · Nov 18, 2015 · Viewed 80.6k times · Source

I would like to use Pandas df.apply but only for certain rows

As an example, I want to do something like this, but my actual issue is a little more complicated:

import pandas as pd
import math
z = pd.DataFrame({'a':[4.0,5.0,6.0,7.0,8.0],'b':[6.0,0,5.0,0,1.0]})
z.where(z['b'] != 0, z['a'] / z['b'].apply(lambda l: math.log(l)), 0)

What I want in this example is the value in 'a' divided by the log of the value in 'b' for each row, and for rows where 'b' is 0, I simply want to return 0.

Answer

jakevdp picture jakevdp · Nov 18, 2015

The other answers are excellent, but I thought I'd add one other approach that can be faster in some circumstances – using broadcasting and masking to achieve the same result:

import numpy as np

mask = (z['b'] != 0)
z_valid = z[mask]

z['c'] = 0
z.loc[mask, 'c'] = z_valid['a'] / np.log(z_valid['b'])

Especially with very large dataframes, this approach will generally be faster than solutions based on apply().