drop rows with errors for pandas data coercion

Gijs · Jul 11, 2016 · Viewed 7.7k times

I have a dataframe, some of whose columns I need to convert to floats and ints, but it has bad rows, i.e., values that should be floats or integers are instead strings.

If I use df.bad.astype(float), I get an error; this is expected.

If I use pd.to_numeric(df.bad, errors='coerce'), bad values are replaced with np.NaN, which is also according to spec and reasonable. (Note that astype only accepts errors='raise' and errors='ignore', not 'coerce'.)

There is also errors='ignore', another option that swallows the errors and leaves the offending values alone.

But I don't actually want to ignore the errors; I want to drop the rows with bad values. How can I do this?

I can ignore the errors and do some type checking afterwards, but that's not an ideal solution, and there might be a more idiomatic way to do this; a sketch of the fallback follows.
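For what it's worth, a minimal sketch of that type-checking fallback, assuming the bad values are simply strings that fail float() (is_floatable is a hypothetical helper, not a pandas API):

def is_floatable(value):
    # Hypothetical helper: True if value converts cleanly to float
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

good = df[df["bad"].apply(is_floatable)]  # keep only the convertible rows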

Example

import pandas as pd

test = pd.DataFrame(["3", "4", "problem"], columns=["bad"])
test.bad.astype(float)  # ValueError: could not convert string to float: 'problem'
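For comparison, coercion on the same frame turns the bad value into NaN rather than raising:

pd.to_numeric(test.bad, errors='coerce')
# 0    3.0
# 1    4.0
# 2    NaN
# Name: bad, dtype: float64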

I want something like this:

pd.to_numeric(df.bad, errors='drop')

This would return the dataframe with only the two good rows.

Answer

SerialDev · Jul 11, 2016

Since the bad values are replaced with np.NaN, wouldn't it now simply be a matter of df.dropna() to get rid of the bad rows?
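A minimal sketch of that approach, using the test frame from the question:

import pandas as pd

test = pd.DataFrame(["3", "4", "problem"], columns=["bad"])
test["bad"] = pd.to_numeric(test["bad"], errors="coerce")  # "problem" -> NaN
test = test.dropna(subset=["bad"])  # drop the row that failed to convert
# test.bad now holds only the two good rows: 3.0 and 4.0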

EDIT: Since you don't want to drop the NaNs that were in the data to begin with, you could use df.fillna() with a placeholder before calling pd.to_numeric.
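A sketch of that idea, assuming a hypothetical frame with one genuine NaN and assuming a sentinel value of -1 never occurs in the real data:

import numpy as np
import pandas as pd

df = pd.DataFrame(["3", None, "problem"], columns=["bad"])  # one real NaN, one bad value
df["bad"] = df["bad"].fillna(-1)  # protect the pre-existing NaN with the sentinel
df["bad"] = pd.to_numeric(df["bad"], errors="coerce")  # only "problem" becomes NaN
df = df.dropna(subset=["bad"])  # drop the genuinely bad row
df["bad"] = df["bad"].replace(-1, np.nan)  # restore the original NaN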