I have a dataframe whose columns I need to convert to floats and ints, but it has bad rows, i.e., values in a column that should be a float or an integer are instead strings.
If I use df.bad.astype(float), I get an error; this is expected.
If I use pd.to_numeric(df.bad, errors='coerce'), bad values are replaced with np.nan, which is the documented behaviour and reasonable. (Note that astype itself only accepts errors='raise' or errors='ignore', not 'coerce'.)
There is also errors='ignore', another option that swallows the errors and leaves the offending values alone.
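For concreteness, a quick sketch of those two modes on a throwaway Series (the data is made up, and recent pandas releases deprecate the errors='ignore' option, so it is shown here only for illustration):

import pandas as pd

s = pd.Series(["3", "4", "problem"])
print(pd.to_numeric(s, errors="coerce"))  # "problem" becomes NaN: 3.0, 4.0, NaN
print(s.astype(float, errors="ignore"))   # conversion fails, so the original strings come back unchanged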
But actually, I don't want to ignore the errors; I want to drop the rows with bad values. How can I do this?
I could ignore the errors and do some type checking afterwards, but that's not an ideal solution, and there might be something more idiomatic.
import pandas as pd

test = pd.DataFrame(["3", "4", "problem"], columns=["bad"])
test.bad.astype(float)  # ValueError: could not convert string to float: 'problem'
I want something like this:
pd.to_numeric(df.bad, errors='drop')
This would return the dataframe with only the two good rows.
Since the bad values are replaced with np.nan, wouldn't it simply be a matter of calling df.dropna() to get rid of the bad rows now?
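A minimal sketch of that combination, using the single-column test frame from the question:

import pandas as pd

test = pd.DataFrame(["3", "4", "problem"], columns=["bad"])
test["bad"] = pd.to_numeric(test["bad"], errors="coerce")  # "problem" becomes NaN
test = test.dropna(subset=["bad"])  # drop the rows where conversion failed
print(test)  # only the two numeric rows remain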
EDIT:
Since you need to keep the NaNs that were already in the data, maybe you could use df.fillna() with a placeholder prior to using pd.to_numeric, so the original NaNs are not dropped along with the coerced ones.
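A sketch of that idea, where the sentinel name and the value -1 are assumptions (any value that cannot occur in the real data would do):

import numpy as np
import pandas as pd

df = pd.DataFrame(["3", np.nan, "4", "problem"], columns=["bad"])
sentinel = -1  # hypothetical placeholder; assumes -1 never appears as real data
df["bad"] = df["bad"].fillna(sentinel)  # protect the pre-existing NaN from dropna
df["bad"] = pd.to_numeric(df["bad"], errors="coerce")  # now only "problem" becomes NaN
df = df.dropna(subset=["bad"])  # drop just the genuinely bad row
df["bad"] = df["bad"].replace(sentinel, np.nan)  # restore the original NaN
print(df)  # the "3", NaN, and "4" rows survive; "problem" is gone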