I have a DataFrame, df, in which some columns are of dtype float64 and the rest are of dtype object. Because of the mixed dtypes, I cannot use
df.fillna('unknown')  # raises "ValueError: could not convert string to float:"
because the error is raised for the float64 columns (what a misleading error message!).
So I wish I could do something like:
for col in df.columns[<dtype == object>]:  # pseudocode
    df[col] = df[col].fillna("unknown")
So my question is: is there a filter expression like this that I can use with df.columns?
Alternatively, and less elegantly, I guess I could do:
for col in df.columns:
    if df[col].dtype == object:  # object dtype
        df[col] = df[col].fillna('')
# Still puzzled: only the empty string works as a replacement. 'unknown' fails for certain values with "ValueError: Error parsing datetime string "unknown" at position 0"
I would also like to know why, in the code above, replacing '' with 'unknown' works for some cells but fails for others with the error "ValueError: Error parsing datetime string "unknown" at position 0".
Thanks a lot!
Yu
This is more concise:
import numpy as np
# select the float columns (the np.float alias was removed in NumPy 1.24, so use np.float64)
df_float = df.select_dtypes(include=[np.float64])
# select the non-numeric columns
df_non_num = df.select_dtypes(exclude=[np.number])
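To tie this back to the question, here is a minimal sketch of filling missing values only in the object columns via select_dtypes. The sample frame and its column names (price, city) are made up for illustration, and the object dtype is set explicitly so the example does not depend on how a given pandas version infers string columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one float64 column and one object column
df = pd.DataFrame({
    "price": pd.Series([1.5, np.nan, 3.0], dtype="float64"),
    "city": pd.Series(["Oslo", None, "Lima"], dtype=object),
})

# Names of the object-dtype columns only
obj_cols = df.select_dtypes(include=["object"]).columns

# Fill missing values in those columns; the float column is left untouched
df[obj_cols] = df[obj_cols].fillna("unknown")

print(df)
```

The same obj_cols index can be used in place of the pseudocode filter in the question, for example "for col in obj_cols: ...", if a per-column loop is still preferred.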