Pandas Fillna of Multiple Columns with Mode of Each Column

Question 1

Pandas Fillna of Multiple Columns with Mode of Each Column

python pandas numpy data-science

Nick · Mar 18, 2017 · Viewed 9.7k times · Source

Answer

Answer

If you want to impute missing values with the mode in some columns a dataframe df, you can just fillna by Series created by select by position by iloc:

cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])

Or:

df[cols]=df[cols].fillna(mode.iloc[0])

Your solution:

df[cols]=df.filter(cols).fillna(mode.iloc[0])

Sample:

df = pd.DataFrame({'workclass':['Private','Private',np.nan, 'another', np.nan],
                   'native-country':['United-States',np.nan,'Canada',np.nan,'United-States'],
                   'col':[2,3,7,8,9]})

print (df)
   col native-country workclass
0    2  United-States   Private
1    3            NaN   Private
2    7         Canada       NaN
3    8            NaN   another
4    9  United-States       NaN

mode = df.filter(["workclass", "native-country"]).mode()
print (mode)
  workclass native-country
0   Private  United-States

cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])
print (df)
   col native-country workclass
0    2  United-States   Private
1    3  United-States   Private
2    7         Canada   Private
3    8  United-States   another
4    9  United-States   Private

Question 2

Working with census data, I want to replace NaNs in two columns ("workclass" and "native-country") with the respective modes of those two columns. I can get the modes easily:

mode = df.filter(["workclass", "native-country"]).mode()

which returns a dataframe:

  workclass native-country
0   Private  United-States

However,

df.filter(["workclass", "native-country"]).fillna(mode)

does not replace the NaNs in each column with anything, let alone the mode corresponding to that column. Is there a smooth way to do this?

Pandas Fillna of Multiple Columns with Mode of Each Column

Answer

Related questions