I have a pandas data frame where there are a several missing values. I noticed that the non missing values are close to each other. Thus, I would like to impute the missing values by randomly choosing the non missing values.
For instance:
import pandas as pd
import random
import numpy as np
foo = pd.DataFrame({'A': [2, 3, np.nan, 5, np.nan], 'B':[np.nan, 4, 2, np.nan, 5]})
foo
A B
0 2 NaN
1 3 4
2 NaN 2
3 5 NaN
4 NaN 5
I would like for instance foo['A'][2]=2
and foo['A'][5]=3
The shape of my pandas DataFrame is (6940,154).
I try this
foo['A'] = foo['A'].fillna(random.choice(foo['A'].values.tolist()))
But it not working. Could you help me achieve that? Best regards.
You can use pandas.fillna method and the random.choice method to fill the missing values with a random selection of a particular column.
import random
import numpy as np
df["column"].fillna(lambda x: random.choice(df[df[column] != np.nan]["column"]), inplace =True)
Where column is the column you want to fill with non nan values randomly.