How can I randomly insert np.nan values in a DataFrame?
Let's say I want 10% null values inside my DataFrame.
My data looks like this:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 3),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=['one', 'two', 'three'])
        one       two     three
a  0.695132  1.044791 -1.059536
b -1.075105  0.825776  1.899795
c -0.678980  0.051959 -0.691405
d -0.182928  1.455268 -1.032353
e  0.205094  0.714192 -0.938242
Is there an easy way to insert the null values?
Here's a way to clear exactly 10% of cells (or rather, as close to 10% as can be achieved with the existing data frame's size).
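import random
# Enumerate every (row, col) position in the frame.
ix = [(row, col) for row in range(df.shape[0]) for col in range(df.shape[1])]
# Sample 10% of the positions without replacement and set each one to NaN.
for row, col in random.sample(ix, int(round(.1*len(ix)))):
    df.iat[row, col] = np.nan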
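If the frame is large, the Python-level loop above can be slow. The same idea can be sketched with NumPy index arrays instead. This is only a sketch: the helper name insert_nans_exact and its frac and seed parameters are my own, not part of pandas, and it assumes every column is numeric so the values can be held in a single float array.

```python
import numpy as np
import pandas as pd

def insert_nans_exact(df, frac=0.1, seed=None):
    """Return a copy of df with round(frac * df.size) cells set to NaN.

    Hypothetical helper; assumes all columns are numeric.
    """
    rng = np.random.default_rng(seed)
    n_nan = int(round(frac * df.size))
    # Draw distinct flat cell positions, then convert them to (row, col) pairs.
    flat = rng.choice(df.size, size=n_nan, replace=False)
    rows, cols = np.unravel_index(flat, df.shape)
    # Work on a float copy so the caller's frame is left untouched.
    values = df.to_numpy(dtype=float, copy=True)
    values[rows, cols] = np.nan
    return pd.DataFrame(values, index=df.index, columns=df.columns)

df = pd.DataFrame(np.random.randn(5, 3),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=['one', 'two', 'three'])
out = insert_nans_exact(df, frac=0.1, seed=0)
```

Unlike the loop, this returns a new frame rather than modifying df in place, which makes it easier to keep the original data around for comparison.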
Here's a way to clear cells independently with a per-cell probability of 10%.
df = df.mask(np.random.random(df.shape) < .1)
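Note that with the per-cell approach the number of NaNs is itself random, so on a small frame you may get none at all or many more than 10%. A quick way to see this is to measure the fraction on a larger frame. The 1000-row frame and the seed below are arbitrary choices for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded only to make the run repeatable
df = pd.DataFrame(rng.standard_normal((1000, 3)),
                  columns=['one', 'two', 'three'])

# Each cell is masked independently with probability 0.1.
masked = df.mask(rng.random(df.shape) < 0.1)

# Overall fraction of cells that ended up as NaN; close to 0.1, rarely exact.
nan_fraction = masked.isna().to_numpy().mean()
print(f"fraction of NaN cells: {nan_fraction:.3f}")
```

If you need the count to be exact, use the sampling approach instead.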