Imputer on some DataFrame columns in Python

Mauro Gentile · Jul 26, 2016

I am learning how to use scikit-learn's Imputer in Python.

This is my code:

import numpy as np
import pandas as pd
from sklearn.preprocessing import Imputer

df = pd.DataFrame([["XXL", 8, "black", "class 1", 22],
                   ["L", np.nan, "gray", "class 2", 20],
                   ["XL", 10, "blue", "class 2", 19],
                   ["M", np.nan, "orange", "class 1", 17],
                   ["M", 11, "green", "class 3", np.nan],
                   ["M", 7, "red", "class 1", 22]])

df.columns = ["size", "price", "color", "class", "boh"]

# Replace the missing prices with the mean of the observed prices
imp = Imputer(missing_values="NaN", strategy="mean")
imp.fit(df["price"])

df["price"] = imp.transform(df["price"])

However, this raises the following error:

ValueError: Length of values does not match length of index

What's wrong with my code?

Thanks for helping

Answer

frist · Jul 26, 2016

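This happens because Imputer expects 2-D input, i.e. a DataFrame or array of shape (n_samples, n_features), not a 1-D Series. Select the column with double brackets so you pass a one-column DataFrame, then flatten the result with ravel() before assigning it back. A possible solution is: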

imp = Imputer(missing_values="NaN", strategy="mean")
imp.fit(df[["price"]])                              # double brackets: 2-D, shape (6, 1)
df["price"] = imp.transform(df[["price"]]).ravel()  # flatten (6, 1) to (6,)

# Or, in one step:
imp = Imputer(missing_values="NaN", strategy="mean")
df["price"] = imp.fit_transform(df[["price"]]).ravel()
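For newer scikit-learn releases, here is a minimal sketch of the same fix, assuming scikit-learn 0.20 or later: Imputer was deprecated in 0.20 and removed in 0.22 in favour of sklearn.impute.SimpleImputer, which marks missing values with np.nan rather than the string "NaN". The snippet also prints the two shapes involved. df["price"] is a 1-D Series of shape (6,), while df[["price"]] is a one-column DataFrame of shape (6, 1), the 2-D layout the imputer expects. The variable names single, multi and num_cols are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # replacement for sklearn.preprocessing.Imputer

df = pd.DataFrame([["XXL", 8, "black", "class 1", 22],
                   ["L", np.nan, "gray", "class 2", 20],
                   ["XL", 10, "blue", "class 2", 19],
                   ["M", np.nan, "orange", "class 1", 17],
                   ["M", 11, "green", "class 3", np.nan],
                   ["M", 7, "red", "class 1", 22]],
                  columns=["size", "price", "color", "class", "boh"])

# Single brackets give a 1-D Series, double brackets a 2-D one-column DataFrame.
print(df["price"].shape)    # (6,)
print(df[["price"]].shape)  # (6, 1)

# One column: fit on the 2-D selection, then flatten the (6, 1) result to (6,).
single = df.copy()
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
single["price"] = imp.fit_transform(single[["price"]]).ravel()

# Several numeric columns: the (6, 2) result lines up with the two target
# columns, so no ravel() is needed. Each column is filled with its own mean.
multi = df.copy()
num_cols = ["price", "boh"]
multi[num_cols] = SimpleImputer(strategy="mean").fit_transform(multi[num_cols])

print(single["price"].tolist())  # [8.0, 9.0, 10.0, 9.0, 11.0, 7.0]
print(multi["boh"].tolist())     # [22.0, 20.0, 19.0, 17.0, 20.0, 22.0]
```

For a single column, the pandas-only df["price"].fillna(df["price"].mean()) gives the same values. The imputer is mainly useful when the means learned in fit() must be reused later, for example by calling transform() on a separate test set.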