I am trying to implement kNN from the fancyimpute module on a dataset. I was able to implement the code for continuous variables of the datasets using the code below:
knn_impute2=KNN(k=3).complete(train[['LotArea','LotFrontage']])
It yields the desirable answer as follows: This show how the original dataset looks like and how it has changed using knn imputation
I tried to implement the same code for categorical datasets and I get error :
could not convert string to float: 'female'
Here is the code I used(I am trying to use Imputer):
from sklearn.preprocessing import Imputer
imp = Imputer(missing_values='NaN', strategy='most_frequent', axis=0)
imp.fit(df['sex'])
print(imp.transform(df['sex']))
What am I doing wrong?
Recap, I want to use knn imputation on this dataset to impute the sex columns. Below is the dataset.
The dataset i want to impute using knn imputation with k value 2
How can i do that with knnimpute or i need to write my own functions. If yes, can anyone help me. Thnks
I was able to impute the categorical variables using the steps listed below. I will gladly welcome any omissions or program that can perform such tasks automatically
Step1: Subsets the object's data types(all) into another container
Step2: Change np.NaN
into an object data type, say None
. Now, the container is made up of only objects
data types
Step3: Change the entire container into categorical datasets
Step4: Encode the data set(i am using .cat.codes)
Step5: Change back the value of encoded None
into np.NaN
Step5: Use KNN (from fancyimpute) to impute the missing values
Step6: Re-map the encoded dataset to its initial names