sklearn mask for onehotencoder does not work

Question 1


python numpy scikit-learn transformation one-hot-encoding

PascalVKooten · Dec 4, 2015 · Viewed 8.2k times · Source

Answer


I think there's some confusion here. You still need to pass numerical values; the encoder only lets you specify which columns are categorical and which are not. As the documentation puts it:

The input to this transformer should be a matrix of integers, denoting the values taken on by categorical (discrete) features.

So in the example below I replace aaa with 5 and bbb with 6, which keeps them distinct from the numerical values 1 and 2:

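import numpy as np
from sklearn.preprocessing import OneHotEncoder
# categorical_features is only available in scikit-learn < 0.22
d = np.array([[5, 1, 1], [6, 2, 2]])
ohe = OneHotEncoder(categorical_features=np.array([True, False, False], dtype=bool))
ohe.fit(d)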

Now you can check your feature categories:

ohe.active_features_
Out[22]: array([5, 6], dtype=int64)
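
The integer codes do not have to be typed in by hand. As a minimal sketch, assuming the text column is available as a separate 1-D array, np.unique can derive them. The names text_col and numeric_cols are hypothetical, and the offset of 5 is arbitrary, chosen only to reproduce the aaa to 5, bbb to 6 mapping used above:

```python
import numpy as np

# Hypothetical raw data: one text column and two numeric columns.
text_col = np.array(['aaa', 'bbb'])
numeric_cols = np.array([[1, 1], [2, 2]])

# return_inverse=True yields the sorted unique labels and, for every row,
# the index of its label in that sorted array.
labels, codes = np.unique(text_col, return_inverse=True)

# Shift the codes by an arbitrary offset (5) to mirror the mapping above,
# then put the coded column in front of the numeric columns.
d = np.column_stack([codes + 5, numeric_cols])
print(d)
```

Keeping labels around makes it possible to translate a code back to its original string later, for example labels[code - 5].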

Question 2

Considering data like:

from sklearn.preprocessing import OneHotEncoder
import numpy as np
dt = 'object, i4, i4'
d = np.array([('aaa', 1, 1), ('bbb', 2, 2)], dtype=dt)

I want to exclude the text column using the OHE functionality.

Why does the following not work?

ohe = OneHotEncoder(categorical_features=np.array([False,True,True], dtype=bool))       
ohe.fit(d)
ValueError: could not convert string to float: 'bbb'

It says in the documentation:

categorical_features: “all” or array of indices or mask :
  Specify what features are treated as categorical.
   ‘all’ (default): All features are treated as categorical.
   array of indices: Array of categorical feature indices.
   mask: Array of length n_features and with dtype=bool.

I'm using a mask, yet it still tries to convert to float.

Passing dtype explicitly does not help either:

ohe = OneHotEncoder(categorical_features=np.array([False,True,True], dtype=bool),
                    dtype=dt)
ohe.fit(d)

This raises the same error.

The same happens with an "array of indices":

ohe = OneHotEncoder(categorical_features=np.array([1, 2]), dtype=dt)        
ohe.fit(d)
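
For readers on a current scikit-learn: the categorical_features parameter was deprecated in 0.20 and removed in 0.22, and column selection moved to ColumnTransformer. The following is only a sketch of that route, not the approach from this thread. It assumes a plain 2-D object array rather than the structured array above, and the transformer name 'onehot' is an arbitrary label:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Assumption: a plain 2-D object array instead of the structured array.
d = np.array([['aaa', 1, 1], ['bbb', 2, 2]], dtype=object)

ct = ColumnTransformer(
    [('onehot', OneHotEncoder(), [1, 2])],  # encode only columns 1 and 2
    remainder='passthrough',                # leave the text column untouched
    sparse_threshold=0,                     # always return a dense array
)
out = ct.fit_transform(d)
print(out)
```

The encoded columns come first in the output and the passed-through text column is appended after them, so the text is never converted to float.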
