I have written some simple code that uses OneHotEncoder:
from sklearn.preprocessing import OneHotEncoder
X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]
onehotencoder = OneHotEncoder(categories=[0])
X = onehotencoder.fit_transform(X).toarray()
I just want to apply the fit_transform method to column index 0 of X, that is, to the values [0, 0, 1, 2] that you can see in X. But it raises this error:
ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,).
Can anyone solve this problem? I am stuck on it.
You need to use ColumnTransformer to specify the column index, not the categories parameter.
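The categories constructor parameter is for stating the distinct category values explicitly, with one list of values per feature. For example, you could provide the values [0, 1, 2] for the first column yourself, but 'auto' will determine them from the data. Further, in ColumnTransformer you can select the columns with a slice() object instead of a list of indices.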
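To illustrate what categories expects, here is a minimal sketch that encodes only the first column's values. The single-column input X_col and the variable names are assumptions made for the illustration; the point is that categories takes one list of values per feature, which is why categories=[0] fails the shape check.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single-column input holding only the values of column 0.
X_col = [[0], [0], [1], [2]]

# One feature, so categories is a list containing one list of values.
encoder = OneHotEncoder(categories=[[0, 1, 2]])
encoded = encoder.fit_transform(X_col).toarray()

print(encoded.shape)  # (4, 3)
```

With two features you would pass two inner lists, one per column, in column order.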
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]
ct = ColumnTransformer(
    # The column numbers to be transformed (here [0], but it can be e.g. [0, 1, 3])
    [('one_hot_encoder', OneHotEncoder(categories='auto'), [0])],
    remainder='passthrough'  # Leave the rest of the columns untouched
)
X = ct.fit_transform(X)
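The slice() variant mentioned above could look roughly like the sketch below. It assumes the same X as in the question; the name result is only for illustration. slice(0, 1) selects column 0, just like [0] does.

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = [[0, 'a'], [0, 'b'], [1, 'a'], [2, 'b']]

ct = ColumnTransformer(
    # slice(0, 1) selects column 0 only, equivalent to [0] here
    [('one_hot_encoder', OneHotEncoder(categories='auto'), slice(0, 1))],
    remainder='passthrough',  # keep the untouched column after the encoded ones
)
result = ct.fit_transform(X)

print(result.shape)  # (4, 4): three one-hot columns plus the passthrough column
```

The encoded columns come first and the passthrough column is appended at the end, so the original 'a'/'b' values end up in the last column.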