I have a pandas dataframe similar to this:
Col1 ABC
0 XYZ A
1 XYZ B
2 XYZ C
By using the pandas get_dummies() function on column ABC, I can get this:
Col1 A B C
0 XYZ 1 0 0
1 XYZ 0 1 0
2 XYZ 0 0 1
But what I need is something like this, where the ABC column has a list / array datatype:
Col1 ABC
0 XYZ [1,0,0]
1 XYZ [0,1,0]
2 XYZ [0,0,1]
I tried using the get_dummies function and then combining all the resulting columns into the single column I wanted. I found a lot of answers explaining how to combine multiple columns as strings, such as this one: Combine two columns of text in dataframe in pandas/python. But I cannot figure out how to combine them as a list.
This question introduced me to the idea of using sklearn's OneHotEncoder, but I couldn't get it to work: How do I one-hot encode one column of a pandas dataframe?
One more thing: all the answers I came across required manually typing the column names when combining them. Is there a way to use DataFrame.iloc or some other slicing mechanism to combine columns into a list?
Here is an example of using sklearn.preprocessing.LabelBinarizer:
In [361]: from sklearn.preprocessing import LabelBinarizer
In [362]: lb = LabelBinarizer()
In [363]: df['new'] = lb.fit_transform(df['ABC']).tolist()
In [364]: df
Out[364]:
Col1 ABC new
0 XYZ A [1, 0, 0]
1 XYZ B [0, 1, 0]
2 XYZ C [0, 0, 1]
Pandas alternative:
In [370]: df['new'] = df['ABC'].str.get_dummies().values.tolist()
In [371]: df
Out[371]:
Col1 ABC new
0 XYZ A [1, 0, 0]
1 XYZ B [0, 1, 0]
2 XYZ C [0, 0, 1]
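To address the last part of the question (combining the indicator columns without typing their names), here is a minimal, self-contained sketch that selects them by position with iloc. The variable names (dummies, wide, n_levels, wide_lists) are illustrative only. It assumes the encoded columns are appended at the end of the frame, which is how pd.get_dummies behaves when given columns=[...]. Passing dtype=int is a precaution, since recent pandas versions return boolean indicator columns by default.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})

# Encode the single column; dtype=int keeps 0/1 values instead of booleans.
dummies = pd.get_dummies(df["ABC"], dtype=int)

# Collapse every indicator column into one list per row.
df["new"] = dummies.to_numpy().tolist()

# Alternative: encode inside the frame, then slice the trailing
# indicator columns by position, so no column names are typed.
wide = pd.get_dummies(df[["Col1", "ABC"]], columns=["ABC"], dtype=int)
n_levels = df["ABC"].nunique()  # number of indicator columns produced
wide_lists = wide.iloc[:, -n_levels:].to_numpy().tolist()

print(df)
```

Both approaches should give the same lists here. The positions in each list follow the sorted order of the distinct values in ABC, so dummies.columns tells you which slot corresponds to which label.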