pandas.get_dummies
emits a dummy variable per categorical value. Is there some automated, easy way to ask it to create only N-1 dummy variables? (just get rid of one "baseline" variable arbitrarily)?
Needed to avoid co-linearity in our dataset.
Pandas version 0.18.0 implemented exactly what you're looking for: the drop_first
option. Here's an example:
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: u'0.18.1'
In [3]: s = pd.Series(list('abcbacb'))
In [4]: pd.get_dummies(s, drop_first=True)
Out[4]:
b c
0 0.0 0.0
1 1.0 0.0
2 0.0 1.0
3 1.0 0.0
4 0.0 0.0
5 0.0 1.0
6 1.0 0.0