How do you create a new Bin/Bucket Variable using pd.qcut in Python?
This might seem elementary to experienced users, but I was not clear on it, and it was surprisingly hard to search for on Stack Overflow and Google. Some thorough searching turned up this question (Assignment of qcut as new column), but it didn't quite answer mine because it didn't take the last step of putting everything into numbered bins (i.e. 1, 2, ...).
In Pandas 0.15.0 or newer, pd.qcut will return a Series, not a Categorical, if the input is a Series (as it is, in your case) or if labels=False. If you set labels=False, then qcut will return a Series with the integer indicators of the bins as values.
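To make the return types concrete, here is a minimal sketch. The DataFrame contents are made up for illustration; only the names data3 and spd_pct come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ten evenly spaced values in the question's column.
data3 = pd.DataFrame({"spd_pct": np.arange(1, 11, dtype=float)})

# Default labels: a Series whose values are interval categories.
as_categories = pd.qcut(data3["spd_pct"], 5)

# labels=False: a Series of integer bin indicators, starting at 0.
as_codes = pd.qcut(data3["spd_pct"], 5, labels=False)

print(type(as_categories).__name__, as_categories.dtype)
print(type(as_codes).__name__, as_codes.tolist())
```

Note that the integer indicators are zero-based, so five quantile bins are numbered 0 through 4.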
So to future-proof your code, you could use
data3['bins_spd'] = pd.qcut(data3['spd_pct'], 5, labels=False)
Or pass a NumPy array to pd.qcut so that you get a Categorical as the return value. Note that the Categorical attribute labels is deprecated; use codes instead:
data3['bins_spd'] = pd.qcut(data3['spd_pct'].values, 5).codes
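As a quick check that the two approaches agree, the sketch below runs both on made-up data (the values are an assumption; only data3, spd_pct, and bins_spd come from the question). Since the question asks for bins numbered 1, 2, ..., it also adds 1 to the zero-based indicators; the column name bins_spd_1based is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's DataFrame.
data3 = pd.DataFrame({"spd_pct": np.arange(1, 11, dtype=float)})

# Approach 1: Series in, integer bin indicators out.
data3["bins_spd"] = pd.qcut(data3["spd_pct"], 5, labels=False)

# Approach 2: NumPy array in, Categorical out, then take its codes.
via_codes = pd.qcut(data3["spd_pct"].values, 5).codes

# Both approaches assign each row to the same zero-based bin.
same = bool((data3["bins_spd"].values == via_codes).all())
print(same)

# Shift to one-based bin numbers (1..5) if that is the desired labeling.
data3["bins_spd_1based"] = data3["bins_spd"] + 1
print(data3)
```

If the column contains missing values, the two results may differ in how they mark them, so it is worth checking that case separately on real data.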