Python Pandas Create New Bin/Bucket Variable with pd.qcut

sfortney · Feb 10, 2015 · Viewed 16.1k times

How do you create a new bin/bucket variable using pd.qcut in Python?

This might seem elementary to experienced users, but I was not clear on it, and it was surprisingly hard to search for on Stack Overflow/Google. Some thorough searching yielded this (Assignment of qcut as new column), but it didn't quite answer my question because it didn't take the last step and put everything into bins (i.e. 1, 2, ...).

Answer

unutbu · Feb 10, 2015

In Pandas 0.15.0 or newer, pd.qcut returns a Series, not a Categorical, if the input is a Series (as it is in your case). If you also set labels=False, qcut returns a Series whose values are the integer indicators of the bins.
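To make the return types concrete, here is a minimal sketch. The sample values are hypothetical and only stand in for data3['spd_pct']; the behavior shown assumes a reasonably recent pandas release.

```python
import pandas as pd

# Hypothetical sample data standing in for data3['spd_pct'].
spd_pct = pd.Series([0.05, 0.12, 0.33, 0.47, 0.51, 0.68, 0.74, 0.89, 0.93, 0.99])

# Series input, default labels: a Series of category dtype (interval labels).
as_series = pd.qcut(spd_pct, 5)

# Series input, labels=False: a Series of integer bin indicators (0-based).
as_ints = pd.qcut(spd_pct, 5, labels=False)

# NumPy array input, default labels: a Categorical.
as_categorical = pd.qcut(spd_pct.values, 5)

print(type(as_series).__name__, as_series.dtype)
print(type(as_ints).__name__, as_ints.tolist())
print(type(as_categorical).__name__)
```

Note that the integer indicators start at 0, not 1.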

So to future-proof your code, you could use

data3['bins_spd'] = pd.qcut(data3['spd_pct'], 5, labels=False)

Or pass a NumPy array to pd.qcut so that you get a Categorical as the return value. Note that the Categorical attribute labels is deprecated; use codes instead:

data3['bins_spd'] = pd.qcut(data3['spd_pct'].values, 5).codes
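As a rough end-to-end sketch, the snippet below applies both approaches to a small, made-up data3 frame (the values are hypothetical). The final step, adding 1 to get buckets numbered 1, 2, ..., is not part of the answer above; it is one assumed way to match the numbering mentioned in the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data3 frame.
data3 = pd.DataFrame(
    {"spd_pct": [0.05, 0.12, 0.33, 0.47, 0.51, 0.68, 0.74, 0.89, 0.93, 0.99]}
)

# Approach 1: integer bin indicators straight from qcut (0-based).
data3["bins_spd"] = pd.qcut(data3["spd_pct"], 5, labels=False)

# Approach 2: pass a NumPy array, get a Categorical, and read its codes.
data3["bins_spd_codes"] = pd.qcut(data3["spd_pct"].values, 5).codes

# Assumed extra step: shift to 1-based buckets (1, 2, ..., 5).
data3["bins_spd_1based"] = data3["bins_spd"] + 1

print(data3)
```

Both approaches should give the same bin numbers; they differ only in the intermediate object that qcut returns.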