How to create a categorical variable based on a numerical variable

Klausos Klausos · Sep 17, 2015

My DataFrame has one column:

import pandas as pd
values = [1, 1, 4, 5, 6, 6, 30, 20, 80, 90]  # named `values` to avoid shadowing the built-in `list`
df = pd.DataFrame({'col1': values})

How can I add another column, 'col2', that holds a category derived from col1 according to these rules:

if col1 > 0 and col1 <= 10 then col2 = 'xxx'
if col1 > 10 and col1 <= 50 then col2 = 'yyy'
if col1 > 50 then col2 = 'zzz'

Answer

DontDivideByZero · Oct 11, 2017

You could use pd.cut as follows:

df['col2'] = pd.cut(df['col1'], bins=[0, 10, 50, float('Inf')], labels=['xxx', 'yyy', 'zzz'])

Output:

   col1 col2
0     1  xxx
1     1  xxx
2     4  xxx
3     5  xxx
4     6  xxx
5     6  xxx
6    30  yyy
7    20  yyy
8    80  zzz
9    90  zzz
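
A small follow-up sketch, in case the bin edges matter for your data. By default pd.cut uses right-closed intervals (right=True), so the bins above are (0, 10], (10, 50] and (50, inf], which matches the conditions in the question. The two extra values 0 and 10 below are my own additions for illustration and are not part of the original list: 10 should land in 'xxx', while 0 falls outside every bin and becomes NaN.

```python
import numpy as np
import pandas as pd

# Data from the question, plus two illustrative edge values (0 and 10)
# that are NOT in the original list.
df = pd.DataFrame({'col1': [1, 1, 4, 5, 6, 6, 30, 20, 80, 90, 0, 10]})

# Intervals are right-closed by default: (0, 10], (10, 50], (50, inf]
df['col2'] = pd.cut(df['col1'],
                    bins=[0, 10, 50, np.inf],
                    labels=['xxx', 'yyy', 'zzz'])

print(df.tail(2))        # the rows for 0 (no bin, so NaN) and 10 ('xxx')
print(df['col2'].dtype)  # category
```

If values at or below 0 can occur and should still get a label, one option is to lower the first bin edge (for example to -np.inf); whether that is appropriate depends on what those values mean in your data.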