How to know the labels assigned by astype('category').cat.codes?

Marisa · Jun 29, 2018

I have the following dataframe called language:

         lang          level
0      english         intermediate
1      spanish         intermediate
2      spanish         basic
3      english         basic
4      english         advanced
5      spanish         intermediate
6      spanish         basic
7      spanish         advanced

I converted each of my variables to numeric codes by using

language.lang.astype('category').cat.codes

and

language.level.astype('category').cat.codes

respectively, obtaining the following data frame:

      lang   level
0      0       2
1      1       2
2      1       1
3      0       1
4      0       0
5      1       2
6      1       1
7      1       0

Now I would like to know whether there is a way to find out which original value corresponds to each code. For example, I'd like to know that the 0 value in the lang column corresponds to english, and so on.

Is there any function that allows me to get back this information?

Answer

jezrael · Jun 29, 2018

You can generate a dictionary:

c = language.lang.astype('category')

d = dict(enumerate(c.cat.categories))
print(d)
{0: 'english', 1: 'spanish'}
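
The same idea extends to several columns at once. The following is a minimal sketch, assuming the data frame from the question is rebuilt by hand; the name `mappings` is only illustrative. Note that for string data pandas assigns codes following the sorted order of the categories, not the order in which values first appear.

```python
import pandas as pd

# Data frame reconstructed from the question.
language = pd.DataFrame({
    "lang": ["english", "spanish", "spanish", "english",
             "english", "spanish", "spanish", "spanish"],
    "level": ["intermediate", "intermediate", "basic", "basic",
              "advanced", "intermediate", "basic", "advanced"],
})

# Build one code -> label dictionary per column.
# `mappings` is a hypothetical name used only for this sketch.
mappings = {}
for col in ["lang", "level"]:
    cat = language[col].astype("category")
    mappings[col] = dict(enumerate(cat.cat.categories))

print(mappings["lang"])
print(mappings["level"])
```

Each dictionary is only valid for the categorical it was built from, so it is safest to take the codes and the categories from the same `astype('category')` result.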

Then, if necessary, you can map the codes back to the labels:

language['code'] = language.lang.astype('category').cat.codes

language['level_back'] = language['code'].map(d)
print(language)
      lang         level  code level_back
0  english  intermediate     0    english
1  spanish  intermediate     1    spanish
2  spanish         basic     1    spanish
3  english         basic     0    english
4  english      advanced     0    english
5  spanish  intermediate     1    spanish
6  spanish         basic     1    spanish
7  spanish      advanced     1    spanish
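
One case worth checking when mapping back is missing data. The sketch below assumes a small series with one missing entry (not part of the original data): pandas gives missing values the code -1, which is not a key in the dictionary, so `map` returns NaN for that row.

```python
import pandas as pd

# Hypothetical series with one missing value, for illustration only.
s = pd.Series(["english", None, "spanish"]).astype("category")

# Code -> label dictionary, built the same way as above.
d = dict(enumerate(s.cat.categories))

codes = s.cat.codes      # the missing entry is encoded as -1
restored = codes.map(d)  # -1 is not a key in d, so that row becomes NaN

print(codes.tolist())
print(restored.tolist())
```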