pandas Categorical error: "Cannot setitem on a Categorical with a new category, set the categories first"

Gabriela M · Mar 9, 2018

I have the following df data frame in pandas:

     weekday  venta_total_cy
0    Viernes    5.430211e+09
1      Lunes    3.425554e+09
2     Sabado    6.833202e+09
3    Domingo    6.566466e+09
4     Jueves    2.748710e+09
5     Martes    3.328418e+09
6  Miercoles    3.136277e+09

I want to sort the data frame by weekday, in this order:

weekday
Lunes
Martes
Miercoles
Jueves
Viernes
Sabado
Domingo

To do so, I used the following code:

df['weekday'] = pd.Categorical(df[['weekday']], categories=["Lunes", "Martes", "Miercoles", "Jueves", "Viernes", "Sabado", "Domingo"])

When I run the code, I get this error:

ValueError: Cannot setitem on a Categorical with a new category, set the categories first

I have not found enough documentation to resolve this. Can you help me? Thanks!

Answer

cs95 · Mar 9, 2018

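df[['weekday']] (double brackets) returns a one-column DataFrame, not a Series, and that is the wrong input here. pd.Categorical expects a 1-D list-like of values, so pass the column itself with df['weekday']. Also pass ordered=True so the column becomes an ordered categorical that follows the order of your categories.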
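The difference between the two indexing forms can be checked directly. This is a minimal sketch on a small stand-in frame (two rows taken from the question, not the full data); the names single and double are only illustrative.

```python
import pandas as pd

# Small stand-in frame with two rows from the question
df = pd.DataFrame({
    "weekday": ["Viernes", "Lunes"],
    "venta_total_cy": [5.430211e+09, 3.425554e+09],
})

single = df["weekday"]    # single brackets: the column as a 1-D Series
double = df[["weekday"]]  # double brackets: a 2-D, one-column DataFrame

print(type(single).__name__, single.ndim)
print(type(double).__name__, double.ndim)
```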

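import numpy as np
import pandas as pd

# Desired weekday order, Monday through Sunday
categories = np.array(
    ['Lunes', 'Martes', 'Miercoles', 'Jueves', 'Viernes', 'Sabado', 'Domingo'])

# Pass the Series df['weekday'], not the DataFrame df[['weekday']]
df['weekday'] = pd.Categorical(
    df['weekday'], categories=categories, ordered=True)
# sort_values returns a new, sorted DataFrame; df itself is unchanged
df.sort_values(by='weekday')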

     weekday  venta_total_cy
1      Lunes    3.425554e+09
5     Martes    3.328418e+09
6  Miercoles    3.136277e+09
4     Jueves    2.748710e+09
0    Viernes    5.430211e+09
2     Sabado    6.833202e+09
3    Domingo    6.566466e+09
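For anyone who wants to reproduce this end to end, here is a self-contained sketch that rebuilds the frame from the values shown in the question and applies the same fix. The variable names order and sorted_df are illustrative and do not appear in the original post.

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "weekday": ["Viernes", "Lunes", "Sabado", "Domingo",
                "Jueves", "Martes", "Miercoles"],
    "venta_total_cy": [5.430211e+09, 3.425554e+09, 6.833202e+09,
                       6.566466e+09, 2.748710e+09, 3.328418e+09,
                       3.136277e+09],
})

# Desired weekday order, Monday through Sunday
order = ["Lunes", "Martes", "Miercoles", "Jueves",
         "Viernes", "Sabado", "Domingo"]

# Convert the Series (single brackets) to an ordered categorical
df["weekday"] = pd.Categorical(df["weekday"], categories=order, ordered=True)

# sort_values returns a new frame; assign it to keep the sorted result
sorted_df = df.sort_values(by="weekday")
print(sorted_df)
```

Because sort_values does not modify df in place by default, the result is assigned to sorted_df here; the original row labels (1, 5, 6, ...) are kept unless the index is reset.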