Python Pandas - Changing some column types to categories

gincard · Mar 7, 2015 · Viewed 82.1k times

I have loaded the following CSV file into an IPython Notebook:

public = pd.read_csv("categories.csv")
public

I've also imported pandas as pd, numpy as np and matplotlib.pyplot as plt. The following data types are present (this is a summary; there are about 100 columns):

In [36]:   public.dtypes
Out[37]:   parks          object
           playgrounds    object
           sports         object
           roading        object               
           resident       int64
           children       int64

I want to change 'parks', 'playgrounds', 'sports' and 'roading' to categories, leaving the remainder as int64. These columns contain Likert scale responses, though each column uses a different set of responses (e.g. one has "strongly agree", "agree" etc., another has "very important", "important" etc.).

I was able to create a separate dataframe - public1 - and change one of the columns to a category type using the following code:

public1 = {'parks': public.parks}
public1 = public1['parks'].astype('category')

However, when I tried to change several columns at once using this code, I was unsuccessful:

public1 = {'parks': public.parks,
           'playgrounds': public.parks}
public1 = public1['parks', 'playgrounds'].astype('category')

Notwithstanding this, I don't want to create a separate dataframe with just the categories columns. I would like them changed in the original dataframe.

I tried numerous ways to achieve this, then tried the code here: Pandas: change data type of columns...

public[['parks', 'playgrounds', 'sports', 'roading']] = public[['parks', 'playgrounds', 'sports', 'roading']].astype('category')

and got the following error:

 NotImplementedError: > 1 ndim Categorical are not supported at this time

Is there a way to change 'parks', 'playgrounds', 'sports' and 'roading' to categories (so the Likert scale responses can then be analysed), leaving 'resident' and 'children' (and the 94 other columns, which are strings, ints and floats) untouched? Or is there a better way to do this? If anyone has any suggestions or feedback I would be most grateful. I am slowly going bald ripping my hair out!

Many thanks in advance.

Edited to add: I am using Python 2.7.

Answer

unutbu · Mar 7, 2015

Sometimes, you just have to use a for-loop:

for col in ['parks', 'playgrounds', 'sports', 'roading']:
    public[col] = public[col].astype('category')
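
For anyone who wants to try the loop without the original file, here is a minimal self-contained sketch. The small DataFrame is a hypothetical stand-in for categories.csv (the values are invented), and the name likert_cols is just an illustrative helper. The last step, which gives one column an explicit ordering via pd.CategoricalDtype, is an optional extra that assumes a reasonably recent pandas and an assumed four-point agreement scale; it is not part of the loop itself.

```python
import pandas as pd

# Hypothetical stand-in for categories.csv; the values are invented.
public = pd.DataFrame({
    "parks": ["agree", "strongly agree", "disagree"],
    "playgrounds": ["important", "very important", "important"],
    "sports": ["agree", "agree", "disagree"],
    "roading": ["very important", "not important", "important"],
    "resident": [1, 0, 1],
    "children": [2, 0, 3],
})

# Illustrative name for the columns to convert.
likert_cols = ["parks", "playgrounds", "sports", "roading"]

# Convert one column at a time, assigning back into the original frame.
# Columns not listed here keep their existing dtypes.
for col in likert_cols:
    public[col] = public[col].astype("category")

# Optional: impose an explicit order on one column so that sorting and
# comparisons follow the scale. The scale below is an assumption.
agreement = pd.CategoricalDtype(
    categories=["strongly disagree", "disagree", "agree", "strongly agree"],
    ordered=True,
)
public["parks"] = public["parks"].astype(agreement)

print(public.dtypes)
```

Because each column gets its own category set, columns with different response scales do not interfere with each other. I believe more recent pandas releases also accept the multi-column assignment from the question directly, but the loop should work either way.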