I have read the following CSV file into an IPython Notebook:
public = pd.read_csv("categories.csv")
public
I've also imported pandas as pd, numpy as np and matplotlib.pyplot as plt. The following data types are present (this is a summary; there are about 100 columns):
In [36]: public.dtypes
Out[37]: parks object
playgrounds object
sports object
roading object
resident int64
children int64
I want to change 'parks', 'playgrounds', 'sports' and 'roading' to categories, leaving the remainder as int64. These columns hold Likert scale responses, though each column has a different set of responses (e.g. one has "strongly agree", "agree" etc., another has "very important", "important" etc.).
I was able to create a separate dataframe - public1 - and change one of the columns to a category type using the following code:
public1 = {'parks': public.parks}
public1 = public1['parks'].astype('category')
However, when I tried to change several columns at once using this code, I was unsuccessful:
public1 = {'parks': public.parks,
'playgrounds': public.parks}
public1 = public1['parks', 'playgrounds'].astype('category')
Notwithstanding this, I don't want to create a separate dataframe with just the categories columns. I would like them changed in the original dataframe.
I tried numerous ways to achieve this, then tried the code here: Pandas: change data type of columns...
public[['parks', 'playgrounds', 'sports', 'roading']] = public[['parks', 'playgrounds', 'sports', 'roading']].astype('category')
and got the following error:
NotImplementedError: > 1 ndim Categorical are not supported at this time
Is there a way to change 'parks', 'playgrounds', 'sports' and 'roading' to categories (so the Likert scale responses can then be analysed), leaving 'resident' and 'children' (and the 94 other columns, which are strings, ints and floats) untouched? Or is there a better way to do this? If anyone has any suggestions or feedback I would be most grateful. I am slowly going bald ripping my hair out!
Many thanks in advance.
Edited to add: I am using Python 2.7.
Sometimes, you just have to use a for-loop:
for col in ['parks', 'playgrounds', 'sports', 'roading']:
    public[col] = public[col].astype('category')
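To see the loop work end to end, here is a minimal sketch on a small made-up frame. The sample values and the name likert_cols are assumptions for illustration only; the real data comes from categories.csv. Each listed column is converted one at a time, so the multi-column Categorical error never comes up, and the integer columns are left alone.

```python
import pandas as pd

# Hypothetical stand-in for categories.csv; the values are invented for illustration.
public = pd.DataFrame({
    'parks': ['agree', 'strongly agree', 'agree'],
    'playgrounds': ['important', 'very important', 'important'],
    'resident': [1, 0, 1],
    'children': [2, 0, 3],
})

# Only the Likert columns are listed; every other column keeps its dtype.
likert_cols = ['parks', 'playgrounds']

# Convert one column (a 1-D Series) at a time, in place in the original frame.
for col in likert_cols:
    public[col] = public[col].astype('category')

# Inspect the result: the listed columns are now 'category'.
print(public.dtypes)
```

Each converted column keeps its own set of categories, which can be checked with public['parks'].cat.categories, so columns with different Likert wordings do not interfere with one another.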