Can't access dataframe columns

drevicko · Aug 11, 2016

I'm importing a dataframe from a csv file, but cannot access some of its columns by name. What's going on?

In more concrete terms:

> import pandas

> jobNames = pandas.read_csv("job_names.csv")
> print(jobNames)

   job_id   job_name   num_judgements
0  933985        Foo              180
1  933130        Moo              175
2  933123        Goo              150
3  933094       Flue              120
4  933088        Tru              120

When I try to access the second column, I get an error:

> jobNames.job_name

AttributeError: 'DataFrame' object has no attribute 'job_name'

Strangely, I can access the job_id column thus:

> print(jobNames.job_id)

0    933985
1    933130
2    933123
3    933094
4    933088
Name: job_id, dtype: int64

Edit (to put the accepted answer in context):

It turns out that the first row of the csv file (with the column names) looks like this:

job_id, job_name, num_judgements

Note the spaces after each comma! Those spaces are retained in the column names:

> jobNames.columns[1]

' job_name'

which don't form valid Python identifiers, so those columns aren't available as dataframe attributes. I can still access them dict-style:

> jobNames[' job_name']
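
For anyone who wants to reproduce this without the original file, here is a minimal sketch. The csv_text string is a made-up stand-in for job_names.csv (only the header layout is taken from the question), and stripping the names with columns.str.strip() is one possible workaround after loading, not the accepted fix.

```python
import io

import pandas as pd

# Hypothetical stand-in for job_names.csv: note the space after each comma.
csv_text = (
    "job_id, job_name, num_judgements\n"
    "933985, Foo, 180\n"
    "933130, Moo, 175\n"
)

job_names = pd.read_csv(io.StringIO(csv_text))

# The leading spaces survive in the column names.
original_columns = list(job_names.columns)
print(original_columns)  # ['job_id', ' job_name', ' num_judgements']

# Attribute access only works for names that are valid identifiers.
print(hasattr(job_names, "job_id"))    # True
print(hasattr(job_names, "job_name"))  # False

# Workaround after loading: strip whitespace from every column name.
job_names.columns = job_names.columns.str.strip()
print(hasattr(job_names, "job_name"))  # True
```

This only cleans the header. String values in the rows still keep their leading space.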

Answer

Maxim Egorushkin · Aug 11, 2016

When calling pandas.read_csv, pass skipinitialspace=True so that the whitespace after each CSV delimiter is skipped.
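
A minimal sketch of that call, assuming a file laid out like the one in the question. The csv_text string below is invented sample data read through io.StringIO, so the snippet runs without job_names.csv on disk.

```python
import io

import pandas as pd

# Hypothetical sample mirroring the question's header, with a space after each comma.
csv_text = (
    "job_id, job_name, num_judgements\n"
    "933985, Foo, 180\n"
    "933130, Moo, 175\n"
)

# skipinitialspace=True drops the whitespace that follows each delimiter,
# in the header row and in the data rows alike.
jobs = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(list(jobs.columns))      # ['job_id', 'job_name', 'num_judgements']
print(jobs.job_name.tolist())  # ['Foo', 'Moo']
```

With a real file, the same keyword goes straight into the original call: pandas.read_csv("job_names.csv", skipinitialspace=True).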