I'm importing a DataFrame from a CSV file, but cannot access some of its columns by name. What's going on?
In more concrete terms:
> import pandas
> jobNames = pandas.read_csv("job_names.csv")
> print(jobNames)
   job_id job_name  num_judgements
0  933985      Foo             180
1  933130      Moo             175
2  933123      Goo             150
3  933094     Flue             120
4  933088      Tru             120
When I try to access the second column, I get an error:
> jobNames.job_name
AttributeError: 'DataFrame' object has no attribute 'job_name'
Strangely, I can access the job_id column thus:
> print(jobNames.job_id)
0 933985
1 933130
2 933123
3 933094
4 933088
Name: job_id, dtype: int64
Edit (to put the accepted answer in context):
It turns out that the first row of the csv file (with the column names) looks like this:
job_id, job_name, num_judgements
Note the spaces after each comma! Those spaces are retained in the column names:
> jobNames.columns[1]
' job_name'
which aren't valid Python identifiers, so those columns can't be reached with dot (attribute) syntax. I can still access them dict-style:
> jobNames[' job_name']
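To reproduce the problem without the original file, here is a minimal sketch. The `csv_text` string is a hypothetical two-row stand-in for job_names.csv, fed to `read_csv` through `io.StringIO`; the variable names are illustrative only.

```python
import io
import pandas as pd

# Hypothetical stand-in for job_names.csv, with a space after each comma.
csv_text = (
    "job_id, job_name, num_judgements\n"
    "933985, Foo, 180\n"
    "933130, Moo, 175\n"
)

job_names = pd.read_csv(io.StringIO(csv_text))

# Listing the names shows the leading spaces that print() hides.
columns = list(job_names.columns)
print(columns)

# Dot access looks up "job_name" (no space), which is not a column.
has_dot_access = hasattr(job_names, "job_name")
print(has_dot_access)

# Bracket access works as long as the name matches exactly, space included.
second_column = job_names[" job_name"]
```

Printing `list(df.columns)` (or `repr` of a single name) is a quick way to spot stray whitespace, since the default DataFrame printout does not make it visible.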
When using pandas.read_csv, pass skipinitialspace=True to skip the whitespace that follows each CSV delimiter.
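As a rough illustration of the fix, the sketch below reads the same kind of data with `skipinitialspace=True`. The `csv_text` sample is a hypothetical stand-in for the real file. The second half shows an alternative that is not part of the answer above: stripping the column names after loading with `columns.str.strip()`, which cleans the header only and leaves any leading spaces in the cell values untouched.

```python
import io
import pandas as pd

# Hypothetical stand-in for job_names.csv, with a space after each comma.
csv_text = (
    "job_id, job_name, num_judgements\n"
    "933985, Foo, 180\n"
    "933130, Moo, 175\n"
)

# skipinitialspace=True drops the spaces that follow each delimiter,
# in the header row and in the data rows alike.
job_names = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
clean_columns = list(job_names.columns)
names = list(job_names.job_name)  # dot access now works

# Alternative (an assumption, not from the answer): fix the header afterwards.
raw = pd.read_csv(io.StringIO(csv_text))
raw.columns = raw.columns.str.strip()
stripped_columns = list(raw.columns)
```

Passing the flag at read time is usually the simpler choice, because it also cleans string values such as ' Foo' that would otherwise keep their leading space.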