I want to write the result of a pandas groupby to a CSV file. I have tried several Stack Overflow solutions, but none of them worked.
Python 3.6.1, Pandas 0.20.1
The groupby result looks like:
         id  month   year  count
week
0      9066     82  32142    895
1      7679     84  30112    749
2      8368    126  42187    872
3     11038    102  34165    976
4      8815    117  34122    767
5     10979    163  50225   1252
6      8726    142  38159    996
7      5568     63  26143    582
I want a CSV that looks like:
week count
0 895
1 749
2 872
3 976
4 767
5 1252
6 996
7 582
Current code:
week_grouped = df.groupby('week')
week_grouped.sum() #At this point you have the groupby result
week_grouped.to_csv('week_grouped.csv') # Fails: to_csv is a DataFrame method, and week_grouped is a groupby object, not a DataFrame.
Read SO solutions:
output groupby to csv file pandas
week_grouped.drop_duplicates().to_csv('week_grouped.csv')
Result: AttributeError: Cannot access callable attribute 'drop_duplicates' of 'DataFrameGroupBy' objects, try using the 'apply' method
Python pandas - writing groupby output to file
week_grouped.reset_index().to_csv('week_grouped.csv')
Result: AttributeError: "Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method"
Try doing this:
week_grouped = df.groupby('week')
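week_grouped.sum().reset_index().to_csv('week_grouped.csv', index=False)
That writes every summed column to the file; index=False leaves out the integer row labels that reset_index() creates. If you only want the week and count columns, select them before writing: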
week_grouped = df.groupby('week')
week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv', index=False)
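For reference, here is a minimal self-contained sketch of the same approach. The sample values are made up (they are not the asker's data), and it writes to an in-memory buffer instead of a file so the result is easy to inspect; passing a filename to to_csv should behave the same way.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the asker's df.
df = pd.DataFrame({
    "week": [0, 0, 1, 1, 2],
    "id": [10, 11, 12, 13, 14],
    "count": [3, 4, 5, 6, 7],
})

# groupby() alone returns a groupby object; sum() turns it into a DataFrame.
# reset_index() moves "week" from the index back into a regular column.
summed = df.groupby("week").sum().reset_index()

# Keep only the two wanted columns. index=False omits the 0..n-1 row labels.
buffer = io.StringIO()
summed[["week", "count"]].to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

The first line of csv_text is the header week,count, followed by one row per week.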
Here's a line-by-line explanation of the original code:
# This creates a "groupby" object (not a dataframe object)
# and you store it in the week_grouped variable.
week_grouped = df.groupby('week')
# This instructs pandas to sum up all the numeric type columns in each
# group. This returns a dataframe where each row is the sum of the
# group's numeric columns. You're not storing this dataframe in your
# example.
week_grouped.sum()
# Here you're calling the to_csv method on a groupby object... but
# that object type doesn't have that method. Dataframes have that method.
# So we should store the previous line's result (a dataframe) into a variable
# and then call its to_csv method.
week_grouped.to_csv('week_grouped.csv')
# Like this:
summed_weeks = week_grouped.sum()
summed_weeks.to_csv('...')
# Or with less typing simply
week_grouped.sum().to_csv('...')
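As a possible variation on the above, you could select the count column before summing, so the other columns are never aggregated. This is only a sketch with made-up sample data; the names small_df and count_by_week are illustrative. Because the grouped result keeps week as a named index, to_csv writes it as the first column without needing reset_index().

```python
import io

import pandas as pd

# Hypothetical sample data; not the asker's real dataframe.
small_df = pd.DataFrame({
    "week": [0, 0, 1, 1, 2],
    "month": [1, 1, 1, 2, 2],
    "count": [3, 4, 5, 6, 7],
})

# Double brackets select a one-column DataFrame from the groupby object,
# so sum() returns a DataFrame indexed by "week" with a single "count" column.
count_by_week = small_df.groupby("week")[["count"]].sum()

# The index is named "week", so it becomes the first CSV column with that header.
out = io.StringIO()
count_by_week.to_csv(out)
text = out.getvalue()
print(text)
```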