Renaming Column Names in Pandas Groupby function

Baktaawar · Oct 22, 2013 · Viewed 135.5k times

Q1) I want to do a SQL-style groupby aggregation and rename the output column.

Example dataset:

>>> df
    ID     Region  count
0  100       Asia      2
1  101     Europe      3
2  102         US      1
3  103     Africa      5
4  100     Russia      5
5  101  Australia      7
6  102         US      8
7  104       Asia     10
8  105     Europe     11
9  110     Africa     23

I want to group the observations of this dataset by ID and Region and sum the count for each group, so I used something like this:

>>> print(df.groupby(['ID', 'Region'], as_index=False)['count'].sum())

    ID     Region  count
0  100       Asia      2
1  100     Russia      5
2  101  Australia      7
3  101     Europe      3
4  102         US      9
5  103     Africa      5
6  104       Asia     10
7  105     Europe     11
8  110     Africa     23

Using as_index=False, I am able to get "SQL-like" output. My problem is that I am unable to rename the aggregate variable count here. In SQL, if I wanted to do the above, I would write something like this:

select ID, Region, sum(count) as Total_Numbers
from df
group by ID, Region
order by ID, Region

As we can see, it's very easy to rename the aggregate variable count to Total_Numbers in SQL. I want to do the same thing in Pandas but have been unable to find such an option in the groupby function. Can somebody help?

The second question (more of an observation) is whether...

Q2) Is it possible to directly use column names in Pandas dataframe functions without enclosing them in quotes?

I understand that column names are strings and so have to be quoted, but when a column is accessed as an attribute outside a DataFrame function, as in df.ID.sum(), no quotes are needed. Quotes are only required when the name is passed to a DataFrame function such as df.sort() or df.groupby(). This is a bit of a pain, since in SQL, SAS, and other languages we simply use the variable name without quoting it. Any suggestions?

Kindly reply to both questions (Q1 is the main, Q2 more of an opinion).

Answer

Roman Pekar · Oct 22, 2013

For the first question, I think the answer would be:

<your DataFrame>.rename(columns={'count':'Total_Numbers'})

or

<your DataFrame>.columns = ['ID', 'Region', 'Total_Numbers']
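To make this concrete, here is a minimal sketch that rebuilds the example dataset from the question and applies the rename after the aggregation. The variable names totals and totals_named are only illustrative. The second variant uses named aggregation, which assumes pandas 0.25 or later and is not part of the original answer.

```python
import pandas as pd

# Example dataset from the question
df = pd.DataFrame({
    "ID": [100, 101, 102, 103, 100, 101, 102, 104, 105, 110],
    "Region": ["Asia", "Europe", "US", "Africa", "Russia",
               "Australia", "US", "Asia", "Europe", "Africa"],
    "count": [2, 3, 1, 5, 5, 7, 8, 10, 11, 23],
})

# Variant 1: aggregate first, then rename the resulting column
totals = (
    df.groupby(["ID", "Region"], as_index=False)["count"]
      .sum()
      .rename(columns={"count": "Total_Numbers"})
)

# Variant 2 (assumes pandas >= 0.25): named aggregation sets the
# output column name directly, similar to "sum(count) as Total_Numbers"
totals_named = (
    df.groupby(["ID", "Region"])
      .agg(Total_Numbers=("count", "sum"))
      .reset_index()
)

print(totals)
```

Both variants should give the columns ID, Region and Total_Numbers. Because groupby sorts the group keys by default, the rows come out ordered by ID and Region, much like the ORDER BY in the SQL version.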

As for the second one, I'd say the answer is no. Access like 'df.ID' is possible because of the Python data model:

Attribute references are translated to lookups in this dictionary, e.g., m.x is equivalent to m.__dict__["x"]
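As a small illustration of the difference between attribute access and quoted access, the sketch below uses a made-up two-row DataFrame. It also shows DataFrame.query, which is not mentioned in the answer but accepts column names unquoted inside an expression string, so it may be the closest thing to the SQL style asked about.

```python
import pandas as pd

# Hypothetical two-row frame, just for illustration
df = pd.DataFrame({"ID": [100, 101], "count": [2, 3]})

# Attribute access and bracket access return the same column
# when the name is a valid identifier and does not clash with a method
same = df.ID.sum() == df["ID"].sum()

# 'count' clashes with the DataFrame.count method, so df.count is
# the method rather than the column; bracket access is needed here
count_is_method = callable(df.count)
count_total = df["count"].sum()

# Inside a query string, column names appear without their own quotes
filtered = df.query("ID > 100")

print(same, count_is_method, count_total)
print(filtered)
```

The clash with count is one reason quoted names are the safer default: attribute access only works for names that are valid Python identifiers and are not already taken by a DataFrame attribute or method.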