I have a pandas dataframe that looks like this:
c y
0 9 0
1 8 0
2 3 1
3 6 2
4 1 3
5 2 3
6 5 3
7 4 4
8 0 4
9 7 4
I'd like to groupby y
and get the min and max of c
so that my new dataframe would look like this:
c y min max
0 9 0 8 9
1 8 0 8 9
2 3 1 3 3
3 6 2 6 6
4 1 3 1 5
5 2 3 1 5
6 5 3 1 5
7 4 4 0 7
8 0 4 0 7
9 7 4 0 7
I tried using df['min'] = df.groupby(['y'])['c'].min()
but that gave me some weird results. The first 175 rows were populated in the min column but then it went to NaN for all the rest. is that not how you're supposed to use the groupby method?
Option 1 Use transform
In [13]: dfc = df.groupby('y')['c']
In [14]: df.assign(min=dfc.transform(min), max=dfc.transform(max))
Out[14]:
c y max min
0 9 0 9 8
1 8 0 9 8
2 3 1 3 3
3 6 2 6 6
4 1 3 5 1
5 2 3 5 1
6 5 3 5 1
7 4 4 7 0
8 0 4 7 0
9 7 4 7 0
Or
In [15]: df['min' ] = dfc.transform('min')
In [16]: df['max' ] = dfc.transform('max')
Option 2 Use join and agg
In [30]: df.join(df.groupby('y')['c'].agg(['min', 'max']), on='y')
Out[30]:
c y min max
0 9 0 8 9
1 8 0 8 9
2 3 1 3 3
3 6 2 6 6
4 1 3 1 5
5 2 3 1 5
6 5 3 1 5
7 4 4 0 7
8 0 4 0 7
9 7 4 0 7
Option 3 Use merge and agg
In [28]: df.merge(df.groupby('y')['c'].agg(['min', 'max']), right_index=True, left_on='y')
Out[28]:
c y min max
0 9 0 8 9
1 8 0 8 9
2 3 1 3 3
3 6 2 6 6
4 1 3 1 5
5 2 3 1 5
6 5 3 1 5
7 4 4 0 7
8 0 4 0 7
9 7 4 0 7