I'm trying to create a new column that holds the mean of the values in an existing column of the same df. However, the mean should be computed per group, where the groups are defined by three other columns.
Out[184]:
YEAR daytype hourtype scenario option_value
0 2015 SAT of_h 0 0.134499
1 2015 SUN of_h 1 63.019250
2 2015 WD of_h 2 52.113516
3 2015 WD pk_h 3 43.126513
4 2015 SAT of_h 4 56.431392
Basically, I would like a new column 'mean' that contains the mean of "option_value" over all rows that share the same "YEAR", "daytype", and "hourtype".
I tried the following approach, but without success:
In [185]: o2['premium'] = o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value'].mean()
TypeError: incompatible index of inserted column with frame index
Here's one way to do it:
In [19]: def cust_mean(grp):
....: grp['mean'] = grp['option_value'].mean()
....: return grp
....:
In [20]: o2.groupby(['YEAR', 'daytype', 'hourtype']).apply(cust_mean)
Out[20]:
YEAR daytype hourtype scenario option_value mean
0 2015 SAT of_h 0 0.134499 28.282946
1 2015 SUN of_h 1 63.019250 63.019250
2 2015 WD of_h 2 52.113516 52.113516
3 2015 WD pk_h 3 43.126513 43.126513
4 2015 SAT of_h 4 56.431392 28.282946
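So, what was going wrong with your attempt?
It returns an aggregated Series with one row per group, indexed by the group keys (YEAR, daytype, hourtype), rather than one row per row of the original dataframe. Its shape and index differ from the frame's, so pandas cannot align it when you assign it as a column.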
In [21]: o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value'].mean()
Out[21]:
YEAR  daytype  hourtype
2015  SAT      of_h        28.282946
      SUN      of_h        63.019250
      WD       of_h        52.113516
               pk_h        43.126513
Name: option_value, dtype: float64
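To make the shape mismatch concrete, here is a minimal, self-contained sketch. It rebuilds the sample frame from the values shown in the question; the names keys, agg, and merged are only illustrative, and the merge step is just one possible way (an assumption, not the approach used in the answers here) to bring the per-group means back to row level.

```python
import pandas as pd

# Sample frame rebuilt from the output shown in the question.
o2 = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015, 2015],
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

keys = ["YEAR", "daytype", "hourtype"]

# The aggregate has one row per group and a MultiIndex built from the keys.
agg = o2.groupby(keys)["option_value"].mean()
print(len(o2), len(agg))   # 5 rows in the frame vs. 4 groups
print(agg.index.names)     # the group keys, not the frame's 0..4 index

# One way to map the group means back onto the rows: turn the keys back
# into columns and left-merge on them.
merged = o2.merge(agg.rename("premium").reset_index(), on=keys, how="left")
print(merged)
```

The left merge keeps one output row per row of o2, so both SAT/of_h rows receive the same group mean.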
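Or use transform, which computes the mean per group but returns a result aligned to the index of the original dataframe, so it can be assigned directly as a column: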
In [1461]: o2['premium'] = (o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value']
      ...:                  .transform('mean'))
In [1462]: o2
Out[1462]:
YEAR daytype hourtype scenario option_value premium
0 2015 SAT of_h 0 0.134499 28.282946
1 2015 SUN of_h 1 63.019250 63.019250
2 2015 WD of_h 2 52.113516 52.113516
3 2015 WD pk_h 3 43.126513 43.126513
4 2015 SAT of_h 4 56.431392 28.282946
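For anyone who wants to run the transform approach end to end, here is a small self-contained sketch. The frame is rebuilt from the values in the question, and the variable names keys and premium are only illustrative. The index check is there to show why this assignment works.

```python
import pandas as pd

# Sample frame rebuilt from the output shown in the question.
o2 = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015, 2015],
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

keys = ["YEAR", "daytype", "hourtype"]

# transform('mean') broadcasts each group's mean back to every row of
# that group, so the result has one value per original row.
premium = o2.groupby(keys)["option_value"].transform("mean")

# The result carries the same index as o2, which is what makes the
# column assignment below align without errors.
print(premium.index.equals(o2.index))

o2["premium"] = premium
print(o2)
```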