I want to calculate the `mean` (or any other summary statistic of length one, e.g. `min`, `max`, `length`, `sum`) of a numeric variable ("value") within each level of a grouping variable ("group").
The summary statistic should be assigned to a new variable that has the same length as the original data. That is, each row of the original data should get the summary value for its own group; the data set should not be collapsed to one row per group. For example, consider the group `mean`:
Before
id group value
1 a 10
2 a 20
3 b 100
4 b 200
After
id group value grp.mean.values
1 a 10 15
2 a 20 15
3 b 100 150
4 b 200 150
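For a reproducible version of the "Before" table, the data can be built as a plain data frame. As a point of comparison, base R's `ave()` (a swapped-in alternative, not one of the approaches shown below) returns a vector the same length as its input, with each element replaced by its group's summary, which is exactly the "After" column. A minimal sketch:

```r
# Recreate the "Before" data from the question
df <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)

# ave() applies FUN within each level of `group` and writes the result
# back to every row of that group, so no rows are collapsed
df$grp.mean.values <- ave(df$value, df$group, FUN = mean)
print(df)
```

`ave()` already defaults to `FUN = mean`; the argument is spelled out only so it is clear where another length-one summary such as `max` would go.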
You can do this in `dplyr` using `mutate`:
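```r
library(dplyr)

# group_by() + mutate() computes mean(value) per group and recycles it to
# every row of that group, so the number of rows is unchanged
df %>%
  group_by(group) %>%
  mutate(grp.mean.values = mean(value))
```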
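The same pattern works for any summary of length one. Here is a sketch, assuming the example data from the question (recreated so the block runs on its own), that adds several group-level columns in a single `mutate()` call. `n()` is dplyr's row count for the current group, and the trailing `ungroup()` (an addition, not part of the answer above) drops the grouping so later operations are not silently applied per group:

```r
library(dplyr)

df <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)

res <- df %>%
  group_by(group) %>%
  mutate(
    grp.mean.values = mean(value),  # same as the answer above
    grp.min = min(value),
    grp.max = max(value),
    grp.n   = n(),                  # number of rows in the current group
    grp.sum = sum(value)
  ) %>%
  ungroup()

print(res)
```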
...or use `data.table` to assign the new column by reference (`:=`):
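```r
library(data.table)

# setDT() converts df to a data.table in place; := then adds the column
# by reference, computing mean(value) separately for each group
setDT(df)[ , grp.mean.values := mean(value), by = group]
```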
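Several columns can be assigned in one call by putting their names on the left of `:=` and a `list()` of values on the right. A sketch, again recreating the question's data so it runs standalone; `.N` is data.table's built-in row count for the current group, and the column names are illustrative:

```r
library(data.table)

df <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)

# Convert to a data.table in place (no copy is made)
setDT(df)

# Add three group-level columns by reference; df keeps all 4 rows
df[, c("grp.min", "grp.max", "grp.n") := list(min(value), max(value), .N),
   by = group]

print(df)
```

Because `:=` modifies `df` in place, there is no need to reassign the result with `df <- ...`.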