Calculate group mean (or other summary stats) and assign to original data

Mike · May 19, 2011 · Viewed 17.5k times

I want to calculate the mean (or any other summary statistic of length one, e.g. min, max, length, sum) of a numeric variable ("value") within each level of a grouping variable ("group").

The summary statistic should be assigned to a new variable which has the same length as the original data. That is, each row of the original data should carry the summary value for its own group; the data set should not be collapsed to one row per group. For example, consider the group mean:

Before

id  group  value
1   a      10
2   a      20
3   b      100
4   b      200

After

id  group  value  grp.mean.values
1   a      10     15
2   a      20     15
3   b      100    150
4   b      200    150

Answer

Henrik · Feb 23, 2016

You may do this in dplyr using mutate:

library(dplyr)
df %>%
  group_by(group) %>%
  mutate(grp.mean.values = mean(value))
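
To check this against the example in the question, here is a minimal sketch that rebuilds the four-row data frame. The name df comes from the code above, but its construction is an assumption, since the question shows only the printed table. The trailing ungroup() is an optional addition so that later operations are not silently grouped.

```r
library(dplyr)

# Example data from the question (construction assumed; only the table is shown there)
df <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)

res <- df %>%
  group_by(group) %>%
  mutate(grp.mean.values = mean(value)) %>%  # one mean per group, repeated on each row
  ungroup()                                  # drop the grouping afterwards

res$grp.mean.values
# 15 15 150 150
```

Any other length-one summary, such as min(value), max(value), sum(value) or n(), can be swapped in for mean(value) in the same way.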

...or use data.table to assign the new column by reference (:=):

library(data.table)
setDT(df)[ , grp.mean.values := mean(value), by = group]
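
If neither package is available, base R's ave() should give the same result. This is offered as an alternative to the two approaches above, not as part of the original answer, and the data frame construction is again assumed from the question's table.

```r
# Example data from the question (construction assumed)
df <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)

# ave() applies FUN within each level of the grouping variable and
# returns a vector as long as the input, so nothing is collapsed
df$grp.mean.values <- ave(df$value, df$group, FUN = mean)

df$grp.mean.values
# 15 15 150 150
```

Note that FUN has to be passed by name, because any unnamed argument after the first is treated as another grouping variable.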