Display weighted mean by group in the data.frame

Elixterra · Jul 21, 2016 · Viewed 14.3k times

Questions about by and weighted.mean already exist, but none of them helped me solve my problem. I am new to R and more used to data-mining languages than to programming.

I have a data frame that holds, for each individual (one observation per row), the income, education level and sample weight. I want to calculate the weighted mean of income by education level, and I want the result attached to each individual in a new column of my original data frame, like this:

obs income education weight incomegroup
1.   1000      A       10    --> display weighted mean of income for education level A
2.   2000      B        1    --> display weighted mean of income for education level B
3.   1500      B        5    --> display weighted mean of income for education level B
4.   2000      A        2    --> display weighted mean of income for education level A

I tried:

data$incomegroup=by(data$education, function(x) weighted.mean(data$income, data$weight))    

It does not work. A weighted mean is calculated and appears in the column "incomegroup", but it seems to be for the whole data set (or perhaps for one group only) rather than by group. I have read about the plyr package and about aggregate, but they do not seem to do what I am interested in.

The ave{stats} command gives exactly what I am looking for, but only for a simple mean:

data$incomegroup=ave(data$income,data$education,FUN = mean)

I could not find a way to use it with weights.

Thanking you in advance for your help!

Answer

akrun · Jul 21, 2016

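If we use mutate instead of summarise, the weighted mean of each group is written to every row of that group, so there is no need for a left_join back onto the original data: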

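library(dplyr)

# Example data from the question, rebuilt here (as df) so the snippet runs on its own
df <- data.frame(obs = 1:4,
                 income = c(1000L, 2000L, 1500L, 2000L),
                 education = factor(c("A", "B", "B", "A")),
                 weight = c(10L, 1L, 5L, 2L))

# Group by education level and attach each group's weighted mean to every row
df %>%
   group_by(education) %>%
   mutate(weighted_income = weighted.mean(income, weight))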
#    obs income education weight weighted_income
#  <int>  <int>    <fctr>  <int>           <dbl>
#1     1   1000         A     10        1166.667
#2     2   2000         B      1        1583.333
#3     3   1500         B      5        1583.333
#4     4   2000         A      2        1166.667
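
For readers who prefer to stay in base R, the same result can probably be reached with ave, the function mentioned in the question. This is a minimal sketch and not part of the original answer: it assumes the example data above, and it passes row indices to ave rather than the income values, so that the function applied to each group can look up both income and weight for that group's rows. The column name incomegroup is taken from the question.

```r
# Example data from the question (assumed layout)
df <- data.frame(obs = 1:4,
                 income = c(1000, 2000, 1500, 2000),
                 education = factor(c("A", "B", "B", "A")),
                 weight = c(10, 1, 5, 2))

# ave() splits the row indices by education level; for each group, the
# function receives that group's indices i and uses them to pick the
# matching incomes and weights before calling weighted.mean()
df$incomegroup <- ave(seq_len(nrow(df)), df$education,
                      FUN = function(i) weighted.mean(df$income[i], df$weight[i]))

round(df$incomegroup, 3)
# 1166.667 1583.333 1583.333 1166.667
```

The values match the dplyr output above: group A gives (1000*10 + 2000*2) / 12 and group B gives (2000*1 + 1500*5) / 6.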