Aggregate / summarize multiple variables per group (e.g. sum, mean)

MikeTP picture MikeTP · Mar 15, 2012 · Viewed 182k times · Source

From a data frame, is there a easy way to aggregate (sum, mean, max et c) multiple variables simultaneously?

Below are some sample data:

library(lubridate)
days = 365*2
date = seq(as.Date("2000-01-01"), length = days, by = "day")
year = year(date)
month = month(date)
x1 = cumsum(rnorm(days, 0.05)) 
x2 = cumsum(rnorm(days, 0.05))
df1 = data.frame(date, year, month, x1, x2)

I would like to simultaneously aggregate the x1 and x2 variables from the df2 data frame by year and month. The following code aggregates the x1 variable, but is it also possible to simultaneously aggregate the x2 variable?

### aggregate variables by year month
df2=aggregate(x1 ~ year+month, data=df1, sum, na.rm=TRUE)
head(df2)

Any suggestions would be greatly appreciated.

Answer

Andrie picture Andrie · Mar 15, 2012

Yes, in your formula, you can cbind the numeric variables to be aggregated:

aggregate(cbind(x1, x2) ~ year + month, data = df1, sum, na.rm = TRUE)
   year month         x1          x2
1  2000     1   7.862002   -7.469298
2  2001     1 276.758209  474.384252
3  2000     2  13.122369 -128.122613
...
23 2000    12  63.436507  449.794454
24 2001    12 999.472226  922.726589

See ?aggregate, the formula argument and the examples.