dplyr summarize with subtotals

Kyle Ward picture Kyle Ward · Jul 1, 2015 · Viewed 10.7k times · Source

One of the great things about pivot tables in excel is that they provide subtotals automatically. First, I would like to know if there is anything already created within dplyr that can accomplish this. If not, what is the easiest way to achieve it?

In the example below, I show the mean displacement by number of cylinders and carburetors. For each group of cylinders (4,6,8), I'd like to see the mean displacement for the group (or total displacement, or any other summary statistic).

library(dplyr)
mtcars %>% group_by(cyl,carb) %>% summarize(mean(disp))

  cyl carb mean(disp)
1   4    1      91.38
2   4    2     116.60
3   6    1     241.50
4   6    4     163.80
5   6    6     145.00
6   8    2     345.50
7   8    3     275.80
8   8    4     405.50
9   8    8     301.00

Answer

Frank picture Frank · Jul 1, 2015

data.table It's very clunky, but this is one way:

library(data.table)
DT <- data.table(mtcars)
rbind(
  DT[,.(mean(disp)),          by=.(cyl,carb)],
  DT[,.(mean(disp), carb=NA), by=.(cyl) ],
  DT[,.(mean(disp), cyl=NA),  by=.(carb)]
)[order(cyl,carb)]

This gives

    cyl carb       V1
 1:   4    1  91.3800
 2:   4    2 116.6000
 3:   4   NA 105.1364
 4:   6    1 241.5000
 5:   6    4 163.8000
 6:   6    6 145.0000
 7:   6   NA 183.3143
 8:   8    2 345.5000
 9:   8    3 275.8000
10:   8    4 405.5000
11:   8    8 301.0000
12:   8   NA 353.1000
13:  NA    1 134.2714
14:  NA    2 208.1600
15:  NA    3 275.8000
16:  NA    4 308.8200
17:  NA    6 145.0000
18:  NA    8 301.0000

I'd rather see results in something like an R table, but don't know of any functions for that.


dplyr @akrun found this analogous code

bind_rows(
  mtcars %>% 
    group_by(cyl, carb) %>% 
    summarise(Mean= mean(disp)), 
  mtcars %>% 
    group_by(cyl) %>% 
    summarise(carb=NA, Mean=mean(disp)), 
  mtcars %>% 
    group_by(carb) %>% 
    summarise(cyl=NA, Mean=mean(disp))
) %>% arrange(cyl, carb)

We could wrap the repeat operations in a function

library(lazyeval)
f1 <- function(df, grp, Var, func){
  FUN <- match.fun(func)
   df %>% 
     group_by_(.dots=grp) %>%
     summarise_(interp(~FUN(v), v=as.name(Var)))
  }

 m1 <- f1(mtcars, c('carb', 'cyl'), 'disp', 'mean')
 m2 <- f1(mtcars, 'carb', 'disp', 'mean')
 m3 <- f1(mtcars, 'cyl', 'disp', 'mean')

 bind_rows(list(m1, m2, m3)) %>%
              arrange(cyl, carb) %>%
              rename(Mean=`FUN(disp)`)
   carb cyl     Mean
1     1   4  91.3800
2     2   4 116.6000
3    NA   4 105.1364
4     1   6 241.5000
5     4   6 163.8000
6     6   6 145.0000
7    NA   6 183.3143
8     2   8 345.5000
9     3   8 275.8000
10    4   8 405.5000
11    8   8 301.0000
12   NA   8 353.1000
13    1  NA 134.2714
14    2  NA 208.1600
15    3  NA 275.8000
16    4  NA 308.8200
17    6  NA 145.0000
18    8  NA 301.0000

Either option can be made a little less ugly with data.table's rbindlist with fill:

rbindlist(list(
  mtcars %>% group_by(cyl) %>% summarise(mean(disp)),
  mtcars %>% group_by(carb) %>% summarise(mean(disp)),
  mtcars %>% group_by(cyl,carb) %>% summarise(mean(disp))
),fill=TRUE) %>% arrange(cyl,carb)

rbindlist(list(
  DT[,mean(disp),by=.(cyl,carb)],
  DT[,mean(disp),by=.(cyl)],
  DT[,mean(disp),by=.(carb)]
),fill=TRUE)[order(cyl,carb)]