dplyr summarise: Equivalent of ".drop=FALSE" to keep groups with zero length in output

eipi10 picture eipi10 · Mar 20, 2014 · Viewed 35.4k times · Source

When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this doesn't work when using summarise with dplyr. Is there another way to keep empty categories in the result?

Here's an example with fake data.

library(dplyr)

df = data.frame(a=rep(1:3,4), b=rep(1:2,6))

# Now add an extra level to df$b that has no corresponding value in df$a
df$b = factor(df$b, levels=1:3)

# Summarise with plyr, keeping categories with a count of zero
plyr::ddply(df, "b", summarise, count_a=length(a), .drop=FALSE)

  b    count_a
1 1    6
2 2    6
3 3    0

# Now try it with dplyr
df %.%
  group_by(b) %.%
  summarise(count_a=length(a), .drop=FALSE)

  b     count_a .drop
1 1     6       FALSE
2 2     6       FALSE

Not exactly what I was hoping for. Is there a dplyr method for achieving the same result as .drop=FALSE in plyr?

Answer

A5C1D2H2I1M1N2O1R2T1 picture A5C1D2H2I1M1N2O1R2T1 · Mar 18, 2016

The issue is still open, but in the meantime, especially since your data are already factored, you can use complete from "tidyr" to get what you might be looking for:

library(tidyr)
df %>%
  group_by(b) %>%
  summarise(count_a=length(a)) %>%
  complete(b)
# Source: local data frame [3 x 2]
# 
#        b count_a
#   (fctr)   (int)
# 1      1       6
# 2      2       6
# 3      3      NA

If you wanted the replacement value to be zero, you need to specify that with fill:

df %>%
  group_by(b) %>%
  summarise(count_a=length(a)) %>%
  complete(b, fill = list(count_a = 0))
# Source: local data frame [3 x 2]
# 
#        b count_a
#   (fctr)   (dbl)
# 1      1       6
# 2      2       6
# 3      3       0