dplyr issues when using group_by(multiple variables)

Marc Tulla picture Marc Tulla · Feb 9, 2014 · Viewed 89.1k times · Source

I want to start using dplyr in place of ddply but I can't get a handle on how it works (I've read the documentation).

For example, why when I try to mutate() something does the "group_by" function not work as it's supposed to?

Looking at mtcars:

library(car)

Say I make a data.frame which is a summary of mtcars, grouped by "cyl" and "gear":

df1 <- mtcars %.%
            group_by(cyl, gear) %.%
            summarise(
                newvar = sum(wt)
            )

Then say I want to further summarise this dataframe. With ddply, it'd be straightforward, but when I try to do with with dplyr, it's not actually "grouping by":

df2 <- df1 %.%
            group_by(cyl) %.%
            mutate(
                newvar2 = newvar + 5
            )

Still yields an ungrouped output:

  cyl gear newvar newvar2
1   6    3  6.675  11.675
2   4    4 19.025  24.025
3   6    4 12.375  17.375
4   6    5  2.770   7.770
5   4    3  2.465   7.465
6   8    3 49.249  54.249
7   4    5  3.653   8.653
8   8    5  6.740  11.740

Am I doing something wrong with the syntax?


Edit:

If I were to do this with plyr and ddply:

df1 <- ddply(mtcars, .(cyl, gear), summarise, newvar = sum(wt))

and then to get the second df:

df2 <- ddply(df1, .(cyl), summarise, newvar2 = sum(newvar) + 5)

But that same approach, with sum(newvar) + 5 in the summarise() function doesn't work with dplyr...

Answer

ManneR picture ManneR · Aug 14, 2014

I had a similar problem. I found that simply detaching plyr solved it:

detach(package:plyr)    
library(dplyr)