How can I pass column names to dplyr if I do not know the column name, but want to specify it through a variable?
For example, this works:
library(dplyr)
df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")
gdf <- df %.% group_by(group) %.% summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))
But this does not:
library(dplyr)
someColumn <- "group"
df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")
gdf <- df %.% group_by(someColumn) %.% summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))
I just gave a similar answer over at "Group by multiple columns in dplyr, using string vector input", but for good measure: functions that let you operate on columns using strings have been added to dplyr. They have the same names as the regular dplyr functions, but end in an underscore. The functions are described in detail in this vignette.
Given df and someColumn from the OP, this now works a treat:
gdf <- df %>% group_by_(someColumn) %>% summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))
Note that it is group_by_ rather than group_by, and that the %>% operator is used because %.% is deprecated.
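In later dplyr releases (0.7.0 onward) the underscore verbs such as group_by_ were themselves deprecated in favour of tidy evaluation. Below is a minimal sketch of the same summary in that style. It assumes dplyr 1.0.0 or newer, where across() can be used inside group_by(), and it uses all_of() to turn the string held in someColumn into a column selection:

```r
library(dplyr)

# Same toy data as in the question
someColumn <- "group"
df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")

# all_of() selects columns whose names are stored as strings;
# across() with no function just passes those columns through to group_by()
gdf <- df %>%
  group_by(across(all_of(someColumn))) %>%
  summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))

print(gdf)
```

Because all_of() accepts a character vector, the same call also works when someColumn holds several column names. For a single column, group_by(.data[[someColumn]]) is another option.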