Pass arguments to dplyr functions

asnr · Jan 16, 2015

I want to parameterise the following computation using dplyr that finds which values of Sepal.Length are associated with more than one value of Sepal.Width:

library(dplyr)

iris %>%
    group_by(Sepal.Length) %>%
    summarise(n.uniq=n_distinct(Sepal.Width)) %>%
    filter(n.uniq > 1)

Normally I would write something like this:

not.uniq.per.group <- function(data, group.var, uniq.var) {
    data %>%
        group_by(group.var) %>%
        summarise(n.uniq=n_distinct(uniq.var)) %>%
        filter(n.uniq > 1)
}

However, this approach throws errors because dplyr uses non-standard evaluation. How should this function be written?

Answer

asnr · Jan 16, 2015

You need to use the standard evaluation versions of the dplyr functions (just append '_' to the function names, i.e. group_by_ and summarise_) and pass the column names to your function as strings. group_by_ accepts a string directly. To parameterise the argument of summarise_, turn the string into a symbol with as.name() and substitute it into the expression with interp(), which is defined in the lazyeval package. Concretely:

library(dplyr)
library(lazyeval)

not.uniq.per.group <- function(df, grp.var, uniq.var) {
    df %>%
        group_by_(grp.var) %>%
        summarise_( n_uniq=interp(~n_distinct(v), v=as.name(uniq.var)) ) %>%
        filter(n_uniq > 1)
}

not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")
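To see what interp() contributes here, it can help to look at the formula it builds on its own. The snippet below is only an illustration; the variable name f is made up for this example. It replaces the placeholder v with the symbol created from the string, giving the formula ~n_distinct(Sepal.Width), which summarise_ then evaluates against the data.

```r
library(lazyeval)

# `f` is a hypothetical name used only for this illustration.
# interp() substitutes the placeholder `v` with the symbol Sepal.Width.
f <- interp(~n_distinct(v), v = as.name("Sepal.Width"))
print(f)
```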

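Note that since dplyr 0.7.0 the standard evaluation versions of the dplyr functions (the ones with the '_' suffix) have been deprecated, initially "soft deprecated", in favor of tidy evaluation, which lets you program with the regular non-standard evaluation functions directly.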

See the Programming with dplyr vignette for more information on working with non-standard evaluation.
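For current versions of dplyr, the same function can probably be written without lazyeval. The following is a minimal sketch, assuming dplyr >= 1.0.0. The function names not.uniq.per.group.tidy and not.uniq.per.group.chr are hypothetical and only illustrate the two common options: passing bare column names and embracing them with {{ }}, or passing strings and looking them up with all_of() and the .data pronoun.

```r
library(dplyr)

# Hypothetical name; takes bare (unquoted) column names.
# Assumes dplyr >= 1.0.0, where `{{ }}` (embracing) is available.
not.uniq.per.group.tidy <- function(df, grp.var, uniq.var) {
    df %>%
        group_by({{ grp.var }}) %>%
        summarise(n_uniq = n_distinct({{ uniq.var }})) %>%
        filter(n_uniq > 1)
}

# Hypothetical name; takes column names as strings, like the answer above.
not.uniq.per.group.chr <- function(df, grp.var, uniq.var) {
    df %>%
        group_by(across(all_of(grp.var))) %>%
        summarise(n_uniq = n_distinct(.data[[uniq.var]])) %>%
        filter(n_uniq > 1)
}

res.tidy <- not.uniq.per.group.tidy(iris, Sepal.Length, Sepal.Width)
res.chr  <- not.uniq.per.group.chr(iris, "Sepal.Length", "Sepal.Width")
```

The string version keeps the calling convention of the original answer, while the embraced version reads like an ordinary dplyr call. Which one fits better depends on whether the column names arrive as strings (for example from user input) or are typed by the caller.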