Non-standard evaluation (NSE) in dplyr's filter_ & pulling data from MySQL

Lorenzo Rossi · Oct 21, 2014 · Viewed 11.5k times

I'd like to pull some data from a MySQL server with a dynamic filter. I'm using the great R package dplyr in the following way:

#Create the filter
filter_criteria = ~ column1 %in% some_vector
#Connect to the database
connection <- src_mysql(dbname = "mydbname", 
             user = "myusername", 
             password = "mypwd", 
             host = "myhost") 
#Get data
data <- connection %>%
 tbl("mytable") %>% #Specify which table
 filter_(.dots = filter_criteria) %>% #non standard evaluation filter
 collect() #Pull data

This piece of code works fine but now I'd like to loop it somehow on all the columns of my table, thus I'd like to write the filter as:

#Dynamic filter
i <- 2 #With a loop on this i for instance
which_column <- paste0("column",i)
filter_criteria <- ~ which_column %in% some_vector

And then reapply the first code with the updated filter.

Unfortunately, this approach doesn't give the expected results: it raises no error, but it returns no rows to R. Looking at the SQL queries generated by the two pieces of code, I found one important difference.

While the first, working, code generates a query of the form:

SELECT ... FROM ... WHERE 
`column1` IN ....

(backticks around the column name), the second one generates a query of the form:

SELECT ... FROM ... WHERE 
'column1' IN ....

(single quotes around the column name)

Does anyone have any suggestion on how to formulate the filtering condition to make it work?

Answer

Matthew · Oct 22, 2014

It's not really related to SQL. This example in R does not work either:

df <- data.frame(
     v1 = sample(5, 10, replace = TRUE),
     v2 = sample(5,10, replace = TRUE)
)
df %>% filter_(~ "v1" == 1)

It does not work because filter_ needs the expression ~ v1 == 1, where v1 refers to the column, not the expression ~ "v1" == 1, where "v1" is a string literal that is compared with 1 and never matches.
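For what it's worth, here is a small sketch of that difference on a local data frame. The data and the names wrong and right are made up for illustration, and it assumes a reasonably recent dplyr where filter() accepts a length-one logical condition.

```r
library(dplyr)

# Hypothetical toy data, fixed so the row counts are predictable
df <- data.frame(
  v1 = c(1, 2, 1, 3),
  v2 = c(5, 1, 1, 2)
)

# "v1" is a string literal here, so the condition is "v1" == 1 for every row,
# which is FALSE (1 is coerced to "1" for the comparison)
wrong <- df %>% filter("v1" == 1)

# v1 is a column reference here, so each row is tested on its own value
right <- df %>% filter(v1 == 1)

nrow(wrong)
nrow(right)
```

The first filter keeps nothing and the second keeps the rows where v1 is 1, which mirrors the quoted versus backticked column name in the generated SQL.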

To solve the problem, use the quoting function quo() and the unquoting operator !! (available in dplyr 0.7 and later):

library(dplyr)
which_column = quo(v1)
df %>% filter(!!which_column == 1)
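Since the question builds the column name as a string with paste0(), a variant that may fit better is to convert the string to a symbol before unquoting it. Below is a rough sketch, assuming dplyr 0.7 or later with rlang installed. The local data frame stands in for the MySQL table, and results and col_sym are names invented for the example. I have not run it against a real MySQL source, so treat the database side as untested.

```r
library(dplyr)

# Hypothetical local stand-in for the database table
df <- data.frame(
  column1 = c(1, 2, 3, 4),
  column2 = c(3, 3, 5, 1)
)
some_vector <- c(1, 3)

results <- list()
for (i in 1:2) {
  which_column <- paste0("column", i)

  # Turn the string into a symbol so it is treated as a column name
  col_sym <- rlang::sym(which_column)

  # Parentheses keep !! bound to the symbol rather than to the whole %in% call
  results[[which_column]] <- df %>%
    filter((!!col_sym) %in% some_vector)
}

lapply(results, nrow)
```

With a tbl() from the database in place of df, you would add collect() at the end of the pipe, as in the question.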