Add a dynamic value into RMySQL getQuery

analyticsPierce · Nov 27, 2010

Is it possible to pass a value into the query in dbGetQuery from the RMySQL package?

For example, if I have a set of values in a character vector:

df <- c('a','b','c')

And I want to loop through the values to pull out a specific value from a database for each.

library(RMySQL)    
res <- dbGetQuery(con, "SELECT max(ID) FROM table WHERE columna='df[2]'")

When I try to add the reference to the value I get an error. I am wondering whether it is possible to insert a value from an R object into the query.

Answer

Gavin Simpson · Nov 27, 2010

One option is to build the SQL string within the loop. At the moment you have a string literal; the 'df[2]' inside it is not interpreted by R as anything other than characters. There are going to be some ambiguities in my answer, because df in your question is patently not a data frame (it is a character vector!). Something like this will do what you want.
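To see the difference before touching the database, it may help to compare the literal string with one assembled by paste0(). This is only a small sketch in base R; the table and column names are the placeholders from the question.

```r
df <- c('a', 'b', 'c')

# Inside a string literal, df[2] is plain text and is never evaluated
literal <- "SELECT max(ID) FROM table WHERE columna='df[2]'"

# paste0() evaluates df[2] first, then splices its value into the string
built <- paste0("SELECT max(ID) FROM table WHERE columna='", df[2], "'")

print(literal)
print(built)
```

The second string is the one MySQL should receive; the first asks for rows where columna literally equals the text df[2].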

Store the output in a numeric vector:

require(RMySQL)
df <- c('a','b','c')
out <- numeric(length(df))
names(out) <- df

Now we can loop over the elements of df to execute your query three times. We can set the loop up two ways: i) with i as a number which we use to reference the elements of df and out, or ii) with i as each element of df in turn (i.e. a, then b, ...). I will show both versions below.

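## Version i
for(i in seq_along(df)) {
    SQL <- paste("SELECT max(ID) FROM table WHERE columna='", df[i], "';", sep = "")
    ## dbGetQuery() returns a data frame; keep only the single value
    out[i] <- dbGetQuery(con, SQL)[1, 1]
}
## disconnect once, after all queries have run
dbDisconnect(con)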

OR:

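## Version ii
for(i in df) {
    SQL <- paste("SELECT max(ID) FROM table WHERE columna='", i, "';", sep = "")
    ## dbGetQuery() returns a data frame; keep only the single value
    out[i] <- dbGetQuery(con, SQL)[1, 1]
}
## disconnect once, after all queries have run
dbDisconnect(con)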

Which you use will depend on personal taste. The second version (ii) requires the names on the output vector out to be the same as the values in df, because out[i] then indexes out by name rather than by position.

Having said all that, assuming your actual SQL query is similar to the one you posted, can't you do this in a single SQL statement, using the GROUP BY clause to group the data before computing max(ID)? Doing simple things like this in the database will likely be much quicker. Unfortunately, I don't have a MySQL instance around to play with and my SQL-fu is weak currently, so I can't give an example of this.
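For what it is worth, the single-statement idea might look roughly like the sketch below. It has not been run against a MySQL server; the alias max_id is an invented name for illustration, and con is assumed to be an open connection as in the question, so the database calls are left commented out.

```r
df <- c('a', 'b', 'c')

# Quote each value and join them into a list for the IN (...) clause
in_list <- paste0("'", df, "'", collapse = ", ")

# One statement that asks for max(ID) per value of columna
# (max_id is a hypothetical alias chosen for this example)
SQL <- paste0(
  "SELECT columna, max(ID) AS max_id FROM table ",
  "WHERE columna IN (", in_list, ") ",
  "GROUP BY columna;"
)
print(SQL)

# With an open connection `con` (assumed), the query would be sent once:
# res <- dbGetQuery(con, SQL)
# out <- setNames(res$max_id, res$columna)
```

Values of df that do not occur in the table would simply be missing from the result, whereas the loop versions return NA for them. If the values come from user input, they should be escaped before being pasted into the string.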