I'm trying to write a function to accept a data.frame (x) and a column from it. The function performs some calculations on x and later returns another data.frame. I'm stuck on the best-practices method to pass the column name to the function.
The two minimal examples fun1 and fun2 below produce the desired result, being able to perform operations on x$column, using max() as an example. However, both rely on the seemingly (at least to me) inelegant substitute() and possibly eval().
fun1 <- function(x, column){
  do.call("max", list(substitute(x[a], list(a = column))))
}
fun2 <- function(x, column){
  max(eval((substitute(x[a], list(a = column)))))
}
df <- data.frame(B = rnorm(10))
fun1(df, "B")
fun2(df, "B")
I would like to be able to call the function as fun(df, B), for example. Other options I have considered but have not tried:
- Passing column as an integer of the column number. I think this would avoid substitute(). Ideally, the function could accept either.
- with(x, get(column)), but, even if it works, I think this would still require substitute().
- formula() and match.call(), neither of which I have much experience with.
Subquestion: Is do.call() preferred over eval()?
You can just use the column name directly:
df <- data.frame(A=1:10, B=2:11, C=3:12)
fun1 <- function(x, column){
  max(x[, column])
}
fun1(df, "B")
fun1(df, c("B","A"))
There's no need to use substitute, eval, etc.
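The question also asked whether the function could accept either a column name or a column number. As a minimal sketch (col_max is a hypothetical name used only for illustration), the same `x[, column]` indexing handles both, because `[` on a data.frame accepts character names and integer positions alike:

```r
# col_max is an illustrative name, not part of the original post.
col_max <- function(x, column) {
  # `column` may be a character name ("B") or an integer position (2)
  max(x[, column])
}

df <- data.frame(A = 1:10, B = 2:11, C = 3:12)
by_name  <- col_max(df, "B")  # select by name
by_index <- col_max(df, 2)    # select by position
```

Both calls select the same column, so no type check or substitute() is needed to support the two forms.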
You can even pass the desired function as a parameter:
fun1 <- function(x, column, fn) {
  fn(x[, column])
}
fun1(df, "B", max)
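If the caller should also be able to give the function by name, base R's match.fun() can resolve either form. The following is only a sketch under that assumption; apply_col is a hypothetical name, not something from the original answer:

```r
# apply_col is an illustrative name, not part of the original post.
apply_col <- function(x, column, fn) {
  # match.fun() returns `fn` unchanged if it is already a function,
  # or looks it up if it was given as a character string
  fn <- match.fun(fn)
  fn(x[, column])
}

df <- data.frame(A = 1:10, B = 2:11, C = 3:12)
res_obj <- apply_col(df, "B", max)     # function object
res_chr <- apply_col(df, "B", "mean")  # function name as a string
```

This mirrors how the apply family treats its FUN argument, and it avoids do.call() or eval() for this case.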
Alternatively, using [[ also works for selecting a single column at a time:
df <- data.frame(A=1:10, B=2:11, C=3:12)
fun1 <- function(x, column){
  max(x[[column]])
}
fun1(df, "B")
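The question also mentions wanting to call the function as fun(df, B), with the column name unquoted. The approaches above do not cover that. One possible sketch, assuming you are willing to use substitute() once to capture the argument (fun_nse is a hypothetical name), is:

```r
# fun_nse is an illustrative name, not part of the original post.
fun_nse <- function(x, column) {
  # Capture the argument as the caller typed it, without evaluating it
  col_expr <- substitute(column)
  # A quoted name arrives as a character constant; a bare name arrives as a symbol
  col_name <- if (is.character(col_expr)) col_expr else deparse(col_expr)
  max(x[[col_name]])
}

df <- data.frame(A = 1:10, B = 2:11, C = 3:12)
res_bare   <- fun_nse(df, B)    # unquoted column name
res_quoted <- fun_nse(df, "B")  # quoted column name
```

The trade-off is that a variable holding the column name (for example col <- "B"; fun_nse(df, col)) would be read as a column literally called col, so the plain character-string version is usually easier to use from other functions.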