I want to apply a function to several columns of a data.frame whose names share a common pattern. To subset the data.frame I use this code, which works:
df[,grep("abc", colnames(df))]
But I don't know how to apply my function f(x) to all the columns that match this pattern, using either a for loop or lapply.
The function I'm using is:
compress <- function(x) {
  aggregate(df[, x, drop = FALSE],
            list(hour = with(df, paste(dates(Time),
                                       sprintf("%d:00:00", hours(Time))))),
            sum, na.rm = TRUE)
}
where df (the data frame) and Time could be passed in as arguments themselves, but for the moment I don't need that.
Thanks Giulia
You've basically got it. Just use apply on the columns of your subsetted data to apply function f over columns (the 2 passed as the second argument of apply indicates columns, as opposed to 1, which indicates rows):
apply( df[,grep("abc", colnames(df))] , 2 , f )
Or, if you don't want to coerce your df to a matrix (which will happen with apply), you can use lapply, as you suggest, in much the same manner:
lapply( df[,grep("abc", colnames(df))] , f )
The return value from lapply will be a list, with one element for each column. You can turn this back into a data.frame by wrapping the lapply call in data.frame, e.g. data.frame( lapply(...) ):
# This function just multiplies its argument by 2
f <- function(x) x * 2
df <- data.frame( AB = runif(5) , AC = runif(5) , BB = runif(5) )
apply( df[,grep("A", colnames(df))] , 2 , f )
# AB AC
#[1,] 0.4130628 1.3302304
#[2,] 0.2550633 0.1896813
#[3,] 1.5066157 0.7679393
#[4,] 1.7900907 0.5487673
#[5,] 0.7489256 1.6292801
data.frame( lapply( df[,grep("A", colnames(df))] , f ) )
# AB AC
#1 0.4130628 1.3302304
#2 0.2550633 0.1896813
#3 1.5066157 0.7679393
#4 1.7900907 0.5487673
#5 0.7489256 1.6292801
# Note the important difference between the two methods...
class( data.frame( lapply( df[,grep("A", colnames(df))] , f ) ) )
#[1] "data.frame"
class( apply( df[,grep("A", colnames(df))] , 2 , f ) )
#[1] "matrix"
# (from R 4.0.0 onwards this prints "matrix" "array")
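One caveat with the df[, grep(...)] subsetting used above: when only a single column matches, it drops to a plain vector, and lapply then iterates over the individual values rather than over columns. A minimal sketch of the difference, using made-up data and the same f as above (the names cols, dropped and kept are only for illustration):

```r
f <- function(x) x * 2
df <- data.frame(AB = c(1, 2, 3), BB = c(4, 5, 6))

cols <- grep("A", colnames(df))   # only AB matches here

# With a single match, df[, cols] is a numeric vector, not a data.frame,
# so lapply() walks over its three values.
dropped <- lapply(df[, cols], f)
length(dropped)

# drop = FALSE keeps the data.frame, so lapply() sees one column.
kept <- lapply(df[, cols, drop = FALSE], f)
length(kept)

# List-style indexing (df[cols]) never drops, and assigning the result
# back overwrites just the matching columns.
df[cols] <- lapply(df[cols], f)
```

Using drop = FALSE or df[cols] makes the code behave the same way whether one column or many match the pattern.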
For the example function you want to run, it might be easier to rewrite it as a function that takes the df as input and a vector of column names that you want to operate on. In this example the function returns a list, with each element of that list containing an aggregated data.frame:
compress <- function( df , x ) {
  # One aggregated data.frame per column name in x
  lapply( x , function(col) {
    aggregate(df[, col, drop = FALSE],
              list(hour = with(df, paste(dates(Time),
                                         sprintf("%d:00:00", hours(Time))))),
              sum, na.rm = TRUE)
  })
}
To run the function you then just call it, passing it the data.frame and a vector of colnames...
compress( df , names(df)[ grep("abc", names(df) ) ] )
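To try the idea end to end without extra packages, here is a rough sketch that swaps the dates()/hours() calls (which appear to come from the chron package) for base R's format(). It assumes Time is a POSIXct column; the sample data and the name compress_base are made up for illustration and are not from the question.

```r
# Hypothetical sample data; Time is assumed to be POSIXct.
df <- data.frame(
  Time  = as.POSIXct(c("2013-01-01 10:05:00", "2013-01-01 10:40:00",
                       "2013-01-01 11:15:00"), tz = "UTC"),
  abc_1 = c(1, 2, 3),
  abc_2 = c(10, NA, 30),
  other = c(7, 8, 9)
)

compress_base <- function(df, x) {
  # Label each row with the start of its hour.
  hour <- format(df$Time, "%Y-%m-%d %H:00:00")
  res <- lapply(x, function(col) {
    aggregate(df[, col, drop = FALSE], list(hour = hour), sum, na.rm = TRUE)
  })
  # Name the list elements after the columns they summarise.
  setNames(res, x)
}

out <- compress_base(df, grep("abc", names(df), value = TRUE))
out$abc_1   # hourly sums for column abc_1
```

Passing value = TRUE to grep returns the matching names directly, which is equivalent to names(df)[ grep("abc", names(df)) ].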