Iterating a function through different columns of a data.frame matching a pattern in the column names

Giulia · Aug 15, 2013

I want to apply a function to several columns of a data.frame whose names share a common pattern. To subset the data.frame I use this code, which works:

df[,grep("abc", colnames(df))]

However, I don't know how to apply my function f(x) to all the columns that match this pattern, using either a for loop or lapply.

The function I'm using is:

compress= function(x) {
  aggregate(df[,x,drop=FALSE],
        list(hour = with(df,paste(dates(Time),
                                         sprintf("%d:00:00",hours(Time))))),
        sum,na.rm=TRUE)
}

Here df (the data frame) and Time could be passed in as arguments themselves, but for the moment I don't need that.

Thanks Giulia

Answer

Simon O'Hanlon · Aug 15, 2013

You've basically got it. Use apply on your subsetted data to apply function f over its columns (the 2 in the second argument of apply means columns; a 1 would mean rows):

apply( df[,grep("abc", colnames(df))] , 2 , f )

Or if you don't want to coerce your df to a matrix (which will happen with apply) you can use lapply as you suggest in much the same manner...

lapply( df[,grep("abc", colnames(df))] , f )

The return value from lapply will be a list, with one element for each column. You can turn this back into a data.frame by wrapping the lapply call in data.frame(), e.g. data.frame( lapply(...) )
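To see why the matrix coercion can matter, here is a minimal sketch with made-up column names (abc_num, abc_chr and other are hypothetical, not from the question). When the matching columns have mixed types, apply sees a character matrix, while lapply hands each column to the function with its original type:

```r
# Toy data: two columns match "abc", one numeric and one character
df <- data.frame( abc_num = c(1, 2, 3) ,
                  abc_chr = c("a", "b", "c") ,
                  other   = 4:6 ,
                  stringsAsFactors = FALSE )

sel <- df[ , grep("abc", colnames(df)) ]

# apply() converts sel to a matrix first, so every column becomes character
via_apply <- apply( sel , 2 , class )
via_apply
#     abc_num     abc_chr
# "character" "character"

# lapply()/sapply() work on the columns as they are
via_lapply <- sapply( sel , class )
via_lapply
#     abc_num     abc_chr
#   "numeric" "character"
```

If all the matching columns are numeric, as in the example further down, the coercion is harmless and the two approaches give the same numbers.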

Example

# This function just multiplies its argument by 2
f <- function(x) x * 2

df <- data.frame( AB = runif(5) , AC = runif(5) , BB = runif(5) )


apply( df[,grep("A", colnames(df))] , 2 , f )
#            AB        AC
#[1,] 0.4130628 1.3302304
#[2,] 0.2550633 0.1896813
#[3,] 1.5066157 0.7679393
#[4,] 1.7900907 0.5487673
#[5,] 0.7489256 1.6292801


data.frame( lapply( df[,grep("A", colnames(df))] , f ) )
#         AB        AC
#1 0.4130628 1.3302304
#2 0.2550633 0.1896813
#3 1.5066157 0.7679393
#4 1.7900907 0.5487673
#5 0.7489256 1.6292801

# Note the important difference between the two methods...
class( data.frame( lapply( df[,grep("A", colnames(df))] , f ) ) )
#[1] "data.frame"
class( apply( df[,grep("A", colnames(df))] , 2 , f ) )
#[1] "matrix" "array"
# (R versions before 4.0.0 print just "matrix")
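One caveat with the subsetting used above, sketched here with a toy data.frame (the values are made up): if only one column name matches the pattern, df[ , cols ] drops to a plain vector, so apply fails and lapply would loop over the individual values instead of over columns. Adding drop=FALSE keeps the result a data.frame however many columns match:

```r
f <- function(x) x * 2

# Only one column name contains "A" this time
df <- data.frame( AB = c(1, 2, 3) , BB = c(4, 5, 6) )

one_col <- df[ , grep("A", colnames(df)) ]
is.data.frame( one_col )
#[1] FALSE

# drop=FALSE keeps a one-column data.frame
kept <- df[ , grep("A", colnames(df)) , drop=FALSE ]
is.data.frame( kept )
#[1] TRUE

res <- data.frame( lapply( kept , f ) )
res
#  AB
#1  2
#2  4
#3  6
```

List-style indexing, df[ grep("A", colnames(df)) ] with no comma, should also keep the data.frame structure.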

Second edit

For the example function you want to run, it might be easier to rewrite it as a function that takes the df as input and a vector of column names that you want to operate on. In this example the function returns a list, with each element of that list containing an aggregated data.frame:

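# dates() and hours() are assumed to come from the chron package,
# with df$Time stored as a chron object (the question doesn't say)
compress <- function( df , cols ) {
  # one label per row identifying the hour that row falls in
  hour <- with( df , paste( dates(Time) , sprintf("%d:00:00", hours(Time)) ) )
  # aggregate each requested column separately; returns a list of data.frames
  lapply( cols , function(col) {
    aggregate( df[ , col , drop=FALSE ] , list(hour = hour) , sum , na.rm=TRUE )
  })
}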

To run the function you then just call it, passing it the data.frame and a vector of colnames...

compress( df , names(df)[ grep("abc", names(df) ) ] )
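For a version you can run without any extra packages, here is a rough sketch that swaps the dates()/hours() calls for base R's format() on a POSIXct column. That swap is my assumption, not what the question uses, and compress_base, the toy data and the column names abc_1, abc_2 and other are all made up for illustration:

```r
# Toy data: six readings 20 minutes apart, spanning two hours
df <- data.frame(
  Time  = as.POSIXct("2013-08-15 10:00:00", tz = "UTC") + (0:5) * 20 * 60 ,
  abc_1 = c(1, 2, 3, 4, 5, 6) ,
  abc_2 = c(10, NA, 30, 40, 50, 60) ,
  other = 0
)

compress_base <- function( df , cols ) {
  # truncate each timestamp to the start of its hour
  hour <- format( df$Time , "%Y-%m-%d %H:00:00" )
  out <- lapply( cols , function(col) {
    aggregate( df[ , col , drop=FALSE ] , list(hour = hour) , sum , na.rm=TRUE )
  })
  names(out) <- cols   # name the list elements after their columns
  out
}

# value=TRUE makes grep return the matching names directly
cols <- grep( "abc" , names(df) , value=TRUE )
result <- compress_base( df , cols )

result$abc_2
#                 hour abc_2
#1 2013-08-15 10:00:00    40
#2 2013-08-15 11:00:00   150
```

The NA in abc_2 is skipped because na.rm=TRUE is passed through to sum, so the first hour adds up to 10 + 30.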