rdata: Some method to iterate through column names of a data frame?

Wells · Apr 19, 2013 · Viewed 8.8k times

I have about 30 lines of code that do just this (getting Z scores):

data$z_col1 <- (data$col1 - mean(data$col1, na.rm = TRUE)) / sd(data$col1, na.rm = TRUE)
data$z_col2 <- (data$col2 - mean(data$col2, na.rm = TRUE)) / sd(data$col2, na.rm = TRUE)
data$z_col3 <- (data$col3 - mean(data$col3, na.rm = TRUE)) / sd(data$col3, na.rm = TRUE)
data$z_col4 <- (data$col4 - mean(data$col4, na.rm = TRUE)) / sd(data$col4, na.rm = TRUE)
data$z_col5 <- (data$col5 - mean(data$col5, na.rm = TRUE)) / sd(data$col5, na.rm = TRUE)

Is there some way, maybe using apply() or something similar, that I can essentially do this (Python-style pseudocode):

for col in ['col1', 'col2', 'col3']:
    data{col} = ... z score code here

Thanks R friends.

Answer

mnel · Jul 11, 2013

A data.frame is a list, thus you can use lapply. Don't use apply on a data.frame as this will coerce to a matrix.

lapply(data, function(x) (x - mean(x,na.rm = TRUE))/sd(x, na.rm = TRUE))
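Note that lapply returns a list rather than modifying data. A minimal sketch of writing the results back as new z_ columns, as in the question, could look like the following. The example data, the cols vector, and the z_score helper are made up for illustration.

```r
# Hypothetical example data; the column names mirror those in the question.
data <- data.frame(col1 = c(1, 2, 3, NA), col2 = c(10, 20, 30, 40))

# Columns to standardize (an assumed subset; use names(data) for every column).
cols <- c("col1", "col2")

# Hypothetical helper: z score of one numeric vector, ignoring NAs.
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# lapply returns a list with one element per column; assigning that list to
# several new column names at once adds one z_ column per input column.
data[paste0("z_", cols)] <- lapply(data[cols], z_score)
```

Because the assignment happens once for all columns, this also avoids modifying the data frame repeatedly inside a loop.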

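Or you could use scale, which performs this centering and scaling for you (missing values are ignored when computing the mean and standard deviation). Note that scale returns a one-column matrix when given a vector.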

lapply(data, scale)
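One thing to be aware of is that scale returns a matrix, so lapply(data, scale) yields a list of one-column matrices. A rough sketch of two ways to get plain numeric columns back, assuming every column of data is numeric (the example data and the names scaled and z_col1 are made up for illustration):

```r
# Hypothetical example data with numeric columns only.
data <- data.frame(col1 = c(1, 2, 3, NA), col2 = c(10, 20, 30, 40))

# scale accepts the whole data frame (it is converted to a matrix internally),
# so all columns can be standardized in one call and converted back.
scaled <- as.data.frame(scale(data))

# For a single column, as.numeric drops the matrix dimensions and the
# "scaled:center" / "scaled:scale" attributes that scale attaches.
z_col1 <- as.numeric(scale(data$col1))
```

The first form only works when all columns are numeric; for mixed data frames, select the numeric columns first.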

You can translate the Python-style approach directly:

for(col in names(data)){
   data[[col]] <- scale(data[[col]])
}
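The loop above overwrites each column in place. A variant that keeps the original columns and adds z_ columns, matching the naming in the question, might look like this sketch. The example data and the list of column names are assumptions for illustration.

```r
# Hypothetical example data; the column names mirror those in the question.
data <- data.frame(col1 = c(1, 2, 3, NA), col2 = c(10, 20, 30, 40))

# Loop over a fixed vector of column names, like the Python list in the question.
for (col in c("col1", "col2")) {
  x <- data[[col]]
  # Build the new column name (e.g. "z_col1") and store a plain numeric vector.
  data[[paste0("z_", col)]] <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
```

Computing the z score by hand here, rather than calling scale, keeps each new column a plain numeric vector instead of a one-column matrix.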

Note that this approach is not memory efficient in R, as [[<-.data.frame copies the entire data.frame each time.