I have about 30 lines of code that do just this (getting Z scores):
data$z_col1 <- (data$col1 - mean(data$col1, na.rm = TRUE)) / sd(data$col1, na.rm = TRUE)
data$z_col2 <- (data$col2 - mean(data$col2, na.rm = TRUE)) / sd(data$col2, na.rm = TRUE)
data$z_col3 <- (data$col3 - mean(data$col3, na.rm = TRUE)) / sd(data$col3, na.rm = TRUE)
data$z_col4 <- (data$col4 - mean(data$col4, na.rm = TRUE)) / sd(data$col4, na.rm = TRUE)
data$z_col5 <- (data$col5 - mean(data$col5, na.rm = TRUE)) / sd(data$col5, na.rm = TRUE)
Is there some way, maybe using apply() or something, that I can essentially do this (Python-style pseudocode):
for col in ['col1', 'col2', 'col3']:
data{col} = ... z score code here
Thanks R friends.
A data.frame is a list, so you can use lapply. Don't use apply on a data.frame, as this will coerce it to a matrix.
lapply(data, function(x) (x - mean(x,na.rm = TRUE))/sd(x, na.rm = TRUE))
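Note that lapply returns a list and leaves data unchanged. To get the z_ prefixed columns from the question, the result can be assigned back to new column names. A minimal sketch, assuming a toy data frame with made-up values (the helper name z_score and the cols vector are illustrative, not from the question):

```r
# Toy data standing in for the asker's `data` (values are made up)
data <- data.frame(
  col1 = c(1, 2, 3, NA),
  col2 = c(10, 20, 30, 40),
  id   = c("a", "b", "c", "d")
)

# Same z-score formula as in the question, wrapped in a helper (hypothetical name)
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Only the columns to standardise; non-numeric columns such as `id` are skipped
cols <- c("col1", "col2")

# lapply over the selected columns and add the results as z_col1, z_col2
data[paste0("z_", cols)] <- lapply(data[cols], z_score)

names(data)
```

Selecting the columns by name keeps non-numeric columns out of the calculation, which would otherwise fail in mean and sd.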
Or you could use scale, which performs this calculation (ignoring NA values). Given a vector, it returns a one-column matrix.
lapply(data, scale)
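Because scale returns a one-column matrix with extra attributes, each element of the list above is a matrix rather than a plain vector. A small sketch of one way to get ordinary numeric columns back, assuming a toy data frame with made-up values (the name scaled is illustrative):

```r
# Toy data standing in for the asker's `data` (values are made up)
data <- data.frame(col1 = c(1, 2, 3, NA), col2 = c(10, 20, 30, 40))

# scale() on a vector gives a one-column matrix
z <- scale(data$col1)
dim(z)

# as.numeric() drops the matrix shape and the scaling attributes,
# so each column of the result is a plain numeric vector
scaled <- as.data.frame(lapply(data, function(x) as.numeric(scale(x))))
str(scaled)
```

This only works when every column is numeric; scale stops with an error on character or factor columns.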
You can translate the Python-style approach directly:
for(col in names(data)){
data[[col]] <- scale(data[[col]])
}
Note that this approach is not memory efficient in R, as [[<-.data.frame copies the entire data.frame each time.