I am trying to achieve something similar to this question but with multiple values that must be replaced by NA, and in large dataset.
df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = rep(1:9), var2 = rep(3:5, each = 3))
which generates this dataframe:
df
name foo var1 var2
1 a 1 1 3
2 a 2 2 3
3 a 3 3 3
4 b 4 4 4
5 b 5 5 4
6 b 6 6 4
7 c 7 7 5
8 c 8 8 5
9 c 9 9 5
I would like to replace all occurrences of, say, 3 and 4 by NA, but only in the columns that start with "var".
I know that I can use a combination of []
operators to achieve the result I want:
df[,grep("^var[:alnum:]?",colnames(df))][
df[,grep("^var[:alnum:]?",colnames(df))] == 3 |
df[,grep("^var[:alnum:]?",colnames(df))] == 4
] <- NA
df
name foo var1 var2
1 a 1 1 NA
2 a 2 2 NA
3 a 3 NA NA
4 b 4 NA NA
5 b 5 5 NA
6 b 6 6 NA
7 c 7 7 5
8 c 8 8 5
9 c 9 9 5
Now my questions are the following:
|
operator?You can also do this using replace
:
sel <- grepl("var",names(df))
df[sel] <- lapply(df[sel], function(x) replace(x,x %in% 3:4, NA) )
df
# name foo var1 var2
#1 a 1 1 NA
#2 a 2 2 NA
#3 a 3 NA NA
#4 b 4 NA NA
#5 b 5 5 NA
#6 b 6 6 NA
#7 c 7 7 5
#8 c 8 8 5
#9 c 9 9 5
Some quick benchmarking using a million row sample of data suggests this is quicker than the other answers.