I am writing a function that needs to check whether (and which!) columns (variables) contain only missing values (`NA`, `<NA>`). The following is a fragment of the function:
```r
test1 <- data.frame(matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3))
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

na.test <- function(data) {
  if (colSums(!is.na(data) == 0)) {
    stop("Some variable in the dataset has all missing values, remove the column to proceed")
  }
}

na.test(test1)
# Warning message:
# In if (colSums(!is.na(data) == 0)) { :
#   the condition has length > 1 and only the first element will be used
```
Q1: What causes the above warning, and how can I fix it?

Q2: Is there a way to find which columns are entirely `NA`, for example by returning their names or column numbers?
This is easy enough to do with `sapply` and a small anonymous function:
```r
sapply(test1, function(x) all(is.na(x)))
#    X1    X2    X3
# FALSE FALSE FALSE

sapply(test2, function(x) all(is.na(x)))
#    X1    X2    X3
# FALSE  TRUE FALSE
```
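A vectorized alternative (not part of the original answer, just a sketch) counts the `NA`s per column with `colSums(is.na(df))` and compares that count with `nrow(df)`. Wrapping the comparison in `which()` gives the column numbers, and `names()` gives the variable names, which is the kind of output Q2 asks for. The helper name `all_na_cols` is hypothetical.

```r
# Hypothetical helper: report which columns of a data frame contain only NA
all_na_cols <- function(df) {
  # is.na() on a data frame returns a logical matrix; colSums() counts NAs per column
  na_counts <- colSums(is.na(df))
  # A column is entirely NA when its NA count equals the number of rows
  hits <- which(na_counts == nrow(df))
  list(names = names(hits), index = unname(hits))
}

test1 <- data.frame(matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3))
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

all_na_cols(test1)  # no columns reported
all_na_cols(test2)  # names "X2", index 2
```

Both this and the `sapply` version treat a zero-row data frame as "all `NA`" in every column, since there are no non-missing values to find.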
And inside a function:
```r
na.test <- function(x) {
  w <- sapply(x, function(x) all(is.na(x)))
  if (any(w)) {
    stop(paste("All NA in columns", paste(which(w), collapse = ", ")))
  }
}

na.test(test1)
na.test(test2)
# Error in na.test(test2) : All NA in columns 2
```
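On Q1: `if()` expects a single `TRUE` or `FALSE`, but `colSums()` returns one value per column, so the condition has length 3 and only its first element is used (since R 4.2.0 this is an error rather than a warning). There is also a misplaced parenthesis: `!` binds more loosely than `==` in R, so `!is.na(data) == 0` parses as `!(is.na(data) == 0)`, and the comparison ends up inside `colSums()`. A minimal corrected sketch of the original function moves `== 0` outside `colSums()` and collapses the per-column result with `any()`. Returning `invisible(TRUE)` when the check passes is a choice made here, not something the original did.

```r
na.test <- function(data) {
  # colSums() counts the non-missing values in each column;
  # a count of 0 means the column is entirely NA
  empty <- colSums(!is.na(data)) == 0
  # any() collapses the per-column logical vector to the single value if() needs
  if (any(empty)) {
    stop("Some variables in the dataset have all missing values: ",
         paste(names(data)[empty], collapse = ", "),
         ". Remove these columns to proceed.")
  }
  invisible(TRUE)
}

test1 <- data.frame(matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3))
test2 <- data.frame(matrix(c(1, 2, 3, NA, NA, NA, NA, NA, 2), 3, 3))

na.test(test1)       # passes silently
try(na.test(test2))  # stops and names column X2
```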