Find columns with all missing values

SHRram picture SHRram · Jul 4, 2012 · Viewed 55.5k times · Source

I am writing a function, which needs a check on whether (and which!) column (variable) has all missing values (NA, <NA>). The following is fragment of the function:

test1 <- data.frame (matrix(c(1,2,3,NA,2,3,NA,NA,2), 3,3))
test2 <- data.frame (matrix(c(1,2,3,NA,NA,NA,NA,NA,2), 3,3))

na.test <-  function (data) {
  if (colSums(!is.na(data) == 0)){
      stop ("The some variable in the dataset has all missing value,
     remove the column to proceed")
      }
      }
na.test (test1)

Warning message:
In if (colSums(!is.na(data) == 0)) { :
  the condition has length > 1 and only the first element will be used

Q1: Why is the above error and any fixes ?

Q2: Is there any way to find which of columns have all NA, for example output the list (name of variable or column number)?

Answer

Andrie picture Andrie · Jul 4, 2012

This is easy enough to with sapply and a small anonymous function:

sapply(test1, function(x)all(is.na(x)))
   X1    X2    X3 
FALSE FALSE FALSE 

sapply(test2, function(x)all(is.na(x)))
   X1    X2    X3 
FALSE  TRUE FALSE 

And inside a function:

na.test <-  function (x) {
  w <- sapply(x, function(x)all(is.na(x)))
  if (any(w)) {
    stop(paste("All NA in columns", paste(which(w), collapse=", ")))
  }
}

na.test(test1)

na.test(test2)
Error in na.test(test2) : All NA in columns 2