Elegant way to report missing values in a data.frame

Zach picture Zach · Nov 29, 2011 · Viewed 123.4k times · Source

Here's a little piece of code I wrote to report variables with missing values from a data frame. I'm trying to think of a more elegant way to do this, one that perhaps returns a data.frame, but I'm stuck:

for (Var in names(airquality)) {
    missing <- sum(is.na(airquality[,Var]))
    if (missing > 0) {
        print(c(Var,missing))
    }
}

Edit: I'm dealing with data.frames with dozens to hundreds of variables, so it's key that we only report variables with missing values.

Answer

Joshua Ulrich picture Joshua Ulrich · Nov 29, 2011

Just use sapply

> sapply(airquality, function(x) sum(is.na(x)))
  Ozone Solar.R    Wind    Temp   Month     Day 
     37       7       0       0       0       0

You could also use apply or colSums on the matrix created by is.na()

> apply(is.na(airquality),2,sum)
  Ozone Solar.R    Wind    Temp   Month     Day 
     37       7       0       0       0       0
> colSums(is.na(airquality))
  Ozone Solar.R    Wind    Temp   Month     Day 
     37       7       0       0       0       0