How to omit rows with NA in only two columns in R?

Question 1

How to omit rows with NA in only two columns in R?

r na

Thomas · Aug 5, 2014 · Viewed 14.7k times · Source

Answer

Answer

df[!with(df,is.na(x)& is.na(y)),]
#      x y  z
#1  1 4  8
#2  2 5  9
#4  3 6 11
#5 NA 7 NA

I did benchmarked on a slightly bigger dataset. Here are the results:

set.seed(237)
df <- data.frame(x=sample(c(NA,1:20), 1e6, replace=T), y= sample(c(NA, 1:10), 1e6, replace=T), z= sample(c(NA, 5:15), 1e6,replace=T)) 

f1 <- function() df[!with(df,is.na(x)& is.na(y)),]
f2 <- function() df[rowSums(is.na(df[c("x", "y")])) != 2, ]
f3 <- function()  df[ apply( df, 1, function(x) sum(is.na(x))>1 ), ] 

library(microbenchmark)

microbenchmark(f1(), f2(), f3(), unit="relative")
Unit: relative
#expr       min        lq    median        uq       max neval
# f1()  1.000000  1.000000  1.000000  1.000000  1.000000   100
# f2()  1.044812  1.068189  1.138323  1.129611  0.856396   100
# f3() 26.205272 25.848441 24.357665 21.799930 22.881378   100

Question 2

I want to omit rows where NA appears in both of two columns.

I'm familiar with na.omit, is.na, and complete.cases, but can't figure out how to use these to get what I want. For example, I have the following dataframe:

(df <- structure(list(x = c(1L, 2L, NA, 3L, NA),
                     y = c(4L, 5L, NA, 6L, 7L),
                     z = c(8L, 9L, 10L, 11L, NA)),
                .Names = c("x", "y", "z"),
                class = "data.frame",
                row.names = c(NA, -5L)))
x   y   z
1   4   8
2   5   9
NA  NA  10
3   6   11
NA  7   NA

and I want to remove only those rows where NAappears in both the x and y columns (excluding anything in z), to give

Does anyone know an easy way to do this? Using na.omit, is.na, or complete.cases is not working.

How to omit rows with NA in only two columns in R?

Answer

Related questions