Subset dataframe by multiple logical conditions of rows to remove

r dataframe subset

Jota · Jun 5, 2011 · Viewed 141.6k times · Source

I would like to subset (filter) a dataframe by specifying which rows not (!) to keep in the new dataframe. Here is a simplified sample dataframe:

data
v1 v2 v3 v4
a  v  d  c
a  v  d  d
b  n  p  g
b  d  d  h    
c  k  d  c    
c  r  p  g
d  v  d  x
d  v  d  c
e  v  d  b
e  v  d  c

For example, if a row of column v1 has a "b", "d", or "e", I want to get rid of that row of observations, producing the following dataframe:

v1 v2 v3 v4
a  v  d  c
a  v  d  d
c  k  d  c    
c  r  p  g

I have been successful at subsetting based on one condition at a time. For example, here I remove rows where v1 contains a "b":

sub.data <- data[data[ , 1] != "b", ]

However, I have many, many such conditions, so doing it one at a time is not desirable. I have not been successful with the following:

sub.data <- data[data[ , 1] != c("b", "d", "e")

sub.data <- subset(data, data[ , 1] != c("b", "d", "e"))

I've tried some other things as well, like !%in%, but that doesn't seem to exist. Any ideas?

Answer

Try this

subset(data, !(v1 %in% c("b","d","e")))