How can I subset rows in a data frame in R based on a vector of values?

Zelbinian · Mar 5, 2013 · Viewed 155.5k times

I have two data sets that are supposed to be the same size but aren't. I need to trim the values from A that are not in B and vice versa in order to eliminate noise from a graph that's going into a report. (Don't worry, this data isn't being permanently deleted!)

I have read up on this, but I'm still not able to get this to work right. Here's my code:

bg2011missingFromBeg <- setdiff(x=eg2011$ID, y=bg2011$ID)
#attempt 1
eg2011cleaned <- subset(eg2011, ID != bg2011missingFromBeg)
#attempt 2
eg2011cleaned <- eg2011[!eg2011$ID %in% bg2011missingFromBeg]

The first try just eliminates the first value in the resulting setdiff vector. The second try yields an unwieldy error:

Error in `[.data.frame`(eg2012, !eg2012$ID %in% bg2012missingFromBeg) :
  undefined columns selected

Answer

adibender · Mar 5, 2013

This will give you what you want:

eg2011cleaned <- eg2011[!eg2011$ID %in% bg2011missingFromBeg, ]

The error in your second attempt occurs because you forgot the , after the row index.

With a single index, object[index] selects columns of a data frame, because a data frame is a list of columns. To subset rows and keep all columns, use the form object[index_rows, index_cols]; index_cols can be left blank, in which case all columns are returned.

However, you still need to include the , to indicate that you want to get a subset of rows instead of a subset of columns.
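To make the difference concrete, here is a small sketch on made-up data. The names eg, bg, missing_ids, keep and bg_trimmed are hypothetical stand-ins for the objects in the question, not the real data sets. It also shows why the first attempt with != seemed to drop only the first value: != compares element by element and recycles the shorter vector, so it does not test membership the way %in% does.

```r
# Toy stand-ins for the two data sets (hypothetical values).
eg <- data.frame(ID = c(1, 2, 3, 4, 5), value = c(10, 20, 30, 40, 50))
bg <- data.frame(ID = c(2, 4, 5, 6),    value = c(21, 41, 51, 61))

# IDs present in eg but absent from bg: 1 and 3
missing_ids <- setdiff(eg$ID, bg$ID)

# One TRUE/FALSE per row of eg
keep <- !eg$ID %in% missing_ids

# Single index: treated as a column selector. A 5-element logical vector
# on a 2-column data frame fails with "undefined columns selected".
single_index <- tryCatch(eg[keep], error = function(e) conditionMessage(e))

# Row index plus trailing comma: filters rows and keeps all columns.
eg_cleaned <- eg[keep, ]

# != recycles missing_ids as 1, 3, 1, 3, 1 and compares position by
# position (R also warns about the length mismatch here), so only the
# row with ID 1 is flagged and ID 3 slips through.
not_equal <- suppressWarnings(eg$ID != missing_ids)

# The reverse direction: keep only the rows of bg whose ID occurs in eg.
bg_trimmed <- bg[bg$ID %in% eg$ID, ]
```

After this runs, eg_cleaned and bg_trimmed both contain exactly the IDs 2, 4 and 5, which is the "trim A against B and vice versa" result asked for in the question.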