Extract a subset of a dataframe based on a condition involving a field

wishihadabettername · Aug 10, 2010 · Viewed 236.1k times

I have a large CSV with the results of a medical survey from different locations (the location is a factor present in the data). As some analyses are specific to a location and for convenience, I'd like to extract subframes with the rows only from those locations. It happens that the location is the very first field so yes, I could do it by sorting the CSV rows, but I'd like to learn how to do it in R as I'm sure I'll need this for other columns.

So, in a nutshell, the question is: given a data frame foo, how can I create another data frame bar which contains only the rows from foo where foo$location == 'there'?

Answer

JoFrhwld · Aug 10, 2010

Here are the two main approaches. I prefer this one for its readability:

bar <- subset(foo, location == "there")

Note that you can combine several conditions with & (and) and | (or) to create more complex subsets.
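As a rough sketch of combining conditions, here is a small made-up data frame. Only the location column comes from the question; the age column, its values, and the names baz and the other locations are assumptions for illustration.

```r
# Hypothetical survey data: only `location` is from the question,
# `age` and all values are invented for this example.
foo <- data.frame(
  location = c("there", "here", "there", "elsewhere"),
  age      = c(34, 51, 67, 45),
  stringsAsFactors = TRUE
)

# Rows from "there" AND with age over 40
bar <- subset(foo, location == "there" & age > 40)

# Rows from "there" OR "here"
baz <- subset(foo, location == "there" | location == "here")

# The subset keeps every original factor level; droplevels()
# removes the ones that no longer occur in the data.
bar <- droplevels(bar)
```

Since location is a factor, droplevels() is probably worth calling before per-location summaries or plots, so that empty levels do not show up in the output.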

The second is the indexing approach. You can index rows in R with either numeric or logical vectors. foo$location == "there" returns a logical vector of TRUE and FALSE values with one element per row of foo. Using that vector as the row index returns only the rows where the condition is TRUE:

foo[foo$location == "there", ]
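One difference between the two approaches shows up when the column contains missing values: a comparison against NA yields NA, logical indexing turns each NA into a row filled with NA, while subset() drops those rows. The sketch below uses invented data (the score column and all values are assumptions) to show this, along with which() as one way to get the same result by indexing.

```r
# Hypothetical data with a missing location; `score` is invented.
foo <- data.frame(
  location = c("there", NA, "here", "there"),
  score    = c(1, 2, 3, 4)
)

# The comparison is NA wherever location is NA
keep <- foo$location == "there"

# Plain logical indexing keeps an all-NA row for the NA entry
by_index <- foo[keep, ]

# which() returns only the positions that are TRUE, so NA is skipped
by_which <- foo[which(keep), ]

# subset() treats NA in the condition as FALSE
by_subset <- subset(foo, location == "there")
```

If the data has no missing values in the column being tested, the two approaches return the same rows.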