I have a large CSV with the results of a medical survey from different locations (the location is a factor present in the data). Because some analyses are specific to a location, I'd like, for convenience, to extract subframes containing only the rows from a given location. The location happens to be the very first field, so yes, I could do this by sorting the CSV rows, but I'd like to learn how to do it in R, as I'm sure I'll need it for other columns.
So, in a nutshell, the question is: given a data frame foo, how can I create another data frame bar that contains only the rows of foo where foo$location == 'there'?
Here are the two main approaches. The first, which I prefer for its readability, uses subset():
bar <- subset(foo, location == "there")
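To try the subset() call above without the real CSV, here is a minimal sketch. The toy data frame and its `age` column are made up for illustration (in practice foo would come from read.csv() on the survey file). It also shows that a factor column keeps all of its levels after subsetting, which droplevels() cleans up.

```r
# Hypothetical toy data; in practice foo would come from read.csv() on the survey file
foo <- data.frame(
  location = c("there", "here", "there", "elsewhere"),
  age      = c(34, 51, 27, 45),         # hypothetical column, for illustration only
  stringsAsFactors = TRUE               # store location as a factor, as in the question
)

# Keep only the rows whose location is "there" (rows 1 and 3 here)
bar <- subset(foo, location == "there")

# The factor still remembers every location level after subsetting ...
levels(bar$location)   # "elsewhere" "here" "there"

# ... so drop the unused ones if later analyses should only see "there"
bar <- droplevels(bar)
levels(bar$location)   # "there"
```

Dropping unused levels matters mostly for things like table() or plotting, which would otherwise report empty categories for the other locations.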
Note that you can string together many conditions with & and | to create complex subsets.
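A short sketch of combining conditions, again on hypothetical toy data (the `age` column is an assumption, not part of the question). Use the vectorised & and |, not && or ||, which only work on single values. For "any of several values", %in% is a tidier alternative to chaining |.

```r
# Hypothetical toy data for illustration
foo <- data.frame(
  location = c("there", "here", "there", "elsewhere"),
  age      = c(34, 51, 27, 45)
)

# Both conditions must hold: location is "there" AND age is over 30
older_there <- subset(foo, location == "there" & age > 30)

# Either condition may hold: location is "there" OR "here"
there_or_here <- subset(foo, location == "there" | location == "here")

# Same rows as the line above, written with %in%
there_or_here2 <- subset(foo, location %in% c("there", "here"))
```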
The second is the indexing approach. You can index rows in R with either numeric or logical (boolean) vectors. foo$location == "there" returns a logical vector of TRUE and FALSE values with one element per row of foo. Put it in the row position (before the comma) and leave the column position empty to keep all columns; only the rows where the condition is TRUE are returned:
foo[foo$location == "there", ]
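One difference between the two approaches shows up when the column has missing values. The sketch below uses hypothetical toy data with an NA location: plain logical indexing turns an NA in the condition into a row of NAs, while subset() treats NA as FALSE. Wrapping the condition in which() converts it to numeric row positions and drops the NAs.

```r
# Hypothetical toy data with one missing location
foo <- data.frame(
  location = c("there", "here", NA, "there"),
  age      = c(34, 51, 27, 45)
)

# The condition yields one value per row: c(TRUE, FALSE, NA, TRUE)
keep <- foo$location == "there"

# Logical indexing: the NA in `keep` produces an extra all-NA row
by_logical <- foo[keep, ]

# Numeric indexing: which() returns the positions of the TRUE values only
by_numeric <- foo[which(keep), ]

# subset() treats NA as FALSE, so it returns the same rows as which()
by_subset <- subset(foo, location == "there")
```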