df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)
While getting accustomed to dplyr and reshape2, I stumbled over what should be a "simple" task: selecting rows based on several conditions. I want those genes (the name variable) that have value above 0 in experiment 1 (exp == 1) AND at the same time value below 0 in experiment 2; in df this would be "gene2". There must be many ways to do this, e.g. subset df for each set of conditions (exp == 1 & value > 0, and exp == 2 & value < 0) and then join the results of these subsets:
library(dplyr)
# Extract the gene names by column name; with current dplyr, [[1]] would return exp.x instead
inner_join(filter(df, exp == 1 & value > 0), filter(df, exp == 2 & value < 0), by = "name")[["name"]]
Although this works, it looks very awkward. I feel that such conditional filtering lies at the heart of reshape2 and dplyr, but I cannot figure out how to do it. Can someone enlighten me here?
One alternative that comes to mind is to transform the data to a "wide" format and then do the filtering.
Here's an example using "data.table" (for the convenience of compound-statements):
library(data.table)
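# dcast() dispatches to the data.table method; value.var is named explicitly to avoid guessing
dcast(as.data.table(df), name ~ exp, value.var = "value")[`1` > 0 & `2` < 0]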
# name 1 2
# 1: gene2 1 -1
Similarly, with "dplyr" and "tidyr":
library(dplyr)
library(tidyr)
df %>%
  spread(exp, value) %>%
  filter(`1` > 0 & `2` < 0)
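For completeness, here is a minimal sketch of two further variants, assuming a recent dplyr and a tidyr version that provides pivot_wider() (1.0.0 or later). The first stays in the "long" format and tests both conditions per gene with a grouped filter; the second is the wide-format approach written with pivot_wider(), which newer tidyr offers alongside spread(). The names hits_long and hits_wide are only illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)

# Long format: within each gene, require at least one row that satisfies
# each condition, then reduce to the unique gene names.
hits_long <- df %>%
  group_by(name) %>%
  filter(any(exp == 1 & value > 0), any(exp == 2 & value < 0)) %>%
  ungroup() %>%
  distinct(name) %>%
  pull(name)

# Wide format: one column per experiment (named `1` and `2`),
# then filter on both columns at once.
hits_wide <- df %>%
  pivot_wider(names_from = exp, values_from = value) %>%
  filter(`1` > 0, `2` < 0) %>%
  pull(name)
```

For the example data both variants should leave only "gene2". The grouped filter avoids reshaping and also tolerates genes that are missing from one experiment (they are simply dropped), whereas the wide version produces NA in that case.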