R get rows based on multiple conditions - use dplyr and reshape2

user3375672 · Dec 1, 2014 · Viewed 19.9k times
df <- data.frame(
  exp = c(1, 1, 2, 2),
  name = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)

While getting accustomed to dplyr and reshape2, I got stuck looking for a "simple" way to select rows based on several conditions. I want the genes (the name variable) that have value above 0 in experiment 1 (exp == 1) AND at the same time value below 0 in experiment 2; in df this would be "gene2". There must be many ways to do this, e.g. subset df for each set of conditions (exp == 1 & value > 0, and exp == 2 & value < 0) and then join the resulting subsets:

library(dplyr)
inner_join(filter(df, exp == 1 & value > 0),
           filter(df, exp == 2 & value < 0),
           by = "name")$name

Although this works, it looks very awkward. I feel that such conditional filtering lies at the heart of reshape2 and dplyr, but I cannot figure out how to do it. Can someone enlighten me here?

Answer

A5C1D2H2I1M1N2O1R2T1 · Dec 1, 2014

One alternative that comes to mind is to transform the data to a "wide" format and then do the filtering.

Here's an example using "data.table" (for the convenience of compound-statements):

library(data.table)
dcast(as.data.table(df), name ~ exp, value.var = "value")[`1` > 0 & `2` < 0]
#     name 1  2
# 1: gene2 1 -1
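
To make the reshaping step easier to follow, here is a minimal sketch that splits the one-liner into two steps. The intermediate names wide and hits are only illustrative, and value.var is spelled out on the assumption that value is the column to spread across experiments:

```r
library(data.table)

df <- data.frame(
  exp = c(1, 1, 2, 2),
  name = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)

# Step 1: reshape to one row per gene and one column per experiment.
# The new columns are named after the values of exp, i.e. "1" and "2".
wide <- dcast(as.data.table(df), name ~ exp, value.var = "value")

# Step 2: with both experiments on the same row, the two conditions
# become a single row-wise test. Backticks are needed because "1" and "2"
# are not syntactic column names.
hits <- wide[`1` > 0 & `2` < 0]

# Names of the genes that satisfy both conditions.
hits$name
```

Once the data is wide, adding a third experiment to the test is just one more term in the filter expression.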

Similarly, with "dplyr" and "tidyr":

library(dplyr)
library(tidyr)
df %>% 
  spread(exp, value) %>% 
  filter(`1` > 0 & `2` < 0)
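
In newer versions of tidyr (1.0.0 and later), spread() is superseded by pivot_wider(), so the same idea could be sketched as below. The "exp" column prefix and the result names wide_hits and grouped_hits are my own choices, not part of the original answer. The second variant is an alternative that skips reshaping and tests both conditions within each gene; it assumes each gene has at most one row per experiment, as in df:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  exp = c(1, 1, 2, 2),
  name = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)

# Variant 1: pivot_wider() in place of spread().
# names_prefix yields syntactic column names (exp1, exp2),
# so no backticks are needed in filter().
wide_hits <- df %>%
  pivot_wider(names_from = exp, values_from = value, names_prefix = "exp") %>%
  filter(exp1 > 0, exp2 < 0)

# Variant 2: no reshaping. Within each gene, require that some row
# meets the experiment 1 condition and some row meets the experiment 2
# condition, then keep one row per matching gene.
grouped_hits <- df %>%
  group_by(name) %>%
  filter(any(exp == 1 & value > 0), any(exp == 2 & value < 0)) %>%
  ungroup() %>%
  distinct(name)

wide_hits$name
grouped_hits$name
```

The grouped variant keeps the data in long format, which may be more convenient when there are many experiments and only a few of them enter the conditions.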