Understand the `Reduce` function

Johnathan · Feb 16, 2015 · Viewed 35.9k times

I have a question about the Reduce function in R. I read its documentation, but I am still a bit confused. I have 5 vectors of gene names. For example:

v1 <- c("geneA","geneB",""...)
v2 <- c("geneA","geneC",""...)
v3 <- c("geneD","geneE",""...)
v4 <- c("geneA","geneE",""...)
v5 <- c("geneB","geneC",""...)

And I would like to find out which genes are present in at least two vectors. Some people have suggested:

Reduce(intersect,list(a,b,c,d,e))

I would greatly appreciate it if someone could explain how this statement works, because I have seen Reduce used in other scenarios.

Answer

James · Feb 16, 2015

Reduce takes a binary function and a list of data items and successively applies the function to the list elements, combining the result so far with the next element at each step. For example:

Reduce(intersect,list(a,b,c))

is the same as

intersect(intersect(a,b),c)

However, I don't think that construct will help you here as it will only return those elements that are common to all vectors.
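To make the folding concrete, here is a minimal sketch. It assumes each vector holds only the two genes that appear in the count table in this answer, since the full vectors are elided in the question. The `accumulate = TRUE` argument of `Reduce` keeps every intermediate result, so you can watch the intersection shrink.

```r
# Assumed data: two genes per vector, as in the count table in this answer.
# The real vectors in the question are longer (their contents are elided).
v1 <- c("geneA", "geneB")
v2 <- c("geneA", "geneC")
v3 <- c("geneD", "geneE")
v4 <- c("geneA", "geneE")
v5 <- c("geneB", "geneC")

# Reduce folds from the left, so these two calls are equivalent.
folded <- Reduce(intersect, list(v1, v2, v4))
nested <- intersect(intersect(v1, v2), v4)
same <- identical(folded, nested)

# accumulate = TRUE returns a list with every intermediate result:
# v1, then intersect(v1, v2), then intersect(intersect(v1, v2), v4).
steps <- Reduce(intersect, list(v1, v2, v4), accumulate = TRUE)

# Folding over all five vectors keeps only genes present in every vector,
# which leaves nothing for this data.
in_all <- Reduce(intersect, list(v1, v2, v3, v4, v5))
```

With this assumed data, `folded` is `"geneA"` and `in_all` is empty, which is why the construct does not answer the "at least two vectors" question.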

To count the number of vectors that a gene appears in you could do the following:

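vlist <- list(v1,v2,v3,v4,v5)
# Cross-tabulate each gene against the vector it came from
tab <- table(gene=unlist(vlist), vec=rep(paste0("v",1:5),times=sapply(vlist,length)))
addmargins(tab,2,list(Count=function(x) sum(x>0)))  # Count = number of vectors containing the gene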
       vec
gene    v1 v2 v3 v4 v5 Count
  geneA  1  1  0  1  0     3
  geneB  1  0  0  0  1     2
  geneC  0  1  0  0  1     2
  geneD  0  0  1  0  0     1
  geneE  0  0  1  1  0     2
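If you only need the gene names rather than the full table, the following sketch shows two possible ways to extract the genes found in at least two vectors. It is not part of the original answer: the data are again assumed to be the two genes per vector shown in the table, and the names `counts`, `at_least_two`, `pairs`, `shared`, and `via_reduce` are illustrative. The second variant shows one place where `Reduce` does fit, by taking the union of all pairwise intersections.

```r
# Assumed data: two genes per vector, matching the count table above.
vlist <- list(
  v1 = c("geneA", "geneB"),
  v2 = c("geneA", "geneC"),
  v3 = c("geneD", "geneE"),
  v4 = c("geneA", "geneE"),
  v5 = c("geneB", "geneC")
)

# Variant 1: count in how many vectors each gene occurs.
# unique() per vector ensures a gene repeated inside one vector counts once.
counts <- table(unlist(lapply(vlist, unique)))
at_least_two <- names(counts)[counts >= 2]

# Variant 2: a gene is in at least two vectors exactly when it lies in the
# intersection of some pair of vectors, so take the union over all pairs.
pairs <- combn(seq_along(vlist), 2, simplify = FALSE)
shared <- lapply(pairs, function(p) intersect(vlist[[p[1]]], vlist[[p[2]]]))
via_reduce <- sort(Reduce(union, shared))
```

Both variants give the same set of genes for this data. The table-based variant is usually the simpler choice, because the number of pairs grows quadratically with the number of vectors.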