sapply() with strsplit in R

Leonardo · Jul 5, 2015

I found this code:

string = c("G1:E001", "G2:E002", "G3:E003")
sapply(strsplit(string, ":"), "[", 2)
#[1] "E001" "E002" "E003"

Clearly, strsplit(string, ":") returns a list of length 3, where each component i is a vector of length 2 containing Gi and E00i.

But why do the two extra arguments, "[" and 2, have the effect of selecting only the E00i values? As far as I can see, the only arguments accepted by the function are:

sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE) 

Answer

akrun · Jul 5, 2015

You could use sub to get the expected output instead of using strsplit/sapply:

 sub('.*:', '', string)
 #[1] "E001" "E002" "E003"
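One caveat, sketched here with a made-up input that is not part of the question: because .* is greedy, sub('.*:', '', x) removes everything up to the last colon, while the strsplit approach picks the second field. For the strings in the question the two agree; the names from_sub, from_split and extra are only illustrative.

```r
string <- c("G1:E001", "G2:E002", "G3:E003")

# With exactly one colon per string, both approaches give the same result
from_sub   <- sub(".*:", "", string)
from_split <- sapply(strsplit(string, ":"), `[`, 2)
identical(from_sub, from_split)

# Hypothetical input with two colons: the two approaches now differ
extra <- "G1:E001:X"
sub(".*:", "", extra)                  # keeps the text after the last colon
sapply(strsplit(extra, ":"), `[`, 2)   # keeps the second field
```

Which behaviour is wanted depends on the data, so it is probably worth checking how many separators the real strings can contain.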

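Regarding your code, the output of strsplit is a list, and a list can be processed with the apply family of functions (sapply/lapply/vapply/rapply, etc.). Any arguments that follow FUN are passed on to FUN through the ..., and [ is itself a function, so sapply calls `[`(x, 2), i.e. x[2], on each list element x. In this case, each list element has a length of 2 and we are selecting the second element.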

strsplit(string, ":")
#[[1]]
#[1] "G1"   "E001"

#[[2]]
#[1] "G2"   "E002"

#[[3]]
#[1] "G3"   "E003"

lapply(strsplit(string, ":"), `[`, 2)
#[[1]]
#[1] "E001"

#[[2]]
#[1] "E002"

#[[3]]
#[1] "E003"
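To make the mechanism concrete, here is a minimal sketch showing that [ can be called like any other function, and that the extra 2 given to sapply ends up as its second argument. The variable names parts, x and out are only illustrative, and the loop is a rough equivalent for this input, not how sapply is implemented internally.

```r
string <- c("G1:E001", "G2:E002", "G3:E003")
parts  <- strsplit(string, ":")
x      <- parts[[1]]   # first list element: c("G1", "E001")

x[2]                   # usual subsetting syntax
`[`(x, 2)              # the same call written in function form

# Roughly what sapply(parts, "[", 2) does: call `[` on each element,
# with the 2 forwarded as an additional argument
out <- character(length(parts))
for (i in seq_along(parts)) {
  out[i] <- `[`(parts[[i]], 2)
}
out
```

Passing "[" as a string or `[` in backticks should behave the same here, since sapply looks the function up by name when given a string.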

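In the case of sapply, the default option is simplify=TRUE, which collapses the result into a character vector. With simplify=FALSE, it returns a list, just like lapply: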

 sapply(strsplit(string, ":"), `[`, 2, simplify=FALSE)
#[[1]]
#[1] "E001"

#[[2]]
#[1] "E002"

#[[3]]
#[1] "E003"
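As a small sketch of how the return type depends on simplify, the snippet below compares the two sapply forms and adds vapply, which was mentioned above, as a stricter option that requires the expected type and length of each result to be declared. The variable names are only illustrative.

```r
string <- c("G1:E001", "G2:E002", "G3:E003")
parts  <- strsplit(string, ":")

simplified   <- sapply(parts, `[`, 2)                    # character vector
unsimplified <- sapply(parts, `[`, 2, simplify = FALSE)  # list, like lapply

is.character(simplified)
is.list(unsimplified)

# vapply stops with an error if any result is not a single string
strict <- vapply(parts, `[`, FUN.VALUE = character(1), 2)
strict
```

vapply is often preferred inside functions because its return type does not depend on the input, but for interactive use sapply is usually fine.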

The [ can be replaced by an anonymous function call:

sapply(strsplit(string, ":"), function(x) x[2], simplify=FALSE)
#[[1]]
#[1] "E001"

#[[2]]
#[1] "E002"

#[[3]]
#[1] "E003"
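The anonymous function form is more verbose, but it allows steps that a bare [ cannot express. A minimal sketch, where taking the last piece and lower-casing the result are hypothetical variations that the question does not ask for:

```r
string <- c("G1:E001", "G2:E002", "G3:E003")
parts  <- strsplit(string, ":")

# Last piece of each split, whatever its position
last_piece <- sapply(parts, function(x) x[length(x)])
last_piece

# Second piece, then lower-cased: two steps in one function
lowered <- sapply(parts, function(x) tolower(x[2]))
lowered
```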