How to avoid a loop in R: selecting items from a list

JD Long · Aug 31, 2009

I could solve this using loops, but I am trying to think in vectors so my code will be more R-esque.

I have a list of names in the format firstname_lastname. From this list I want to extract a separate list containing only the first names. I can't seem to get my mind around how to do this. Here's some example data:

t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
tsplit <- strsplit(t,"_")

which looks like this:

> tsplit
[[1]]
[1] "bob"   "smith"

[[2]]
[1] "mary" "jane"

[[3]]
[1] "jose"  "chung"

[[4]]
[1] "michael" "marx"   

[[5]]
[1] "charlie" "ivan"   

I could get out what I want using loops like this:

t_out <- character(length(tsplit))  # preallocate the result vector
for (i in seq_along(tsplit)) {
    t_out[i] <- tsplit[[i]][1]      # keep the first piece of each split name
}

which would give me this:

> t_out
[1] "bob"     "mary"    "jose"    "michael" "charlie"

So how can I do this without loops?

Answer

hadley · Aug 31, 2009

And one more approach:

t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
pieces <- strsplit(t,"_")
sapply(pieces, "[", 1)

In words, the last line extracts the first element of each component of the list and then simplifies it into a vector.
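To make the "simplifies it into a vector" step concrete, here is a small sketch (not part of the original answer) comparing sapply with lapply, which applies the same function but always returns a list. The variable names first_list and first_vec are just illustrative.

```r
t <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")
pieces <- strsplit(t, "_")

# lapply applies "[" to each component but keeps the results in a list
first_list <- lapply(pieces, "[", 1)

# sapply does the same, then simplifies the list of length-1 results to a vector
first_vec <- sapply(pieces, "[", 1)

str(first_list)   # a list of 5, each element a single string
first_vec         # a plain character vector of first names

# Here the simplification amounts to unlisting the lapply result
identical(first_vec, unlist(first_list))
```

Because every call returns exactly one string, sapply can collapse the results into a character vector; if the calls returned values of differing lengths, it would fall back to returning a list.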

How does this work? Well, you need to realise that an alternative way of writing x[1] is "["(x, 1), i.e. there is a function called [ that does subsetting. The sapply call calls this function once for each element of the original list, passing in two arguments: the list element and 1.
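A quick way to convince yourself of this equivalence is to call `[` directly. This is a minimal sketch using a made-up two-element vector x and a shortened version of the example data; the anonymous-function version is shown only for comparison.

```r
x <- c("bob", "smith")

# Ordinary subsetting syntax...
x[1]
# ...is really a call to the function `[` with the arguments x and 1
"["(x, 1)
`[`(x, 1)

identical(x[1], "["(x, 1))

# So sapply(pieces, "[", 1) does the same job as an explicit anonymous function
pieces <- strsplit(c("bob_smith", "mary_jane"), "_")
identical(sapply(pieces, "[", 1), sapply(pieces, function(p) p[1]))
```

Passing "[" by name avoids writing the wrapper function; sapply looks up the function from the string and forwards the extra argument 1 to every call.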

The advantage of this approach over the others is that you can extract multiple elements from the list without having to recompute the splits. For example, the last name would be sapply(pieces, "[", 2). Once you get used to this idiom, it's pretty easy to read.
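As a sketch of reusing one split for several positions, the snippet below extracts both first and last names from the same pieces object. The vapply line is an extra not mentioned in the answer: it is a stricter relative of sapply whose FUN.VALUE argument (here character(1)) declares that each call must return a single string, and it errors if a call does not.

```r
t <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")
pieces <- strsplit(t, "_")   # split once...

# ...then reuse the same split for as many positions as needed
first <- sapply(pieces, "[", 1)
last  <- sapply(pieces, "[", 2)

# Type-checked variant: each call must return exactly one character value
last_strict <- vapply(pieces, "[", character(1), 2)

data.frame(first = first, last = last)
```

One thing to watch: if a name had no underscore, asking for position 2 would return NA for that element rather than raising an error.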