Using strsplit and subsetting inside dplyr's mutate

chungkim271 · Mar 2, 2017

I have a data.table with one string column. I'd like to create another column that holds the part of each string before the underscore, using strsplit.

library(data.table)
dat <- data.table(labels=c('a_1','b_2','c_3','d_4'))

The output I want is

labels  sub_labels
a_1     a
b_2     b
c_3     c
d_4     d

I've tried the following two approaches, but neither works.

library(dplyr)
dat %>%
    mutate(
        sub_labels=strsplit(as.character(labels), "_")[[1]][1]
    )
# gives a column whose values are all "a"

This one, which seems logical to me,

dat %>%
    mutate(
        sub_labels=sapply(strsplit(as.character(labels), "_"), function(x) x[[1]][1])
    )

gives an error:

Error: Don't know how to handle type pairlist

I saw another post where paste with collapse worked on the output of strsplit, so I don't understand why subsetting in an anonymous function causes problems. Thanks for any help with this.

Answer

Romain Francois · Mar 2, 2017

tidyr::separate can help here:

> dat %>% separate(labels, c("first", "second") )
   first second
1:     a      1
2:     b      2
3:     c      3
4:     d      4
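
If the goal is to keep the original column and add only the prefix, as in the desired output of the question, a row-wise extraction inside mutate may be enough. The following is a minimal sketch, not the answerer's method: it assumes a plain data.frame instead of the data.table from the question, and the column name sub_labels2 is made up for illustration. It also shows why the first attempt returned "a" everywhere: `[[1]][1]` picks one value from the first row's split, and that single value is recycled down the column.

```r
library(dplyr)

# Same toy data as in the question, built here as a plain data.frame (assumption)
dat <- data.frame(labels = c("a_1", "b_2", "c_3", "d_4"),
                  stringsAsFactors = FALSE)

# strsplit() returns a list with one character vector per row
parts <- strsplit(dat$labels, "_", fixed = TRUE)

# First piece of the FIRST row only: a single string, which mutate() would recycle
first_only <- parts[[1]][1]

res <- dat %>%
  mutate(
    # take the first piece of every row's split; vapply enforces one string per row
    sub_labels = vapply(strsplit(labels, "_", fixed = TRUE),
                        function(x) x[1], character(1)),
    # vectorized alternative without strsplit: drop everything from the first "_"
    sub_labels2 = sub("_.*$", "", labels)
  )

print(res)
```

Both new columns should hold the same prefixes. If the tidyr route is preferred, `separate()` also accepts `remove = FALSE`, which keeps the input column alongside the new ones.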