R: splitting a string between two characters using strsplit()

biohazard · Feb 9, 2014 · Viewed 8.3k times

Let's say I have the following string:

s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

I would like to recover the substrings between each "=" and the following ";" (or the end of the string) to get the following output:

[1] "MIMAT0027618"  "MIMAT0027618"  "hsa-miR-6859-5p"  "MI0022705"

Can I use strsplit() with more than one split element?

Answer

G. Grothendieck · Feb 9, 2014

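1) strsplit with matrix. Yes: the split argument is a regular expression, so the character class "[;=]" splits on either character. Arrange the pieces in a two-row matrix and take the second row: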

> matrix(strsplit(s, "[;=]")[[1]], 2)[2,]
[1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"   
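
To see why the matrix trick works, the one-liner can be unpacked into steps. This is only a sketch; the intermediate names (parts, m, keys, values) are introduced here for illustration, and it assumes every field has exactly one "=" with a non-empty key and value, so that the pieces strictly alternate.

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# Split on either ";" or "=": the pieces alternate key, value, key, value, ...
parts <- strsplit(s, "[;=]")[[1]]

# matrix() fills column by column, so with 2 rows each column holds one key/value pair
m <- matrix(parts, nrow = 2)

keys   <- m[1, ]  # "ID" "Alias" "Name" "Derives_from"
values <- m[2, ]  # the four values shown above
print(values)
```

If a field lacked an "=" or had an empty value, the alternation would break and the rows would no longer line up, so this approach is best kept for well-formed input.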

2) strsplit with gsub. Alternatively, remove each "key=" prefix with gsub and then split what is left on ";":

> strsplit(gsub("[^=;]+=", "", s), ";")[[1]]
[1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"     
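
The same idea in two steps, as a rough illustration of what the gsub call leaves behind before the split. The variable names stripped and values are made up for this sketch.

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# "[^=;]+=" matches a run of characters that are neither "=" nor ";",
# followed by "=", i.e. each "key=" prefix; gsub deletes all of them
stripped <- gsub("[^=;]+=", "", s)
print(stripped)
# [1] "MIMAT0027618;MIMAT0027618;hsa-miR-6859-5p;MI0022705"

# Only the ";" separators remain, so a plain (non-regex) split is enough
values <- strsplit(stripped, ";", fixed = TRUE)[[1]]
print(values)
```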

3) strsplit with sub. Or split on ";" first and then use sub to strip everything up to the "=" from each piece:

> sub(".*=", "", strsplit(s, ";")[[1]])
[1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"   
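
One caveat with ".*=" is that it is greedy, so it removes everything up to the last "=" in each field. That makes no difference for the string in the question, but a value that itself contained "=" would be cut short. The sketch below anchors the pattern to the first "=" instead and also keeps the keys as names; the field "Note=a=b" is a hypothetical input added only to show the difference.

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

fields <- strsplit(s, ";", fixed = TRUE)[[1]]

# Strip only up to the FIRST "=" in each field
values <- sub("^[^=]*=", "", fields)

# Everything before the first "=" is the key
keys <- sub("=.*$", "", fields)

# Named character vector, so values can be looked up by key
named <- setNames(values, keys)
print(named[["Name"]])
# [1] "hsa-miR-6859-5p"

# Hypothetical field whose value contains "="
greedy   <- sub(".*=", "", "Note=a=b")      # "b"
anchored <- sub("^[^=]*=", "", "Note=a=b")  # "a=b"
```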

4) strapplyc. Or use strapplyc from the gsubfn package, which here extracts the runs of non-semicolon characters that follow each equals sign:

> library(gsubfn)
> strapplyc(s, "=([^;]+)", simplify = unlist)
[1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"  
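
If adding a package is not an option, a similar extraction can probably be done in base R with gregexpr and regmatches. This is a sketch of an alternative, not part of the original answer: it matches "=" together with the value and then drops the leading "=".

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# Find every "=" followed by one or more non-semicolon characters
hits <- regmatches(s, gregexpr("=[^;]+", s))[[1]]

# Each hit still starts with "=", so keep everything from the 2nd character on
values <- substring(hits, 2)
print(values)
```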

ADDED additional strsplit solutions.