How to extract everything until first occurrence of pattern

Ben · Oct 18, 2016

I'm trying to use the stringr package in R to extract everything from a string up until the first occurrence of an underscore.

What I've tried

> str_extract("L0_123_abc", ".+?(?<=_)")
[1] "L0_"

Close, but no cigar: the match includes the trailing underscore. How do I get just "L0"? Also, ideally I'd like something that's easy to extend, so that I can get the information between the 1st and 2nd underscores and the information after the 3rd underscore.

Answer

Wiktor Stribiżew · Oct 18, 2016

To get L0, you may use

> library(stringr)
> str_extract("L0_123_abc", "[^_]+")
[1] "L0"

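The pattern [^_]+ is a negated character class that matches one or more characters other than _. Since str_extract returns only the first match, the result is the run of characters before the first underscore.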
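For comparison, here is a minimal sketch of two close variants of that pattern. It assumes stringr's default regex engine (ICU), which supports lookarounds. The attempt in the question used a lookbehind, which requires the underscore to be part of the match; a lookahead only checks that an underscore follows. The `lead` string is a made-up input added here to show an edge case.

```r
library(stringr)

x <- "L0_123_abc"

# Lazy match followed by a lookahead: the underscore must come next,
# but it is not consumed, so it stays out of the result.
before_lookahead <- str_extract(x, ".+?(?=_)")   # "L0"

# Anchored negated class: only accept a run of non-underscores
# that begins at the start of the string.
before_anchored <- str_extract(x, "^[^_]+")      # "L0"

# The anchor matters when the string starts with an underscore.
lead <- "_123_abc"                               # hypothetical input
unanchored <- str_extract(lead, "[^_]+")         # "123" (first run anywhere)
anchored   <- str_extract(lead, "^[^_]+")        # NA (nothing before the first _)
```

If strings with a leading underscore can occur in your data, the anchored form is probably the safer reading of "everything up to the first underscore".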

Also, you may split the string with _:

> x <- str_split("L0_123_abc", fixed("_"))
> x
[[1]]
[1] "L0"  "123" "abc"

This way, you get every underscore-delimited substring at once and can pick the pieces you need by position.
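As a rough sketch of how the split result might be used for the follow-up in the question: str_split returns a list with one character vector per input string, so the pieces are reached through [[1]]. The second part uses str_split_fixed with a made-up string containing more underscores, since the example string has only two.

```r
library(stringr)

# One input string, so take the first (and only) list element.
parts <- str_split("L0_123_abc", fixed("_"))[[1]]
first  <- parts[1]   # "L0":  before the 1st underscore
second <- parts[2]   # "123": between the 1st and 2nd underscores

# n caps the number of pieces; the last piece keeps the rest of the
# string, including any further underscores.
m <- str_split_fixed("a_b_c_d_e", fixed("_"), n = 4)   # hypothetical input
after_third <- m[1, 4]   # "d_e": everything after the 3rd underscore
```

str_split_fixed returns a character matrix with one row per input string, which may be more convenient than a list when the same positions are needed from a whole vector of strings.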

The same can be achieved with

> str_extract_all("L0_123_abc", "[^_]+")
[[1]]
[1] "L0"  "123" "abc"
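One caveat, sketched below with a made-up input: the two approaches agree on "L0_123_abc" but differ when a field is empty. Splitting keeps the empty string between two adjacent underscores, while [^_]+ needs at least one character and so skips it, which shifts the positions of the later pieces.

```r
library(stringr)

x <- "L0__abc"   # hypothetical input with an empty middle field

# Splitting preserves the empty field between the two underscores.
split_parts <- str_split(x, fixed("_"))[[1]]         # "L0" ""   "abc"

# Extracting runs of non-underscores drops it.
extract_parts <- str_extract_all(x, "[^_]+")[[1]]    # "L0" "abc"
```

If positions matter (for example, "the second field"), splitting is likely the safer choice.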