I am trying to find a simple way to extract an unknown substring (could be anything) that appear between two known substrings. For example, I have a string:
a<-" anything goes here, STR1 GET_ME STR2, anything goes here"
I need to extract the string GET_ME
which is between STR1 and STR2 (without the white spaces).
I am trying str_extract(a, "STR1 (.+) STR2")
, but I am getting the entire match
[1] "STR1 GET_ME STR2"
I can of course strip the known strings, to isolate the substring I need, but I think there should be a cleaner way to do it by using a correct regular expression.
You may use str_match
with STR1 (.*?) STR2
(note the spaces are "meaningful", if you want to just match anything in between STR1
and STR2
use STR1(.*?)STR2
, or use STR1\\s*(.*?)\\s*STR2
to trim the value you need). If you have multiple occurrences, use str_match_all
.
Also, if you need to match strings that span across line breaks/newlines add (?s)
at the start of the pattern: (?s)STR1(.*?)STR2
/ (?s)STR1\\s*(.*?)\\s*STR2
.
library(stringr)
a <- " anything goes here, STR1 GET_ME STR2, anything goes here"
res <- str_match(a, "STR1\\s*(.*?)\\s*STR2")
res[,2]
[1] "GET_ME"
Another way using base R regexec
(to get the first match):
test <- " anything goes here, STR1 GET_ME STR2, anything goes here STR1 GET_ME2 STR2"
pattern <- "STR1\\s*(.*?)\\s*STR2"
result <- regmatches(test, regexec(pattern, test))
result[[1]][2]
[1] "GET_ME"