R - Extract info after nth occurrence of a character from the right of string

alexb523 picture alexb523 · Nov 3, 2017 · Viewed 8.1k times · Source

I've seen many iterations of extracting w/ gsub but they mostly deal with extracting from left to right or after one occurrence. I am wanting to match from right to left, counting four occurrences of -, matching everything between the 3rd and 4th occurrence.

For example:

string                       outcome
here-are-some-words-to-try   some
a-b-c-d-e-f-g-h-i            f

Here are a few references I've tried using:

Answer

Jan picture Jan · Nov 3, 2017

You could use

([^-]+)(?:-[^-]+){3}$

See a demo on regex101.com.


In R this could be

library(dplyr)
library(stringr)
df <- data.frame(string = c('here-are-some-words-to-try', 'a-b-c-d-e-f-g-h-i', ' no dash in here'), stringsAsFactors = FALSE)

df <- df %>%
  mutate(outcome = str_match(string, '([^-]+)(?:-[^-]+){3}$')[,2])
df

And yields

                      string outcome
1 here-are-some-words-to-try    some
2          a-b-c-d-e-f-g-h-i       f
3            no dash in here    <NA>