R-regex: match strings not beginning with a pattern

aL3xa picture aL3xa · Dec 8, 2011 · Viewed 18.2k times · Source

I'd like to use regex to see if a string does not begin with a certain pattern. While I can use: [^ to blacklist certain characters, I can't figure out how to blacklist a pattern.

> grepl("^[^abc].+$", "foo")
[1] TRUE
> grepl("^[^abc].+$", "afoo")
[1] FALSE

I'd like to do something like grepl("^[^(abc)].+$", "afoo") and get TRUE, i.e. to match if the string does not start with abc sequence.

Note that I'm aware of this post, and I also tried using perl = TRUE, but with no success:

> grepl("^((?!hede).)*$", "hede", perl = TRUE)
[1] FALSE
> grepl("^((?!hede).)*$", "foohede", perl = TRUE)
[1] FALSE

Any ideas?

Answer

Dan picture Dan · Dec 8, 2011

Yeah. Put the zero width lookahead /outside/ the other parens. That should give you this:

> grepl("^(?!hede).*$", "hede", perl = TRUE)
[1] FALSE
> grepl("^(?!hede).*$", "foohede", perl = TRUE)
[1] TRUE

which I think is what you want.

Alternately if you want to capture the entire string, ^(?!hede)(.*)$ and ^((?!hede).*)$ are both equivalent and acceptable.