Regular Expression Lookbehind doesn't work with quantifiers ('+' or '*')

Noel De Martin picture Noel De Martin · Jan 27, 2012 · Viewed 31.1k times · Source

I am trying to use lookbehinds in a regular expression and it doesn't seem to work as I expected. So, this is not my real usage, but to simplify I will put an example. Imagine I want to match "example" on a string that says "this is an example". So, according to my understanding of lookbehinds this should work:

(?<=this\sis\san\s*?)example

What this should do is find "this is an", then space characters and finally match the word "example". Now, it doesn't work and I don't understand why, is it impossible to use '+' or '*' inside lookbehinds?

I also tried those two and they work correctly, but don't fulfill my needs:

(?<=this\sis\san\s)example
this\sis\san\s*?example

I am using this site to test my regular expressions: http://gskinner.com/RegExr/

Answer

Gumbo picture Gumbo · Jan 27, 2012

Many regular expression libraries do only allow strict expressions to be used in look behind assertions like:

  • only match strings of the same fixed length: (?<=foo|bar|\s,\s) (three characters each)
  • only match strings of fixed lengths: (?<=foobar|\r\n) (each branch with fixed length)
  • only match strings with a upper bound length: (?<=\s{,4}) (up to four repetitions)

The reason for these limitations are mainly because those libraries can’t process regular expressions backwards at all or only a limited subset.

Another reason could be to avoid authors to build too complex regular expressions that are heavy to process as they have a so called pathological behavior (see also ReDoS).

See also section about limitations of look-behind assertions on Regular-Expressions.info.