Using regex to match string between two strings while excluding strings

Tola Odejayi · Jan 2, 2010

Following on from a previous question in which I asked:

How can I use a regular expression to match text that is between two strings, where those two strings are themselves enclosed by two other strings, with any amount of text between the inner and outer enclosing strings?

I got this answer:

/outer-start.*?inner-start(.*?)inner-end.*?outer-end/

I would now like to know how to exclude certain strings from the text between the outer enclosing strings and the inner enclosing strings.

For example, if I have this text:

outer-start some text inner-start text-that-i-want inner-end some more text outer-end

I would like 'some text' and 'some more text' not to contain the word 'unwanted'.

In other words, this is OK:

outer-start some wanted text inner-start text-that-i-want inner-end some more wanted text outer-end

But this is not OK:

outer-start some unwanted text inner-start text-that-i-want inner-end some more unwanted text outer-end

Or, to explain further: the parts of the expression in the answer above that match between the outer and inner delimiters should exclude the word 'unwanted'.

Is this easy to match using regexes?

Answer

Roger Pate · Jan 3, 2010

Replace the first and last (but not the middle) .*? with (?:(?!unwanted).)*?. (Where (?:...) is a non-capturing group, and (?!...) is a negative lookahead.)
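As a minimal sketch of that substitution, here it is applied to the two sample strings from the question. Python's re module is my assumption (the question does not name a regex flavor), and the names NOT_UNWANTED, PATTERN, ok_text and bad_text are made up for illustration.

```python
import re

# "Tempered dot": consume one character at a time, but only at positions
# where the excluded word does not start.
NOT_UNWANTED = r"(?:(?!unwanted).)*?"

# First and last .*? replaced; the middle (.*?) is left unrestricted.
PATTERN = re.compile(
    "outer-start" + NOT_UNWANTED
    + "inner-start(.*?)inner-end"
    + NOT_UNWANTED + "outer-end"
)

ok_text = ("outer-start some wanted text inner-start text-that-i-want "
           "inner-end some more wanted text outer-end")
bad_text = ("outer-start some unwanted text inner-start text-that-i-want "
            "inner-end some more unwanted text outer-end")

ok_match = PATTERN.search(ok_text)
bad_match = PATTERN.search(bad_text)

print(ok_match.group(1).strip())  # text-that-i-want
print(bad_match)                  # None
```

Note that "wanted" on its own is fine: the lookahead only fails where the full word "unwanted" begins.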

However, this quickly runs into corner cases and caveats in any real (rather than example) use. If you ask about what you're actually doing, with real examples (even simplified ones) instead of made-up ones, you'll likely get better answers.
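To illustrate one such corner case (my own example, not one from the question): because the middle (.*?) is unrestricted, it can stretch across a block boundary when the input holds more than one outer block. This is a sketch assuming Python's re module; the input string and variable names are hypothetical.

```python
import re

PATTERN = re.compile(
    r"outer-start(?:(?!unwanted).)*?"
    r"inner-start(.*?)inner-end"
    r"(?:(?!unwanted).)*?outer-end"
)

# Two blocks on one line; only the first has 'unwanted' after its inner-end.
two_blocks = ("outer-start a inner-start x inner-end unwanted outer-end "
              "outer-start b inner-start y inner-end c outer-end")

match = PATTERN.search(two_blocks)

# The first block cannot match on its own, so the engine backtracks and lets
# the capture group run on to the second block's inner-end instead of failing.
captured = match.group(1)
print(repr(captured))
print("unwanted" in captured)  # True
```

The match starts at the first outer-start and the captured text swallows the end of the first block and the start of the second, which is probably not what was intended. Avoiding this means restricting the middle group too, or splitting the input into blocks before matching.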