RegEx - Exclude Matched Patterns

San · Aug 14, 2013 · Viewed 66.2k times

I have the following patterns that I want to exclude:

make it cheaper
make it cheapere
makeitcheaper.com.au
makeitcheaper
making it cheaper
www.make it cheaper
ww.make it cheaper.com

I've created a regex that matches any of these. However, I want to match everything except these, and I'm not sure how to invert the regex I've created.

mak(e|ing) ?it ?cheaper

The above pattern matches all the strings listed. Now I want it to match everything else. How do I do that?

From searching, it seems I need something like a negative lookahead or lookbehind, but I don't really get it. Can someone point me in the right direction?

Answer

Bernhard Barker · Aug 14, 2013

You can just put it in a negative look-ahead like so:

(?!mak(e|ing) ?it ?cheaper)

On its own, though, that isn't going to work. If you do a matches [1], it won't match, since a look-ahead only looks ahead and doesn't actually consume anything. If you do a find [1], it will match many times, since there are lots of positions in the string where the following characters don't match the above.
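
To make that behaviour concrete, here is a minimal Java sketch. The class and method names (BareLookaheadDemo, wholeMatch, countFinds) are made up for illustration; only the java.util.regex calls are standard.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BareLookaheadDemo {
    // The bare negative look-ahead from above, with nothing around it.
    static final Pattern BARE = Pattern.compile("(?!mak(e|ing) ?it ?cheaper)");

    // matches() needs the whole input to be consumed, but a look-ahead
    // consumes nothing, so this is false for any non-empty input.
    static boolean wholeMatch(String s) {
        return BARE.matcher(s).matches();
    }

    // Counts the zero-width matches that find() reports, one for every
    // position where the look-ahead pattern does not start.
    static int countFinds(String s) {
        Matcher m = BARE.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("hello"));           // false
        System.out.println(countFinds("hello"));           // 6 (positions 0 to 5)
        System.out.println(countFinds("make it cheaper")); // 15 (every position except 0)
    }
}
```

Neither result is useful as a filter: matches rejects every non-empty string, and find succeeds even on a string that should be excluded.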

To fix this, depending on what you wish to do, we have 2 choices:

  1. If you want to exclude all strings that are exactly one of those (i.e. "make it cheaperblahblah" is not excluded), check for start (^) and end ($) of string:

    ^(?!mak(e|ing) ?it ?cheaper$).*
    

    The .* (zero or more wild-cards) does the actual matching. The negative look-ahead is checked once, at the start of the string.

  2. If you want to exclude all strings containing one of those, you can check that the look-ahead doesn't match at the position of any character we consume:

    ^((?!mak(e|ing) ?it ?cheaper).)*$
    

    An alternative is to add wild-cards to the beginning of your look-ahead (i.e. exclude all strings that, from the start of the string, contain anything, then your pattern), but I don't currently see any advantage to this (arbitrary length look-ahead is also less likely to be supported by any given tool):

    ^(?!.*mak(e|ing) ?it ?cheaper).*
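
The difference between the two choices can be checked with a short Java sketch along these lines. The class name, constant names and sample strings are assumptions chosen for illustration; the three patterns are the ones given above.

```java
import java.util.regex.Pattern;

class ExcludeDemo {
    // Choice 1: reject only strings that are exactly one of the patterns.
    static final Pattern NOT_EXACT =
            Pattern.compile("^(?!mak(e|ing) ?it ?cheaper$).*");
    // Choice 2: reject strings that contain the pattern anywhere.
    static final Pattern NOT_CONTAINING =
            Pattern.compile("^((?!mak(e|ing) ?it ?cheaper).)*$");
    // Alternative spelling of choice 2, with the wild-cards inside the look-ahead.
    static final Pattern NOT_CONTAINING_ALT =
            Pattern.compile("^(?!.*mak(e|ing) ?it ?cheaper).*");

    // True when the whole string is accepted by the given pattern.
    static boolean accepts(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "make it cheaper",
            "make it cheaperblahblah",
            "www.make it cheaper",
            "something else"
        };
        for (String s : samples) {
            System.out.printf("%-26s notExact=%b notContaining=%b alt=%b%n",
                    s,
                    accepts(NOT_EXACT, s),
                    accepts(NOT_CONTAINING, s),
                    accepts(NOT_CONTAINING_ALT, s));
        }
    }
}
```

With these samples, the first pattern should reject only "make it cheaper", while the other two should reject every sample except "something else".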
    

Because of the ^ and $, either doing a find or a matches will work for either of the above (though, in the case of matches, the ^ is optional and, in the case of find, the .* outside the look-ahead is optional).


1: Although they may not be called that, many languages have functions equivalent to matches and find with regex.


The above is the strictly-regex answer to this question.

A better approach might be to stick to the original regex (mak(e|ing) ?it ?cheaper) and see if you can negate the matches directly with the tool or language you're using.

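In Java, for example, this would involve doing if (!string.matches(originalRegex)) (note the !, which negates the returned boolean) instead of if (string.matches(negLookRegex)). Keep in mind that String.matches tests the whole string, so this form corresponds to the first choice above; to exclude strings that merely contain the pattern, negate the result of Matcher.find() instead.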
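
As a rough sketch of that approach, the following keeps the original regex and negates the result in code. The class name, method name and sample inputs are hypothetical; it uses find() so that strings merely containing the pattern are dropped as well.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class NegateInCode {
    // The original, un-negated regex from the question.
    static final Pattern ORIGINAL = Pattern.compile("mak(e|ing) ?it ?cheaper");

    // Keeps only the strings in which the pattern is not found anywhere.
    static List<String> keepNonMatching(List<String> inputs) {
        return inputs.stream()
                .filter(s -> !ORIGINAL.matcher(s).find()) // note the !
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList(
                "make it cheaper",
                "makeitcheaper.com.au",
                "ww.make it cheaper.com",
                "cheap flights",
                "make it better");
        // prints [cheap flights, make it better]
        System.out.println(keepNonMatching(inputs));
    }
}
```

Swapping find() for matches() in the filter would exclude only the strings that are exactly one of the patterns.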