A regex to match a substring that isn't followed by a certain other substring

Rayne · Apr 13, 2010 · Viewed 89.5k times

I need a regex that will match blahfooblah but not blahfoobarblah

I want it to match only foo and everything around foo, as long as it isn't followed by bar.

I tried using this: foo.*(?<!bar) which is fairly close, but it still matches blahfoobarblah. The negative lookbehind needs to match anything and not just bar.

The specific language I'm using is Clojure, which uses Java regexes under the hood.

EDIT: More specifically, I also need it to pass blahfooblahfoobarblah but not blahfoobarblahblah.

Answer

maček · Apr 13, 2010

Try:

/(?!.*bar)(?=.*foo)^(\w+)$/

Tests:

blahfooblah            # pass
blahfooblahbarfail     # fail
somethingfoo           # pass
shouldbarfooshouldfail # fail
barfoofail             # fail
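
Since Clojure sits on top of java.util.regex, the tests above can be reproduced with a small Java sketch. The class name NoBarAnywhere and the helper passes are made up for illustration. Java has no /.../ delimiters, so the slashes are dropped and the backslash is doubled inside the string literal.

```java
import java.util.regex.Pattern;

final class NoBarAnywhere {
    // Same pattern as above, minus the / delimiters (hypothetical constant name).
    static final Pattern PATTERN = Pattern.compile("(?!.*bar)(?=.*foo)^(\\w+)$");

    // True when the input is all word characters, contains foo,
    // and contains no bar anywhere.
    static boolean passes(String input) {
        return PATTERN.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "blahfooblah", "blahfooblahbarfail", "somethingfoo",
            "shouldbarfooshouldfail", "barfoofail"
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + (passes(input) ? "pass" : "fail"));
        }
    }
}
```

From Clojure, the same pattern can be written as a regex literal and handed to re-find, which returns nil when there is no match.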

Regular expression explanation

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    bar                      'bar'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    foo                      'foo'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string

Other regex

If you only want to exclude bar when it is directly after foo, you can use:

/(?!.*foobar)(?=.*foo)^(\w+)$/
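
To see how this variant differs from the first one, here is a quick Java check (class and method names are illustrative only). It now accepts a bar that is not directly after foo, but it still rejects any input containing foobar anywhere, including blahfooblahfoobarblah from the question's edit.

```java
import java.util.regex.Pattern;

final class NoFoobar {
    // Rejects inputs that contain the literal sequence foobar anywhere.
    static final Pattern PATTERN = Pattern.compile("(?!.*foobar)(?=.*foo)^(\\w+)$");

    static boolean passes(String input) {
        return PATTERN.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(passes("blahfooblah"));            // true: foo, no foobar
        System.out.println(passes("blahfoobarblah"));         // false: contains foobar
        System.out.println(passes("shouldbarfooshouldfail")); // true: bar is not directly after foo
        System.out.println(passes("blahfooblahfoobarblah"));  // false: a later foobar still blocks it
    }
}
```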

Edit

You updated your question to make it more specific. For that case, use:

/(?=.*foo(?!bar))^(\w+)$/

New tests

fooshouldbarpass               # pass
butnotfoobarfail               # fail
fooshouldpassevenwithfoobar    # pass
nofuuhere                      # fail

New explanation

(?=.*foo(?!bar)) ensures that at least one foo is found that is not directly followed by bar.
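
As a rough check against both the new tests and the four strings from the question, the pattern can be run through java.util.regex like this. The class name FooNotFollowedByBar and the helper passes are hypothetical.

```java
import java.util.regex.Pattern;

final class FooNotFollowedByBar {
    // Passes when at least one foo is not immediately followed by bar.
    static final Pattern PATTERN = Pattern.compile("(?=.*foo(?!bar))^(\\w+)$");

    static boolean passes(String input) {
        return PATTERN.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            // strings from the question
            "blahfooblah", "blahfoobarblah",
            "blahfooblahfoobarblah", "blahfoobarblahblah",
            // new tests from the answer
            "fooshouldbarpass", "butnotfoobarfail",
            "fooshouldpassevenwithfoobar", "nofuuhere"
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + (passes(input) ? "pass" : "fail"));
        }
    }
}
```

Note that \w+ only accepts word characters, so an input containing spaces or punctuation fails regardless of where foo and bar appear. If that is too strict, the group would need a wider character class.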