Java regex: Negative lookahead

Cody S · Jun 20, 2012 · Viewed 37.1k times

I'm trying to craft two regular expressions that will match URIs. These URIs come in two formats: /foo/someVariableData and /foo/someVariableData/bar/someOtherVariableData.

I need two regexes. Each needs to match one but not the other.

The regexes I originally came up with are: /foo/.+ and /foo/.+/bar/.+ respectively.

I think the second regex is fine. It will only match the second string. The first regex, however, matches both. So I started experimenting (for the first time) with negative lookahead. I designed the regex /foo/.+(?!bar) and set up the following code to test it:

public class NegativeLookaheadTest {
    public static void main(String[] args) {
        String shouldWork = "/foo/abc123doremi";
        String shouldntWork = "/foo/abc123doremi/bar/def456fasola";
        // First attempt: the lookahead is placed after the greedy .+
        String regex = "/foo/.+(?!bar)";
        System.out.println("ShouldWork: " + shouldWork.matches(regex));
        System.out.println("ShouldntWork: " + shouldntWork.matches(regex));
    }
}

And, of course, both of them resolve to true.

Does anybody know what I'm doing wrong? I don't necessarily need to use negative lookahead; I just need to solve the problem, and negative lookahead seemed like one way to do it.

Thanks,

Answer

Tim Pietzcker · Jun 20, 2012

Try

String regex = "/foo/(?!.*bar).+";

or possibly

String regex = "/foo/(?!.*\\bbar\\b).+";

to avoid failures on paths like /foo/baz/crowbars, which I assume you do want that regex to match.
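For comparison, here is a small sketch that runs the original attempt next to the two suggestions. The original fails because the greedy .+ consumes the rest of the string first, so the lookahead is only checked at the very end, where nothing follows and "not followed by bar" is trivially true. The class name, constant names, and helper methods are made up for illustration.

```java
public class UriRegexDemo {
    // The question's first attempt: lookahead placed after the greedy .+
    static final String ORIGINAL = "/foo/.+(?!bar)";
    // Suggested fix: check for "bar" before consuming anything after /foo/
    static final String NO_BAR = "/foo/(?!.*bar).+";
    // Variant that only rejects "bar" as a whole word
    static final String NO_WHOLE_WORD_BAR = "/foo/(?!.*\\bbar\\b).+";
    // The question's second regex, unchanged
    static final String FOO_BAR = "/foo/.+/bar/.+";

    // Hypothetical helpers so each pattern can be checked in isolation
    static boolean matchesOriginal(String path) { return path.matches(ORIGINAL); }
    static boolean matchesNoBar(String path) { return path.matches(NO_BAR); }
    static boolean matchesNoWholeWordBar(String path) { return path.matches(NO_WHOLE_WORD_BAR); }
    static boolean matchesFooBar(String path) { return path.matches(FOO_BAR); }

    public static void main(String[] args) {
        String[] paths = {
            "/foo/abc123doremi",
            "/foo/abc123doremi/bar/def456fasola",
            "/foo/baz/crowbars"
        };
        for (String path : paths) {
            System.out.println(path
                + " original=" + matchesOriginal(path)
                + " noBar=" + matchesNoBar(path)
                + " noWholeWordBar=" + matchesNoWholeWordBar(path)
                + " fooBar=" + matchesFooBar(path));
        }
    }
}
```

The only path on which the two suggested regexes disagree is /foo/baz/crowbars: the plain version rejects it because it contains the letters "bar", while the word-boundary version accepts it.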

Explanation: (without the double backslashes required by Java strings)

/foo/ # Match "/foo/"
(?!   # Assert that it's impossible to match the following regex here:
 .*   #   any number of characters
 \b   #   followed by a word boundary
 bar  #   followed by "bar"
 \b   #   followed by a word boundary.
)     # End of lookahead assertion
.+    # Match one or more characters
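If you want to keep the annotated form in your source, Java's Pattern.COMMENTS flag lets whitespace and # comments sit inside the pattern itself. A minimal sketch (class, constant, and method names are made up for illustration):

```java
import java.util.regex.Pattern;

public class CommentedRegexDemo {
    // With Pattern.COMMENTS, whitespace is ignored and "#" starts a comment
    // that runs to the end of the line, so each line needs a trailing \n.
    static final Pattern FOO_WITHOUT_BAR = Pattern.compile(
        "/foo/          # Match \"/foo/\"\n" +
        "(?!            # Assert that the following cannot match here:\n" +
        "  .*\\bbar\\b  #   any characters, then \"bar\" as a whole word\n" +
        ")              # End of lookahead assertion\n" +
        ".+             # Match one or more characters\n",
        Pattern.COMMENTS);

    // Hypothetical helper: true if the whole path matches the pattern
    static boolean matches(String path) {
        return FOO_WITHOUT_BAR.matcher(path).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("/foo/abc123doremi"));
        System.out.println(matches("/foo/abc123doremi/bar/def456fasola"));
    }
}
```

Keep in mind that in this mode a literal space or # in the pattern has to be escaped, which is not an issue for this particular regex.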

\b, the "word boundary anchor", matches the zero-width position between a word character (a letter, digit, or underscore) and a non-word character (or between the start/end of the string and a word character). Therefore, it matches before the b and after the r in "/bar/", but it fails to match between w and b in "crowbar".
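To see the word boundary behaviour in isolation, a quick sketch along these lines may help. It uses find() rather than matches() so that it only asks whether "bar" occurs as a whole word somewhere in the string. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    static final Pattern WHOLE_WORD_BAR = Pattern.compile("\\bbar\\b");

    // Hypothetical helper: true if "bar" appears as a whole word anywhere in s
    static boolean containsWholeWordBar(String s) {
        return WHOLE_WORD_BAR.matcher(s).find();
    }

    public static void main(String[] args) {
        // "/" is not a word character, so there is a boundary on both sides of "bar"
        System.out.println(containsWholeWordBar("/foo/x/bar/y"));       // true
        // "bar" sits between "w" and "s", both word characters: no boundary
        System.out.println(containsWholeWordBar("/foo/baz/crowbars"));  // false
        // The underscore counts as a word character, so no boundary after "r"
        System.out.println(containsWholeWordBar("/foo/bar_1"));         // false
        // The hyphen does not, so "bar" stands alone here
        System.out.println(containsWholeWordBar("/foo/bar-1"));         // true
    }
}
```

The underscore case is worth checking against your real URIs: a segment like bar_1 would not trigger the lookahead, while bar-1 would.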

Protip: Take a look at http://www.regular-expressions.info - a great regex tutorial.