Regex for "AND NOT" operation

Joshua Honig · Sep 25, 2011 · Viewed 59.1k times

I'm looking for a general regex construct to match everything in pattern x EXCEPT matches to pattern y. This is hard to explain both completely and concisely; see Material Nonimplication for a formal definition.

For example, match any word character (\w) EXCEPT 'p'. Note that I'm subtracting a small set (the letter 'p') from a larger set (all word characters). I can't just say [^p] because that doesn't take into account the larger limiting set of only word characters. For this little example, sure, I could manually reconstruct something like [a-oq-zA-OQ-Z0-9_], which is a pain but doable. But I'm looking for a more general construct, so that at least the large positive set can be a more complex expression. For example, match ((?<=(so|me|^))big(com?pl{1,3}ex([pA]t{2}ern) except when it starts with "My".

Edit: I realize that was a bad example, since excluding stuff at the beginning or end is a situation where negative look-ahead and look-behind expressions work. (Bohemian, I still gave you an upvote for illustrating this.) So what about excluding matches that contain "My" somewhere in the middle? I'm still really looking for a general construct, like a regex equivalent of the following pseudo-SQL:

select [captures] from [input]
where (
    input MATCHES [pattern1]
    AND NOT capture MATCHES [pattern2]
)

If the answer is "it does not exist and here is why...", I'd like to know that too.

Edit 2: If I wanted to define my own function to do this, it would be something like this (here's a C# LINQ version):

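// Requires: using System.Linq; using System.Text.RegularExpressions;
// Returns the matches of positivePattern whose text contains no match of negativePattern.
public static Match[] RegexMNI(string input,
                               string positivePattern,
                               string negativePattern) {
    return (from Match m in Regex.Matches(input, positivePattern)
            where !Regex.IsMatch(m.Value, negativePattern)
            select m).ToArray();
}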
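
For comparison, here is a rough Python sketch of the same post-filtering idea. It is not a native regex construct, just the two-step "match, then reject" approach described above. The function name regex_mni and the sample input are made up for illustration.

```python
import re

def regex_mni(text, positive_pattern, negative_pattern):
    """Return matches of positive_pattern whose matched text
    contains no match of negative_pattern."""
    return [
        m for m in re.finditer(positive_pattern, text)
        # re.search looks for the negative pattern anywhere inside the match text
        if not re.search(negative_pattern, m.group(0))
    ]

# Illustrative input: keep words, drop any word that contains "My"
kept = [m.group(0) for m in regex_mni("foo MyBar baz SomeMyThing", r"\w+", r"My")]
print(kept)  # ['foo', 'baz']
```

Because the negative pattern is tested against each match's text rather than against the whole input, a "My" elsewhere in the input does not affect other matches.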

I'm STILL just wondering if there is a native regex construct that could do this.

Answer

Bohemian · Sep 25, 2011

This will match any character that is a word character and is not a p:

((?=[^p])\w)
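
As a quick check, the pattern above can be tried in Python. This is only a sketch: the helper name word_chars_except_p and the sample string are assumptions for illustration. The look-ahead (?=[^p]) inspects the next character without consuming it, and \w then consumes it only if it is a word character.

```python
import re

# Pattern from the answer: look-ahead asserts "next char is not p",
# then \w consumes that char if it is a word character.
NOT_P_WORD_CHAR = re.compile(r'((?=[^p])\w)')

def word_chars_except_p(text):
    # With a single capture group, findall returns the group's text for each match
    return NOT_P_WORD_CHAR.findall(text)

print(word_chars_except_p("apple pie_42"))
# ['a', 'l', 'e', 'i', 'e', '_', '4', '2']
```

The space is skipped because it passes the look-ahead but fails \w, and each 'p' is skipped because it fails the look-ahead. Note that an uppercase 'P' would still match, since only the lowercase letter is excluded.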

To solve your example, use a negative look-ahead for "My" anywhere in the input, i.e. (?!.*My):

^(?!.*My)((?<=(so|me|^))big(com?pl{1,3}ex([pA]t{2}ern)

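Note the start-of-input anchor ^, which is required to make it work. Without it, the engine can retry the look-ahead at a later position, past the "My", where it succeeds.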
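
The effect of the anchor can be seen in a small Python sketch. It uses a simplified stand-in, big\w*, in place of the full example pattern, and the helper first_match and the sample inputs are assumptions for illustration.

```python
import re

# Simplified stand-in for the positive pattern: "big" followed by word characters
ANCHORED = re.compile(r'^(?!.*My)big\w*')
UNANCHORED = re.compile(r'(?!.*My)big\w*')

def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None

# No "My" in the input: the anchored pattern matches
print(first_match(ANCHORED, "bigpattern here"))          # bigpattern

# "My" appears later in the input: the look-ahead rejects the match
print(first_match(ANCHORED, "bigpattern and My stuff"))  # None

# Without ^, the search retries after "My", where the look-ahead passes
print(first_match(UNANCHORED, "My bigpattern"))          # bigpattern
print(first_match(ANCHORED, "My bigpattern"))            # None
```

With the anchor in place the look-ahead is evaluated exactly once, at position 0, so it covers the entire input. The trade-off is that the positive pattern must then also begin at the start of the input.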