I'm looking for a general regex construct to match everything in pattern x EXCEPT matches to pattern y. This is hard to explain both completely and concisely...see Material Nonimplication for a formal definition.
For example, match any word character (\w
) EXCEPT 'p'. Note I'm subtracting a small set (the letter 'p') from a larger set (all word characters). I can't just say [^p]
because that doesn't take into account the larger limiting set of only word characters. For this little example, sure, I could manually reconstruct something like [a-oq-zA-OQ-Z0-9_]
, which is a pain but doable. But i'm looking for a more general construct so that at least the large positive set can be a more complex expression. Like match ((?<=(so|me|^))big(com?pl{1,3}ex([pA]t{2}ern)
except when it starts with "My".
Edit: I realize that was a bad example, since excluding stuff at the begginning or end is a situation where negative look-ahead and look-behind expressions work. (Bohemian I still gave you an upvote for illustrating this). So...what about excluding matches that contain "My" somewhere in the middle?...I'm still really looking for a general construct, like a regex equivalent of the following pseudo-sql
select [captures] from [input]
where (
input MATCHES [pattern1]
AND NOT capture MATCHES [pattern2]
)
If there answer is "it does not exist and here is why..." I'd like to know that too.
Edit 2: If I wanted to define my own function to do this it would be something like (here's a C# LINQ version):
public static Match[] RegexMNI(string input,
string positivePattern,
string negativePattern) {
return (from Match m in Regex.Matches(input, positivePattern)
where !Regex.IsMatch(m.Value, negativePattern)
select m).ToArray();
}
I'm STILL just wondering if there is a native regex construct that could do this.
This will match any character that is a word and is not a p
:
((?=[^p])\w)
To solve your example, use a negative look-ahead for "My" anywhere in the input, ie (?!.*My)
:
^(?!.*My)((?<=(so|me|^))big(com?pl{1,3}ex([pA]t{2}ern)
Note the anchor to start of input ^
which is required to make it work.