I have been banging my head against this for some time now:
I want to capture all [a-z]+[0-9]? character sequences, excluding strings such as sin|cos|tan etc.
So, having done my regex homework, I believe the following regex should work:
(?:(?!(sin|cos|tan)))\b[a-z]+[0-9]?
As you can see, I am using negative lookahead along with alternation. The \b after the non-capturing group's closing parenthesis is critical to avoid matching the "in" of "sin", etc. The regex makes sense, and as a matter of fact I have tried it in RegexBuddy with Java as the target implementation and got the wanted result, but it doesn't work using Java Matcher and Pattern objects!
Any thoughts?
cheers
The \b is in the wrong place. It would be looking for a word boundary that didn't have sin/cos/tan before it. But a boundary just after any of those would have a letter at the end, so it would have to be an end-of-word boundary, which it can't be if the next character is a-z.
Also, the negative lookahead would (if it worked) exclude strings like cost, which I'm not sure you want if you're just filtering out keywords.
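To make the cost point concrete, here is a minimal sketch that runs the pattern from the question through Pattern and Matcher. The class name OriginalPatternDemo and the helper findAll are hypothetical names made up for this illustration, and the sample input is invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class OriginalPatternDemo {
    // The pattern from the question. In a Java string literal every
    // regex backslash has to be doubled, so \b is written as \\b.
    static final Pattern ORIGINAL =
        Pattern.compile("(?:(?!(sin|cos|tan)))\\b[a-z]+[0-9]?");

    // Collects every match that Matcher.find() reports, in order.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // "cost" is dropped along with "sin" because the lookahead
        // rejects any word that merely starts with cos.
        System.out.println(findAll("sin cost x1")); // prints [x1]
    }
}
```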
I suggest:
\b(?!sin\b|cos\b|tan\b)[a-z]+[0-9]?\b
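As a rough sketch of how the suggested pattern could be used from Java, assuming the input is a plain string of space-separated tokens (the class name SuggestedPatternDemo, the helper findAll, and the sample input are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the suggested pattern.
class SuggestedPatternDemo {
    // Note the doubled backslashes: in a Java string literal "\b" is a
    // backspace character, so the regex word boundary must be "\\b".
    static final Pattern IDENTIFIER =
        Pattern.compile("\\b(?!sin\\b|cos\\b|tan\\b)[a-z]+[0-9]?\\b");

    // Collects every match that Matcher.find() reports, in order.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Only the whole words sin and cos are rejected; cost is kept.
        System.out.println(findAll("sin cost x1 cos")); // prints [cost, x1]
    }
}
```

Because each keyword in the lookahead is followed by its own \b, only whole-word keywords are rejected, so a longer word that merely begins with one of them still matches.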
Or, more simply, you could just match \b[a-z]+[0-9]?\b and filter out the strings in the keyword list afterwards. You don't always have to do everything in regex.
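The match-then-filter approach might look something like the following sketch. The class name FilterAfterMatchDemo, the KEYWORDS set, and the helper findIdentifiers are hypothetical, and the keyword list is assumed to be just sin, cos, and tan; Set.of needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for filtering keywords after matching.
class FilterAfterMatchDemo {
    // The simple pattern, with backslashes doubled for the Java literal.
    static final Pattern WORD = Pattern.compile("\\b[a-z]+[0-9]?\\b");

    // Assumed keyword list; extend it as needed.
    static final Set<String> KEYWORDS = Set.of("sin", "cos", "tan");

    // Matches every candidate word, then drops the ones in KEYWORDS.
    static List<String> findIdentifiers(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            String word = m.group();
            if (!KEYWORDS.contains(word)) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findIdentifiers("sin cost x1 cos")); // prints [cost, x1]
    }
}
```

Keeping the keywords in a set means new ones can be added without touching the regex.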