Negative look ahead python regex

Michael · Mar 31, 2012 · Viewed 18k times

I would like to regex-match a sequence of bytes only when the string '02 d0' does not occur at a specific position. The two bytes must not appear at byte positions 6 and 7, counting from byte 0 at the right-hand end of the string.

This is what I have been using for testing:

#!/usr/bin/env python3
import re

# Candidate patterns: the byte pair after the three free bytes must not be '02 d0'.
# p0-p2 spell out the exclusion by hand; p3 and p4 use a negative lookahead.
p0 = re.compile(r'^24 [\da-f]{2} 03 (01|03) [\da-f]{2} [\da-f]{2} [\da-f]{2} (([^0])|    (0[^2])|(02 [^d])|(02 d[^0])) 01 c2 [\da-f]{2} [\da-f]{2} [\da-f]{2} 23')
p1 = re.compile(r'^24 [\da-f]{2} 03 (01|03) [\da-f]{2} [\da-f]{2} [\da-f]{2} (([^0])|(0[^2])|(02 [^d])|(02 d[^0])) 01')
p2 = re.compile(r'^24 [\da-f]{2} 03 (01|03) [\da-f]{2} [\da-f]{2} [\da-f]{2} (([^0])|(0[^2])|(02 [^d])|(02 d[^0]))')
p3 = re.compile(r'^24 [\da-f]{2} 03 (01|03) [\da-f]{2} [\da-f]{2} [\da-f]{2} (?!02 d0) 01')
p4 = re.compile(r'^24 [\da-f]{2} 03 (01|03) [\da-f]{2} [\da-f]{2} [\da-f]{2} (?!02 d0)')

yes = '24 0f 03 01 42 ff 00 04 a2 01 c2 00 c5 e5 23'  # should match
no  = '24 0f 03 01 42 ff 00 02 d0 01 c2 00 c5 e5 23'  # should not match

# In the comments below, "fail" means match() returned None and "PASS" means it returned a match.
print(p0.match(yes))  # fail
print(p0.match(no))   # fail
print('\n')
print(p1.match(yes))  # fail
print(p1.match(no))   # fail
print('\n')
print(p2.match(yes))  # PASS
print(p2.match(no))   # fail
print('\n')
print(p3.match(yes))  # fail
print(p3.match(no))   # fail
print('\n')
print(p4.match(yes))  # PASS
print(p4.match(no))   # fail

I looked at this example, but that method is less restrictive than I need. Could someone explain why I only get a correct match when the negative lookahead is at the end of the pattern? What do I need to do to match when '02 d0' does not occur at this specific byte position?

Answer

Qtax · Mar 31, 2012

Lookaheads are "zero-width", meaning they do not consume any characters. For example, these two expressions will never match:

  1. (?=foo)bar
  2. (?!foo)foo
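A quick way to convince yourself, as a minimal sketch using Python 3's `re` module (the test strings and the names `never_1`, `never_2` are made up for illustration):

```python
import re

# A lookahead inspects the text at the current position without moving past it.
# (?=foo)bar: the next three characters would have to be "foo" and "bar" at once.
never_1 = re.compile(r'(?=foo)bar')
# (?!foo)foo: the next three characters would have to be "not foo" and "foo" at once.
never_2 = re.compile(r'(?!foo)foo')

for text in ('foo', 'bar', 'foobar', 'barfoo'):
    assert never_1.search(text) is None
    assert never_2.search(text) is None

# Because the lookahead consumes nothing, the token after it sees the same text.
m = re.match(r'(?=foo)\w+', 'foobar')
print(m.group())  # foobar
```

The last match shows the useful side of this: after `(?=foo)` succeeds, `\w+` still starts at the `f`.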

To make sure a number is not some specific number, you could use:

(?!42)\d\d # will match two digits that are not 42
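As a small runnable sketch of that idea (the name `not_42` and the sample inputs are only for illustration):

```python
import re

# (?!42) rejects the position if "42" starts here; \d\d then consumes two digits.
not_42 = re.compile(r'(?!42)\d\d')

print(bool(not_42.fullmatch('41')))  # True
print(bool(not_42.fullmatch('42')))  # False
print(bool(not_42.fullmatch('24')))  # True
```

Note that the check only applies at the position where the lookahead sits. With `search`, the pattern can still match further along: in '421' it skips position 0 and matches '21'.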

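In your case it could look like:

(?!02)[\da-f]{2} (?!d0)[\da-f]{2}

which rejects the pair if either byte matches on its own (so '02 ff' and 'ff d0' are rejected too), or:

(?!02 d0)[\da-f]{2} [\da-f]{2}

which rejects only the exact pair '02 d0'.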
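This is also why p3 in the question fails: the lookahead consumes nothing, so the ' 01' that follows it is tried against the byte pair itself instead of the bytes after it. Here is a sketch of the single-lookahead form dropped into the full frame, assuming the layout from p0 in the question (the names `HEX` and `frame` are made up here, and the trailing `$` is an added assumption that the frame ends at '23'):

```python
import re

HEX = r'[\da-f]{2}'  # one byte written as two lowercase hex digits

# Layout assumed from p0 in the question: three free bytes, then the
# guarded pair, then the literal bytes "01 c2".
frame = re.compile(
    rf'^24 {HEX} 03 (?:01|03) {HEX} {HEX} {HEX} '
    rf'(?!02 d0){HEX} {HEX} '          # check the pair, then consume it
    rf'01 c2 {HEX} {HEX} {HEX} 23$'
)

yes = '24 0f 03 01 42 ff 00 04 a2 01 c2 00 c5 e5 23'
no  = '24 0f 03 01 42 ff 00 02 d0 01 c2 00 c5 e5 23'

print(bool(frame.match(yes)))  # True
print(bool(frame.match(no)))   # False
```

The key point is that the lookahead is immediately followed by a sub-pattern that consumes the same two bytes it just inspected, so the rest of the pattern lines up with the rest of the string.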