Using regex to skip all characters until a specific sequence of letters is found, using negative lookahead

phazei · Jul 20, 2010 · Viewed 8.1k times

I'm alright with basic regular expressions, but I get a bit lost around positive/negative lookaheads and lookbehinds.

I'm trying to pull the id # from this:

[keyword stuff=otherstuff id=123 morestuff=stuff]

There could be unlimited amounts of "stuff" before or after. I've been using The Regex Coach to help debug what I've tried, but I'm not moving forward anymore...

So far I have this:

\[keyword (?:id=([0-9]+))?[^\]]*\]

That takes care of any extra attributes after the id, but I can't figure out how to ignore everything between keyword and id. I know I can't use [^id]*, since that is a character class excluding the letters i and d, not the sequence "id". I believe I need a negative lookahead like (?!id)*, but I guess since it's zero-width, it doesn't move forward from there. This doesn't work either:

\[keyword[A-z0-9 =]*(?!id)(?:id=([0-9]+))?[^\]]*\]

I've been looking all over for examples, but haven't found any. Or perhaps I have, but they went so far over my head I didn't even realize what they were.

Help! Thanks.

EDIT: It has to match [keyword stuff=otherstuff] as well, where id= doesn't exist at all, so the id group has to be optional (zero or one occurrence). There are also other tags like [otherkeywords id=32] which I do not want to match. The pattern needs to match multiple [keyword id=3] tags throughout the document using preg_match_all.

Answer

Wrikken · Jul 20, 2010

No lookahead/behind required:

/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/

I added the trailing [^\]]*?\] to check for a real tag end; it may be unnecessary.
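
For readability, here is a sketch of the same pattern written with the x (extended) flag, so each part can carry a comment. The variable names $pattern, $tag and $m are only for illustration, and the sample tag is the one from the question.

```php
<?php
// Same pattern as above, spread out under the x flag.
// Whitespace inside the pattern is ignored and # starts a comment.
$pattern = '/
    \[keyword        # literal "[keyword"
    (?:              # optional group, used only when the tag has an id
        [^\]]*?      # lazily skip characters that are not "]"
        \bid=        # "id=" starting at a word boundary
        ([0-9]+)     # capture group 1: the id digits
    )?
    [^\]]*?          # skip the remaining attributes
    \]               # closing bracket of the tag
/x';

$tag = '[keyword stuff=otherstuff id=123 morestuff=stuff]';
preg_match($pattern, $tag, $m);
echo $m[1], "\n"; // 123
```

The lazy [^\]]*? is what does the "skipping ahead": it grows one character at a time until \bid= can match, and it can never run past the closing bracket of the current tag.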

Edit: added \b before id, as otherwise the pattern could capture 123123 from the guid= attribute in [keyword you-dont-want-this-guid=123123-132123-123 id=123]
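
A small comparison of the two variants on that tag, as a sketch ($tag, $without and $with are illustrative names):

```php
<?php
$tag = '[keyword you-dont-want-this-guid=123123-132123-123 id=123]';

// Without \b, the lazy skip stops at the first "id=" it can find,
// which is the tail of "guid=".
preg_match('/\[keyword(?:[^\]]*?id=([0-9]+))?[^\]]*?\]/', $tag, $without);

// With \b, "id=" must start at a word boundary, so "guid=" is passed over.
preg_match('/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/', $tag, $with);

echo $without[1], "\n"; // 123123
echo $with[1], "\n";    // 123
```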

$ php -r 'preg_match_all("/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/","[keyword stuff=otherstuff morestuff=stuff]",$matches);var_dump($matches);'
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(42) "[keyword stuff=otherstuff morestuff=stuff]"
  }
  [1]=>
  array(1) {
    [0]=>
    string(0) ""
  }
}
$ php -r 'var_dump(preg_match_all("/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/","[keyword stuff=otherstuff id=123 morestuff=stuff]",$matches),$matches);'
int(1)
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(49) "[keyword stuff=otherstuff id=123 morestuff=stuff]"
  }
  [1]=>
  array(1) {
    [0]=>
    string(3) "123"
  }
}
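
To cover the cases from the question's EDIT in one go, here is a sketch that runs the pattern over a document with several tags. The sample $doc is made up for illustration, and PREG_SET_ORDER is my choice here (the runs above use the default ordering); it groups the results per tag, which makes the optional id easier to handle.

```php
<?php
// Hypothetical sample document: a tag with an id, another keyword that
// must not match, a tag without an id, and a short tag with only an id.
$doc = 'a [keyword stuff=otherstuff id=123 morestuff=stuff] b '
     . '[otherkeywords id=32] c [keyword stuff=otherstuff] d [keyword id=3]';

$pattern = '/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/';

// With PREG_SET_ORDER each entry of $matches is one matched tag.
$count = preg_match_all($pattern, $doc, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // Group 1 is missing (or empty) for tags without an id; use null then.
    $id = (isset($m[1]) && $m[1] !== '') ? $m[1] : null;
    echo $m[0], ' => ', var_export($id, true), "\n";
}
```

This should report three matches: the two [keyword ...] tags that carry an id (123 and 3) and the one without an id, while [otherkeywords id=32] is skipped because the pattern requires "[keyword" right after the opening bracket.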