Regex - Repeating Capturing Group

Jordan Davis · Apr 18, 2017 · Viewed 13.2k times

I'm trying to figure out how I can repeat a capture group on the comma-separated values in the following URL string:

id=1,2;name=user1,user2,user3;city=Oakland,San Francisco,Seattle;zip=94553,94523;

I'm using this RegExp, which returns the results I want except for the values. They're dynamic, i.e. there could be 2, 3, 4, etc. users in the URL parameter, and I was wondering if I could create a capture group for each value instead of getting user1,user2,user3 as one capture group.

RegExp: (^|;|:)(\w+)=([^;]+)*

Here is a live demo of it online using RegExp

Example Output:

  • Group1 - (semi-colon,colon)
  • Group2 - (key ie. id,name,city,zip)
  • Group3 - (value1)
  • Group4 - (value2) *if exists
  • Group5 - (value3) *if exists
  • Group6 - (value4) *if exists

etc... based on the dynamic values like I explained before.

Question: What's wrong with my expression? I'm using the * to loop for repeated patterns.

Answer

Peter G · Apr 18, 2017

Regex doesn't support what you're trying to do. When the engine enters the capturing group a second time, it overwrites what it captured the first time. Consider a simple example (thanks, regular-expressions.info): /(abc|123)+/ used on 'abc123'. It matches "abc", then sees the plus and tries again, matching "123". The capturing group in the final result holds only "123".

This happens no matter what pattern you try; any limit you set on the repetition only changes which strings the regex accepts. Consider /^(abc|123){2}$/. This accepts 'abc123' with the capturing group as "123" but rejects 'abc123abc'. (Without the anchors, the pattern would still find "abc123" inside the longer string.) Putting a capturing group inside another doesn't help either. Creating a capturing group is like creating a variable: it can hold only one value, and each later value overwrites the previous one. You'll never get more capturing groups than you have pairs of parentheses (you can definitely have fewer, though).
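
The overwriting is easy to observe directly in JavaScript. The snippet below is only a minimal sketch: the variable names are illustrative, and the second pattern is anchored with ^ and $ so that the whole string has to match.

```javascript
// A repeated group keeps only its last iteration.
var repeated = /(abc|123)+/.exec('abc123');
var fullMatch = repeated[0];   // the entire matched text: 'abc123'
var lastCapture = repeated[1]; // only the final iteration: '123'

// Anchored {2}: exactly two iterations, still a single capture.
var exactlyTwo = /^(abc|123){2}$/;
var twoCapture = exactlyTwo.exec('abc123')[1];   // '123'
var threeMatches = exactlyTwo.test('abc123abc'); // false: three iterations

console.log(fullMatch, lastCapture, twoCapture, threeMatches);
```

However many times the group repeats, the match array has exactly one slot for it.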

A possible fix then would be to split the string on ';', then each of those on '=', then the right-hand side of those on ','. That would get you [['id', '1', '2'], ['name', 'user1', ...], ['city', ...], ['zip', ...]].

That comes out to be:

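// parseParams is just an illustrative name; a function declaration needs one.
function parseParams(str) {
  // Split on ';' or ':' with a regex; the string ';|:' would be matched literally.
  var afterSplit = str.split(/[;:]/);
  // A trailing separator leaves an empty final element, so drop it if present.
  if (afterSplit[afterSplit.length - 1] === '') {
    afterSplit.pop();
  }
  for (var i = 0; i < afterSplit.length; i++) {
    afterSplit[i] = afterSplit[i].split('='); // e.g. ['name', 'user1,user2,user3']
    afterSplit[i][1] = afterSplit[i][1].split(','); // optionally, you can flatten the array from here to get something nicer
  }
  return afterSplit;
}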
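
If you would still like a regex to locate each key=value pair, one possible variation (a sketch, not the only way to do it) is to let a global pattern find the pairs and leave the comma-splitting to split. The helper name parsePairs and the pattern below are assumptions made for illustration; the rows come out already flattened, as in the [['id', '1', '2'], ...] shape described above.

```javascript
// Hypothetical helper: one regex match per key=value pair, then split the values.
function parsePairs(str) {
  var pairPattern = /(\w+)=([^;:]+)/g; // group 1: key, group 2: raw value list
  var rows = [];
  var match;
  // With the g flag, each exec call resumes where the previous match ended.
  while ((match = pairPattern.exec(str)) !== null) {
    // [key].concat(values) yields a flat row such as ['id', '1', '2']
    rows.push([match[1]].concat(match[2].split(',')));
  }
  return rows;
}

var url = 'id=1,2;name=user1,user2,user3;city=Oakland,San Francisco,Seattle;zip=94553,94523;';
var rows = parsePairs(url);
console.log(rows);
```

The regex still has only two capturing groups; the variable number of values comes from split, not from repeating a group.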