Regex with non-capturing group in C#

ian93 · Mar 2, 2013

I am using the following regex

JOINTS.*\s*(?:(\d*\s*\S*\s*\S*\s*\S*)\r\n\s*)*

on the following type of data:

 JOINTS               DISPL.-X               DISPL.-Y               ROTATION


     1            0.000000E+00           0.975415E+01           0.616921E+01
     2            0.000000E+00           0.000000E+00           0.000000E+00

The idea is to extract two groups, each containing one line (starting with the joint number: 1, 2, etc.). The C# code is as follows:

string jointPattern = @"JOINTS.*\s*(?:(\d*\s*\S*\s*\S*\s*\S*)\r\n\s*)*";
MatchCollection mc = Regex.Matches(outFileSection, jointPattern );
foreach (Capture c in mc[0].Captures)
{
    JointOutput j = new JointOutput();
    string[] vals = c.Value.Split();
    j.Joint = int.Parse(vals[0]) - 1;
    j.XDisplacement = float.Parse(vals[1]);
    j.YDisplacement = float.Parse(vals[2]);
    j.Rotation = float.Parse(vals[3]);
    joints.Add(j);
}

However, this does not work: rather than returning two captures (one per data line, from the inner group), it returns a single capture containing the entire block, including the column headers. Why does this happen? Does C# treat non-capturing groups differently?

Finally, are RegExes the best way to do this? (I really do feel like I have two problems now.)

Answer

Alan Moore · Mar 2, 2013

mc[0].Captures is equivalent to mc[0].Groups[0].Captures. Groups[0] always refers to the whole match, so there will only ever be the one Capture associated with it. The part you're looking for is captured in group #1, so you should be using mc[0].Groups[1].Captures.

But your regex is designed to match the whole input in one attempt, so the Matches() method will always return a MatchCollection with only one Match in it (assuming the match is successful). You might as well use Match() instead:

  Match m = Regex.Match(source, jointPattern);
  if (m.Success)
  {
    foreach (Capture c in m.Groups[1].Captures)
    {
      Console.WriteLine(c.Value);
    }
  }

Output:

1            0.000000E+00           0.975415E+01           0.616921E+01
2            0.000000E+00           0.000000E+00           0.000000E+00
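
To go from those captures to numbers, something like the following might work. It is only a sketch: JointParser and Parse are hypothetical names that do not appear in the original code, each row is returned as a plain double[] (joint number, X displacement, Y displacement, rotation) rather than the JointOutput type from the question, and the input is assumed to use \r\n line endings with one after the last data line, since the pattern requires that. Note that a bare Split() on a capture would leave empty strings between the columns, so the sketch splits with RemoveEmptyEntries and parses with the invariant culture.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original post.
public static class JointParser
{
    // Pattern copied from the question. It assumes \r\n line endings,
    // including one after the last data line.
    private const string JointPattern =
        @"JOINTS.*\s*(?:(\d*\s*\S*\s*\S*\s*\S*)\r\n\s*)*";

    // Returns one double[4] per data line:
    // { joint number, X displacement, Y displacement, rotation }.
    public static List<double[]> Parse(string text)
    {
        var rows = new List<double[]>();
        Match m = Regex.Match(text, JointPattern);
        if (!m.Success)
        {
            return rows;
        }

        // m.Captures holds a single capture (the whole match);
        // group 1 holds one capture per repetition of the inner group.
        foreach (Capture c in m.Groups[1].Captures)
        {
            // A null separator splits on whitespace; RemoveEmptyEntries
            // drops the empty strings produced by runs of spaces.
            string[] vals = c.Value.Split(
                (char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (vals.Length != 4)
            {
                continue; // skip anything that is not a four-column line
            }

            var row = new double[4];
            for (int i = 0; i < 4; i++)
            {
                // NumberStyles.Float accepts exponent notation such as 0.975415E+01.
                row[i] = double.Parse(
                    vals[i], NumberStyles.Float, CultureInfo.InvariantCulture);
            }
            rows.Add(row);
        }
        return rows;
    }
}
```

Calling JointParser.Parse(outFileSection) on the sample data should give two rows. The joint number is kept as printed here; the question's code subtracts 1 from it to get a zero-based index.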