Named Groups, Unnamed Groups and Captures

I see this question come up a fair bit with regular expressions, so I thought I'd blog about it. It has to do with two things: named groups and captures. First, an example...

I have a set of attributes attached to a value, and I want to match each of them and then write them out. This is similar to matching the attributes of an XML or HTML element:

Example text:
Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)

Sample pattern:
Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)
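To make the structure of that pattern easier to see, here is a minimal sketch that writes the same pattern across several lines using RegexOptions.IgnorePatternWhitespace, which lets whitespace be ignored and # start a comment. The class name AnnotatedPatternDemo is hypothetical; only whitespace and comments are added, so the groups and their numbering are unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnnotatedPatternDemo
{
    // The sample pattern, spread out so each group can carry a comment.
    // IgnorePatternWhitespace must be passed when this string is used.
    public const string Pattern = @"
        Attributes=\(          # literal 'Attributes=('
        (                      # unnamed group: one 'type=value;' pair, repeated by the + below
          (?'type'\w+)         # named group 'type'
          (=)                  # unnamed group: the '=' sign
          (?'value'\w+)        # named group 'value'
          \;\s?                # ';' plus an optional whitespace character
        )+
        \)                     # literal ')'
    ";

    public static bool Matches(string input) =>
        Regex.IsMatch(input, Pattern, RegexOptions.IgnorePatternWhitespace);

    static void Main()
    {
        Console.WriteLine(Matches("Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)"));
    }
}
```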


Problem 1: Named and Unnamed Groups

This pattern uses 2 named groups, "type" and "value", to store each of the attributes. It also has 2 unnamed groups: one that matches a single "type=value;" pair and is repeated by the +, and one that matches the "=" sign between type and value.

Looking at that pattern, you know that there are going to be 5 groups (group 0, which is always the entire match, plus the 4 groups written in the pattern) and, using logic, you would probably expect them to appear in the order of their opening parentheses:

  1. Group 0 : The unnamed entire match
  2. Group 1 : The unnamed attribute-pair group
  3. Group 2 : The named "type" group
  4. Group 3 : The unnamed "=" group
  5. Group 4 : The named "value" group

Unnamed Groups always come first

The first important rule of .NET regexes is that unnamed groups are always numbered before named groups, so they come first when you are enumerating over a Groups collection. So, the order of our groups will be:

  1. Group 0 : The unnamed entire match
  2. Group 1 : The unnamed attribute-pair group
  3. Group 2 : The unnamed "=" group
  4. Group 3 : The named "type" group
  5. Group 4 : The named "value" group
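If you want to check the numbering yourself, a short sketch like the following asks the compiled Regex for its group numbers and the name behind each one. GetGroupNumbers and GroupNameFromNumber are standard Regex methods; the GroupOrderDemo class is hypothetical. Unnamed groups report their number as their name, so this should print 0:0, 1:1, 2:2, 3:type and 4:value, one per line.

```csharp
using System;
using System.Text.RegularExpressions;

// Lists the groups of the sample pattern in the order .NET numbers them.
static class GroupOrderDemo
{
    public const string Pattern = @"Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)";

    // Returns "number:name" for every group the pattern defines, including group 0.
    public static string[] GroupLabels()
    {
        var regex = new Regex(Pattern);
        int[] numbers = regex.GetGroupNumbers();
        var labels = new string[numbers.Length];
        for (int i = 0; i < numbers.Length; i++)
        {
            labels[i] = numbers[i] + ":" + regex.GroupNameFromNumber(numbers[i]);
        }
        return labels;
    }

    static void Main()
    {
        foreach (string label in GroupLabels())
        {
            Console.WriteLine(label);
        }
    }
}
```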

Problem 2: Groups and Captures

Another gotcha with this example arises when you try to write all of the results out to the screen. As you can see, there will be:

  • 1 Match - the entire string
  • A set of Groups - as we've already seen
  • and 4 captures for each group inside the repeated ( )+, one per attribute.
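A quick way to see the difference between groups and captures is to print a few counts from a single match. This is a minimal sketch with a hypothetical CaptureCountDemo class; it relies on the documented behaviour that a repeated group's Value holds only its last capture, while its Captures collection holds all of them.

```csharp
using System;
using System.Text.RegularExpressions;

// Contrasts the number of groups with the number of captures in one match.
static class CaptureCountDemo
{
    const string Pattern = @"Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)";
    const string Input = "Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)";

    public static Match Run() => Regex.Match(Input, Pattern);

    static void Main()
    {
        Match m = Run();

        // group 0 plus the four groups written in the pattern: 5
        Console.WriteLine("Groups: {0}", m.Groups.Count);

        // one capture per attribute: 4
        Console.WriteLine("type captures: {0}", m.Groups["type"].Captures.Count);

        // Value keeps only the last capture of a repeated group: "Color"
        Console.WriteLine("type value: {0}", m.Groups["type"].Value);

        // group 1 is the repeated attribute-pair group: "Color=green;"
        Console.WriteLine("group 1 value: {0}", m.Groups[1].Value);
    }
}
```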

The question is, how do you get each of those 4 attribute values? A repeated group's Value only holds its last capture (here, "Color" and "green"), but each Group also has a Captures collection that stores every capture the group made. So, the idea is to get the count of captures for a group and then read the value at each index from 0 to Count - 1.

Here's some sample code which demonstrates how you'd do that for the example shown above:

 

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string pattern = @"Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)";
        string input = @"Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)";

        Match m = Regex.Match(input, pattern);

        if (m.Groups["type"].Success)
        {
            // "type" and "value" each capture once per attribute,
            // so their Captures collections line up by index
            int matchedItems = m.Groups["type"].Captures.Count;

            // walk the captures by index, pairing each type with its value
            for (int i = 0; i < matchedItems; i++)
            {
                string name = m.Groups["type"].Captures[i].Value;
                string val = m.Groups["value"].Captures[i].Value;

                Console.WriteLine("{0} = {1}", name, val);
            }
        }

        // keep the console window open
        Console.ReadLine();
    }
}

 

And here's the output generated by the above example...

Animal = cat
Human = paul
Car = ford
Color = green
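The loop above pairs two specific groups by index. As a more general sketch, the hypothetical CaptureDumpDemo below walks every group in group-number order and prints all of its captures along with their positions in the input (Capture.Index). For this input, group 0 should report 1 capture and each of the other four groups 4 captures.

```csharp
using System;
using System.Text.RegularExpressions;

// Prints every capture of every group, in group-number order.
static class CaptureDumpDemo
{
    const string Pattern = @"Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)";
    const string Input = "Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)";

    // Returns the number of captures for each group, ordered by group number.
    public static int[] CaptureCounts()
    {
        var regex = new Regex(Pattern);
        Match m = regex.Match(Input);
        int[] numbers = regex.GetGroupNumbers();
        var counts = new int[numbers.Length];
        for (int i = 0; i < numbers.Length; i++)
        {
            counts[i] = m.Groups[numbers[i]].Captures.Count;
        }
        return counts;
    }

    static void Main()
    {
        var regex = new Regex(Pattern);
        Match m = regex.Match(Input);

        foreach (int number in regex.GetGroupNumbers())
        {
            Group g = m.Groups[number];
            Console.WriteLine("Group {0} ({1}): {2} capture(s)",
                number, regex.GroupNameFromNumber(number), g.Captures.Count);

            // Captures holds every value the group matched, not just the last one
            foreach (Capture c in g.Captures)
            {
                Console.WriteLine("    at {0}: '{1}'", c.Index, c.Value);
            }
        }
    }
}
```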
