PowerShell: Replacing regex named groups with variables

Hoobajoob · Sep 1, 2012

Say I have a regular expression like the following. I loaded it from a file into a variable $regex, so I have no idea at design time what its contents are, but at runtime I can discover that it includes the named groups "version1", "version2", "version3" and "version4":

"Version (?<version1>\d),(?<version2>\d),(?<version3>\d),(?<version4>\d)"

...and I have these variables:

$version1 = "3"
$version2 = "2"
$version3 = "1"
$version4 = "0"

...and I come across the following string in a file:

Version 7,7,0,0

...which is stored in a variable $input, so that ($input -match $regex) evaluates to $true.
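
For reference, a minimal sketch of how the named groups can be discovered and read at runtime. The pattern is the sample one from above, and the line is held in a variable called $line rather than $input (a name PowerShell also uses for an automatic variable); both choices are only illustrative.

```powershell
# Sample pattern from the question; in practice it would be loaded from a file.
$regex = 'Version (?<version1>\d),(?<version2>\d),(?<version3>\d),(?<version4>\d)'
$line  = 'Version 7,7,0,0'

# GetGroupNames() is an instance method; filter out the numbered groups ("0", "1", ...).
$groupNames = ([regex]$regex).GetGroupNames() | Where-Object { $_ -notmatch '^\d+$' }

if ($line -match $regex) {
    # After a successful -match, $Matches is a hashtable keyed by group name.
    $captured = foreach ($name in $groupNames) { '{0}={1}' -f $name, $Matches[$name] }
}
```

Here $captured holds one "name=value" string per named group, which shows that the group names can be used as indexes into the match even though -replace offers no per-group substitution.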

How can I replace the named groups from $regex in the string $input with the values of $version1, $version2, $version3, $version4 if I do not know the order in which they appear in $regex (I only know that $regex includes these named groups)?

I can't find any references describing the syntax for replacing a named group with the value of a variable by using the group name as an index to the match - is this even supported?

EDIT: To clarify - the goal is to replace templated version strings in any kind of text file where the version string in a given file requires replacement of a variable number of version fields (could be 2, 3, or all 4 fields). For example, the text in a file could look like any of these (but is not restricted to these):

#define SOME_MACRO(4, 1, 0, 0)

Version "1.2.3.4"

SomeStruct vs = { 99,99,99,99 }

Users can specify a file set and a regular expression to match the line containing the fields, with the original idea being that the individual fields would be captured by named groups. The utility has the individual version field values that should be substituted in the file, but has to preserve the original format of the line that will contain the substitutions, and substitute only the requested fields.

EDIT-2: I think I can get the result I need with substring calculations based on the position and extent of each of the matches, but I was hoping PowerShell's replace operation would save me some work.

EDIT-3: So, as Ansgar correctly and succinctly describes below, there isn't a way (using only the original input string, a regular expression about which you only know the named groups, and the resulting matches) to use the "-replace" operation (or other regex operations) to perform substitutions of the captures of the named groups, while leaving the rest of the original string intact. For this problem, if anybody's curious, I ended up using the solution below. YMMV, other solutions possible. Many thanks to Ansgar for his feedback and options provided.

In the following code block:

  • $input is a line of text on which substitution is to be performed
  • $regex is a regular expression (of type [string]) read from a file that has been verified to contain at least one of the supported named groups
  • $regexToGroupName is a hash table that maps a regex string to an array of group names, in the order returned by ([regex]$regex).GetGroupNames() (an instance method), which lists named groups in the left-to-right order in which they appear in the expression
  • $groupNameToVersionNumber is a hash table that maps a group name to a version number.

Constraints on the named groups within $regex are only (I think) that the expression within the named groups cannot be nested, and should match at most once within the input string.
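
The code below does not show how the two hash tables are populated. One possible setup, as a sketch only: the pattern and version values are the samples from earlier in the question, and filtering GetGroupNames() down to the supported names is an assumption about how the lookup might be built.

```powershell
# Hypothetical setup; pattern and version numbers are the samples from the question.
$regex = 'Version (?<version1>\d),(?<version2>\d),(?<version3>\d),(?<version4>\d)'

# Group name -> replacement value.
$groupNameToVersionNumber = @{
    version1 = '3'
    version2 = '2'
    version3 = '1'
    version4 = '0'
}

# Regex string -> supported group names, in the order GetGroupNames() reports them.
# The filter drops the implicit group "0" and any group without a replacement value.
$regexToGroupName = @{}
$regexToGroupName[$regex] = @(
    ([regex]$regex).GetGroupNames() |
        Where-Object { $groupNameToVersionNumber.ContainsKey($_) }
)
```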

# This will give us the index and extent of each substring
# that we will be replacing (the parts that we will not keep)
$matchResults = ([regex]$regex).match($input)

# This will hold substrings from $input that were not captured
# by any of the supported named groups, as well as the replacement
# version strings, properly ordered, but will omit substrings captured
# by the named groups
$lineParts = @()
$startingIndex = 0
foreach ($groupName in $regexToGroupName.$regex)
{
    $group = $matchResults.Groups[$groupName]

    # Skip groups that did not take part in the match (e.g. optional ones);
    # their Index and Length are 0 and would throw off the offsets below
    if (-not $group.Success) { continue }

    # Excise the substring leading up to the match for this group...
    $lineParts = $lineParts + $input.Substring($startingIndex, $group.Index - $startingIndex)

    # Instead of the matched substring, we'll use the substitution
    $lineParts = $lineParts + $groupNameToVersionNumber.$groupName

    # Set the starting index of the next substring that we will keep...
    $startingIndex = $group.Index + $group.Length
}

# Keep the end of the original string (if there's anything left)
$lineParts = $lineParts + $input.Substring($startingIndex)

# Stitch the kept substrings and the substitutions back together
$newLine = $lineParts -join ''
$input = $newLine
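
A self-contained sketch of the same splice-by-index idea, wrapped in a function so it can be tried on sample lines. The function name Set-VersionField and its parameters are hypothetical, and sorting the groups by their position in the line (instead of relying on a precomputed order) is a variation on the code above, not part of it.

```powershell
function Set-VersionField {
    param(
        [string]$Line,        # text to rewrite
        [string]$Pattern,     # regex containing some of the named groups
        [hashtable]$Values    # group name -> replacement value
    )

    $m = ([regex]$Pattern).Match($Line)
    if (-not $m.Success) { return $Line }

    # Collect position and extent of every supported group that actually captured.
    $hits = @(foreach ($name in $Values.Keys) {
        $g = $m.Groups[$name]
        if ($g.Success) {
            [pscustomobject]@{ Name = $name; Index = $g.Index; Length = $g.Length }
        }
    })

    $parts = @()
    $start = 0
    foreach ($hit in ($hits | Sort-Object Index)) {
        $parts += $Line.Substring($start, $hit.Index - $start)  # text to keep
        $parts += $Values[$hit.Name]                            # substitution
        $start  = $hit.Index + $hit.Length
    }
    $parts += $Line.Substring($start)                           # rest of the line
    return ($parts -join '')
}

$values = @{ version1 = '3'; version2 = '2'; version3 = '1'; version4 = '0' }
$regex  = 'Version (?<version1>\d),(?<version2>\d),(?<version3>\d),(?<version4>\d)'
$result = Set-VersionField -Line 'Version 7,7,0,0' -Pattern $regex -Values $values
# $result is 'Version 3,2,1,0'
```

Because the groups are ordered by where they matched, a pattern that lists the groups in a different order, or only some of them, is handled the same way.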

Answer

JohnLBevan · Apr 6, 2017

Simple Solution

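In the scenario where you simply want to replace a version number found somewhere in your $input text, you could do this (the braces in ${1} matter: a plain $1 followed directly by the digits of $Version1 would be read by the regex engine as a different group number, e.g. $13):

$input -replace '(Version\s+)\d+,\d+,\d+,\d+',"`${1}$Version1,$Version2,$Version3,$Version4"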

Using Named Captures in PowerShell

Regarding your question about named captures: a named group can be referenced in the replacement string by putting its name in curly brackets, i.e. ${name}. For example:

'dogcatcher' -replace '(?<pet>dog|cat)','I have a pet ${pet}.  '

Gives:

I have a pet dog.  I have a pet cat.  cher
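
One detail worth keeping in mind with this syntax: the replacement string above is in single quotes, so ${pet} reaches the regex engine untouched. In a double-quoted string, PowerShell would first expand ${pet} as one of its own variables. A small sketch of the difference ($pet here is a hypothetical variable that happens to share the group's name):

```powershell
$pet = 'goldfish'   # hypothetical PowerShell variable with the same name as the group

# Single quotes: ${pet} is passed through and resolved by the regex engine.
$fromGroup = 'dogcatcher' -replace '(?<pet>dog|cat)', '[${pet}]'

# Double quotes: PowerShell expands ${pet} before the regex engine sees the string.
$fromVariable = 'dogcatcher' -replace '(?<pet>dog|cat)', "[${pet}]"

# Double quotes with a backtick-escaped dollar sign behave like the single-quoted form.
$escaped = 'dogcatcher' -replace '(?<pet>dog|cat)', "[`${pet}]"

# $fromGroup    is '[dog][cat]cher'
# $fromVariable is '[goldfish][goldfish]cher'
# $escaped      is '[dog][cat]cher'
```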

Issue with multiple captures & solution

You can't give each group its own replacement text in a single replace statement, since the same replacement string is used for every match. For example, if you did this:

'dogcatcher' -replace '(?<pet>dog|cat)|(?<singer>cher)','I have a pet ${pet}.  I like ${singer}''s songs.  '

You'd get:

I have a pet dog.  I like 's songs.  I have a pet cat.  I like 's songs.  I have a pet .  I like cher's songs.  

...which is probably not what you're hoping for.

Rather, you'd have to do one replace per item:

'dogcatcher' -replace '(?<pet>dog|cat)','I have a pet ${pet}.  ' -replace '(?<singer>cher)', 'I like ${singer}''s songs.  ' 

...to get:

I have a pet dog.  I have a pet cat.  I like cher's songs.  

More Complex Solution

Bringing this back to your scenario, you're not actually using the captured values; rather, you want to put new values in the positions they occupied. For that, you'd want something like this:

$input = 'I''m running Programmer''s Notepad version 2.4.2.1440, and am a big fan.  I also have Chrome v    56.0.2924.87 (64-bit).' 

$version1 = 1
$version2 = 3
$version3 = 5
$version4 = 7

$v1Pattern = '(?<=\bv(?:ersion)?\s+)\d+(?=\.\d+\.\d+\.\d+)'
$v2Pattern = '(?<=\bv(?:ersion)?\s+\d+\.)\d+(?=\.\d+\.\d+)'
$v3Pattern = '(?<=\bv(?:ersion)?\s+\d+\.\d+\.)\d+(?=\.\d+)'
$v4Pattern = '(?<=\bv(?:ersion)?\s+\d+\.\d+\.\d+\.)\d+'

$input -replace $v1Pattern, $version1 -replace $v2Pattern, $version2 -replace $v3Pattern,$version3 -replace $v4Pattern,$version4

Which would give:

I'm running Programmer's Notepad version 1.3.5.7, and am a big fan.  I also have Chrome v    1.3.5.7 (64-bit).

NB: The above could be written as a one-liner, but I've broken it down to make it simpler to read.

This takes advantage of regex lookarounds: a way of checking the content before and after the text you're matching without including that content in the match. So when we select what to replace, we can say "match the number that appears after the word version" without also replacing the word version.

More info on those here: http://www.regular-expressions.info/lookaround.html
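
A reduced sketch of the lookaround idea on a single short string (the sample text and the replacement value 9 are made up for illustration):

```powershell
$text = 'Notepad version 2.4.2.1440'

# Lookbehind only: replace the first number after "version ", keeping the word itself.
$major = $text -replace '(?<=version\s+)\d+', '9'

# Lookbehind plus lookahead: replace the number that follows "version 2." and precedes ".2".
$minor = $text -replace '(?<=version\s+\d+\.)\d+(?=\.\d+)', '9'

# Without lookarounds the word "version" is part of the match and is replaced too.
$naive = $text -replace 'version\s+\d+', '9'

# $major is 'Notepad version 9.4.2.1440'
# $minor is 'Notepad version 2.9.2.1440'
# $naive is 'Notepad 9.4.2.1440'
```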

Your Example

Adapting the above to work for your example (i.e. where the version fields may be separated by commas or dots, and there's no consistency to their format beyond being 4 sets of numbers):

$input = @'
#define SOME_MACRO(4, 1, 0, 0)

Version "1.2.3.4"

SomeStruct vs = { 99,99,99,99 }
'@

$version1 = 1
$version2 = 3
$version3 = 5
$version4 = 7

$v1Pattern = '(?<=\b)\d+(?=\s*[\.,]\s*\d+\s*[\.,]\s*\d+\s*[\.,]\s*\d+\b)'
$v2Pattern = '(?<=\b\d+\s*[\.,]\s*)\d+(?=\s*[\.,]\s*\d+\s*[\.,]\s*\d+\b)'
$v3Pattern = '(?<=\b\d+\s*[\.,]\s*\d+\s*[\.,]\s*)\d+(?=\s*[\.,]\s*\d+\b)'
$v4Pattern = '(?<=\b\d+\s*[\.,]\s*\d+\s*[\.,]\s*\d+\s*[\.,]\s*)\d+\b'

$input -replace $v1Pattern, $version1 -replace $v2Pattern, $version2 -replace $v3Pattern,$version3 -replace $v4Pattern,$version4

Gives:

#define SOME_MACRO(1, 3, 5, 7)

Version "1.3.5.7"

SomeStruct vs = { 1,3,5,7 }