I need to return the line preceding a match in a multi-line string variable.
It seems that when a string variable is used as the input, Select-String considers the entire string as having matched. As a result, the Context properties fall "outside" either end of the string and are null.
Consider the below example:
$teststring = @"
line1
line2
line3
line4
line5
"@
Write-Host "Line Count:" ($teststring | Measure-Object -Line).Lines #verify PowerShell does regard input as a multi-line string (it does)
Select-String -Pattern "line3" -InputObject $teststring -AllMatches -Context 1,0 | % {
$_.Matches.Value #this prints the exact match
$_.Context #output shows all context properties to be empty
$_.Context.PreContext[0] #this would ideally output first line before the match
$_.Context.PreContext[0] -eq $null #but instead is null
}
Am I misunderstanding something here?
What is the best way to return "line2" when matching for "line3"?
Thanks!
Edit: An additional requirement I neglected to state: the solution needs to return the line above ALL matched lines, for a string of indeterminate length. E.g., when searching the text below for "line3", I need to return "line2" and "line5".
line1
line2
line3
line4
line5
line3
line6
Select-String operates on arrays of input, so, rather than a single multiline string, you must provide an array of lines for -Context and -AllMatches to work as intended:
$teststring = @"
line1
line2
line3
line4
line5
line3
line6
"@
$teststring -split '\r?\n' | Select-String -Pattern "line3" -AllMatches -Context 1,0 | % {
"line before: " + $_.Context.PreContext[0]
"matched part: " + $_.Matches.Value # Prints what the pattern matched
}
This yields:
line before: line2
matched part: line3
line before: line5
matched part: line3
$teststring -split '\r?\n' splits the multi-line string into an array of lines; the regex \r?\n handles either newline style (CRLF or LF-only).
Note that it is crucial to use the pipeline to provide Select-String's input; if you used -InputObject, the array would be coerced back to a single string.
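The difference between pipeline input and -InputObject can be checked with a small comparison. This is a minimal sketch; the variable names ($lines, $viaPipeline, $viaParameter, $linesBefore) are illustrative and not part of the original answer.

```powershell
# Build the array of lines from an LF-separated sample string.
$lines = "line1`nline2`nline3`nline4`nline5`nline3`nline6" -split '\r?\n'

# Pipeline input: each line is a separate input object, so context lines exist.
$viaPipeline = @($lines | Select-String -Pattern 'line3' -Context 1,0)

# -InputObject: the whole array is bound as one object and converted to one string.
$viaParameter = @(Select-String -InputObject $lines -Pattern 'line3' -Context 1,0)

# Collect the line before each match found via the pipeline.
$linesBefore = $viaPipeline | ForEach-Object { $_.Context.PreContext[0] }

"Matches via pipeline:     $($viaPipeline.Count)"
"Matches via -InputObject: $($viaParameter.Count)"
"Lines before (pipeline):  $($linesBefore -join ', ')"
```

With pipeline input, each matching line should be reported separately along with its preceding line, whereas -InputObject should yield a single match covering the combined string.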
Select-String is convenient, but slow.
Especially for a single string already in memory, a solution using the .NET Framework's [Regex]::Matches() method will perform much better, though it is more complex.
Note that PowerShell's own -match and -replace operators are built on the same .NET class, but do not expose all of its functionality; -match, which does report capture groups in the automatic $Matches variable, is not an option here, because it only ever returns 1 match.
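To illustrate the single-match limitation, the following minimal sketch compares -match with [regex]::Matches() on a string that contains the pattern twice. The sample text and variable names are made up for illustration.

```powershell
# A sample string that contains the pattern twice.
$text = "line3 and again line3"

# -match returns a Boolean for scalar input and records only the first match in $Matches.
$found = $text -match 'line3'
$firstOnly = $Matches[0]

# [regex]::Matches() returns every match as a collection.
$all = [regex]::Matches($text, 'line3')

"Found by -match: $found"
"First match reported in `$Matches: $firstOnly"
"Total matches reported by [regex]::Matches(): $($all.Count)"
```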
The following is essentially the same approach as in mjolinor's answer, but with several problems corrected[1].
# Note: The sample string is defined so that it contains LF-only (\n)
# line breaks, merely to simplify the regex below for illustration.
# If your script file uses LF-only line breaks, the
# `-replace '\r?\n', "`n"` call isn't needed.
$teststring = @"
line1
line2
line3
line4
line5
line3
line6
"@ -replace '\r?\n', "`n"
[Regex]::Matches($teststring, '(?:^|(.*)\n).*(line3)') | ForEach-Object {
"line before: " + $_.Groups[1].Value
"matched part: " + $_.Groups[2].Value
}
The regex (?:^|(.*)\n).*(line3) uses 2 capture groups ((...)) to capture both the (matching part of the) line to match and the line before it ((?:...) is an auxiliary non-capturing group that is needed for precedence):
- (?:^|(.*)\n) matches either the very start of the string (^) or (|) any, possibly empty, sequence of non-newline characters (.*) followed by a newline (\n); this ensures that the line to match is also found when there is no preceding line (i.e., if the line to match is the first one).
- (line3) is the group defining the line to match; it is preceded by .* to match the behavior in the question, where the pattern line3 is found even if it is only part of a line.
- To match only entire lines instead, use: (?:^|(.*)\n)(line3)(?:\n|$)
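The regex (?:^|(.*)\n)(line3)(?:\n|$) drops the .* and requires a newline or the end of the string right after line3, so it should match only when line3 makes up the entire line. A minimal sketch to check this, using a made-up LF-only sample in which one line (line3x) merely contains the pattern:

```powershell
# LF-only sample; "line3x" contains the pattern but is not exactly "line3".
$sample = "line1`nline2`nline3`nline4`nline3x`nline5`nline3"

# Collect the line before each whole-line match (capture group 1).
$before = [regex]::Matches($sample, '(?:^|(.*)\n)(line3)(?:\n|$)') |
    ForEach-Object { $_.Groups[1].Value }

# Expected to list line2 and line5, skipping the partial match on line3x.
$before
```

One caveat of this sketch: because the regex consumes the newline after each match, two directly adjacent line3 lines would not both be reported with their preceding line.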
[Regex]::Matches() finds all matches and returns them as a collection of System.Text.RegularExpressions.Match objects, which the ForEach-Object cmdlet call can then operate on to extract the capture-group matches ($_.Groups[<n>].Value).
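When the matching line is the first line, capture group 1 does not participate in the match, so its Value is an empty string. If you need to tell "no line before" apart from "an empty line before", the group's Success property can be inspected. A minimal sketch, with an illustrative sample string and hypothetical property names (Matched, HasLineBefore, LineBefore):

```powershell
# LF-only sample in which the very first line already matches.
$sample = "line3`nline2`nline3"

$results = [regex]::Matches($sample, '(?:^|(.*)\n).*(line3)') | ForEach-Object {
    [pscustomobject]@{
        Matched       = $_.Groups[2].Value
        # Success is $false when group 1 took no part in the match (no preceding line).
        HasLineBefore = $_.Groups[1].Success
        LineBefore    = $_.Groups[1].Value
    }
}

$results
```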
[1] As of this writing:
- There is no need to match twice: the enclosing if ($teststring -match $pattern) { ... } is unnecessary.
- The inline option (?m) is not needed, because . does not match newlines by default.
- (.+?) captures only nonempty lines (and ?, the non-greedy quantifier, is not needed).
- If the line of interest is the first line (i.e., if there is no line before it), it won't be matched.