Perl Regex Multiple Matches

Johann · Mar 7, 2013

I'm looking for a regular expression that will behave as follows:

input: "hello world."

output: he, el, ll, lo, wo, or, rl, ld

My idea was something along the lines of:

    while($string =~ m/(([a-zA-Z])([a-zA-Z]))/g) {
        print "$1-$2 ";
    }

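But that does something a little bit different: each match consumes both letters, so the matches never overlap and I only get the pairs he, ll, wo, and rl.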
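Here is a minimal sketch of what that loop actually collects. It reuses the question's pattern but gathers only the two-letter capture into an array (`@got` is an illustrative name, not from the question). Each successful match consumes both letters, and `m//g` resumes the next search where the previous match ended, so overlapping pairs are skipped.

```perl
use strict;
use warnings;

my $string = "hello world.";
my @got;    # illustrative name: the pairs the original loop finds

while ($string =~ m/(([a-zA-Z])([a-zA-Z]))/g) {
    push @got, $1;    # $1 is the whole two-letter group
}

# Each match swallows both letters, so "el", "lo", "or", "ld" are never tried.
print join(", ", @got), "\n";    # prints "he, ll, wo, rl"
```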

Answer

tchrist · Mar 7, 2013

It's tricky. You have to capture each pair, save it, and then force a backtrack so the engine moves on and tries the next starting position.

You can do that this way:

    use v5.10;   # first release with backtracking control verbs

    my $string = "hello, world!";
    my @saved;

    my $pat = qr{
        ( \pL {2} )
        (?{ push @saved, $^N })
        (*FAIL)
    }x;

    @saved = ();
    $string =~ $pat;
    my $count = @saved;
    printf "Found %d matches: %s.\n", $count, join(", " => @saved);

That produces this:

    Found 8 matches: he, el, ll, lo, wo, or, rl, ld.
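To see the capture, save, backtrack idea in isolation, here is a minimal sketch that swaps the fixed-width `\pL {2}` for a variable-length `\pL+`. The forced failure makes the greedy quantifier hand back one letter at a time and then makes the engine move on to the next start position, so the code block records every run of letters it tried. The input `"abc"` and the name `@seen` are illustrative assumptions.

```perl
use strict;
use warnings;
use v5.10;    # (*FAIL) needs 5.10 or later; also enables say

my $string = "abc";
my @seen;    # every substring the engine tried, in order

# ( \pL+ ) is greedy, so the first attempt at each start position takes
# all remaining letters. (*FAIL) then forces a backtrack: \pL+ gives up
# one letter, the group closes again, and the code block runs again.
$string =~ m{
    ( \pL+ )
    (?{ push @seen, $^N })
    (*FAIL)
}x;

say join(", ", @seen);    # abc, ab, a, bc, b, c
```

With a fixed-width `{2}` there is nothing to give back, which is why the answer's pattern records exactly one pair per start position.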

If you do not have v5.10, or you have a headache, you can use this:

    my $string = "hello, world!";
    my @pairs = $string =~ m{
      # we can only match at positions where the
      # following sneak-ahead assertion is true:
        (?=                 # zero-width look ahead
            (               # begin stealth capture
                \pL {2}     #       save off two letters
            )               # end stealth capture
        )
      # succeed after matching nothing; /g then moves ahead one char
    }xg;

    my $count = @pairs;
    printf "Found %d matches: %s.\n", $count, join(", " => @pairs);

That produces the same output as before.
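The lookahead version generalizes directly: the quantifier inside the capture sets the run length, and because the lookahead consumes nothing, the match still steps forward one character at a time. A minimal sketch with three-letter runs (the length 3 and the name `@trigrams` are just for illustration):

```perl
use strict;
use warnings;

my $string = "hello, world!";

# Zero-width match at every position where three letters follow;
# in list context, m//g returns the capture from each match.
my @trigrams = $string =~ m{ (?= ( \pL{3} ) ) }xg;

printf "Found %d matches: %s.\n", scalar(@trigrams), join(", " => @trigrams);
# Found 6 matches: hel, ell, llo, wor, orl, rld.
```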

But you might still have a headache.
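If you'd rather keep the `while` loop from the question, a different technique (not part of the answer above) is to rewind `pos()` after each match so the next search starts one character after where the last match began. `$-[0]` holds the start offset of the most recent successful match. A minimal sketch, with `@pairs` as an illustrative name:

```perl
use strict;
use warnings;

my $string = "hello world.";
my @pairs;

while ($string =~ m/(\pL{2})/g) {
    push @pairs, $1;
    # Without this line, /g would resume after the whole pair.
    # Restarting at (match start + 1) lets consecutive pairs overlap.
    pos($string) = $-[0] + 1;
}

print join(", ", @pairs), "\n";    # he, el, ll, lo, wo, or, rl, ld
```

The loop still terminates: each rewind lands strictly after the previous match's start, and once no pair remains the failed `/g` match resets `pos` and ends the loop.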