How Matcher.find() works

Gunjan Shah · Jun 23, 2012 · Viewed 55.4k times

I am testing a small program that uses the Matcher and Pattern classes:

package scjp2.escape.sequence.examples;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Sample_19 {

    public static void main(String a[]){
        String stream = "ab34ef";
        Pattern pattern = Pattern.compile("\\d*");

        // * is a greedy quantifier: \d* matches zero or more digits,
        // so it can also match the empty string at any position.
        Matcher matcher = pattern.matcher(stream);

        while(matcher.find()){
            System.out.print(matcher.start()+matcher.group());
        }
    }

}

The string being searched is "ab34ef", which has length 6.

Now let's look at the iterations I expect:


Iteration   matcher.start()   matcher.group()
1           0                 ""
2           1                 ""
3           2                 "34"
4           4                 ""
5           5                 ""

Concatenating matcher.start() + matcher.group() for each iteration, the output I expect is: 0123445

But the program prints 01234456.

I can't understand where the "6" comes from. String indices start at zero, so the maximum index here is 5.

The loop runs six times. How? Any suggestions?

Answer

Mark Byers · Jun 23, 2012

Your regular expression can match zero characters. The final match is a zero-width match at the very end of the string, after the character at index 5. Its start index is therefore 6, which equals the length of the string.
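
To make the empty matches visible, here is a minimal sketch that records both the start and the end of every match. The class name ZeroWidthMatchDemo and the helper matchPositions are illustrative names, not part of the original code. An empty match has start equal to end, and the last one sits at stream.length().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthMatchDemo {

    // Hypothetical helper: returns "start-end" for every match of regex in input.
    static List<String> matchPositions(String regex, String input) {
        List<String> positions = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            // For a zero-width match, start() and end() are the same index.
            positions.add(matcher.start() + "-" + matcher.end());
        }
        return positions;
    }

    public static void main(String[] args) {
        String stream = "ab34ef";
        System.out.println("length = " + stream.length());
        // prints [0-0, 1-1, 2-4, 4-4, 5-5, 6-6]
        System.out.println(matchPositions("\\d*", stream));
    }
}
```

The last entry, 6-6, is the empty match between the final character and the end of the input.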


As an aside, you might also find it easier to understand what is going on if you use separators to make the output more readable:

System.out.println(matcher.start()+ ": " + matcher.group());

Results:

0: 
1: 
2: 34
4: 
5: 
6: 
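
If the empty matches are not wanted, one possible alternative is to require at least one digit with \d+ instead of \d*. This changes the pattern from the question, so treat it as a sketch under that assumption. The class name DigitRunsDemo and the helper digitRuns are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitRunsDemo {

    // Hypothetical helper: collects "start: group" for each run of one or more digits.
    static List<String> digitRuns(String input) {
        List<String> runs = new ArrayList<>();
        // \d+ needs at least one digit, so it never produces a zero-width match.
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        while (matcher.find()) {
            runs.add(matcher.start() + ": " + matcher.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // prints [2: 34]
        System.out.println(digitRuns("ab34ef"));
    }
}
```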
