RegExp - How can I match the shortest amount possible?

Aust · Aug 29, 2012 · Viewed 10k times

I have the regular expression /'(.*)(?:(?:'\s*,\s*)|(?:'\)))/
and the test code ('He said, "You're cool."' , 'Rawr')
(My test code simulates parameters being passed into a function.)

I will explain my Regular Expression as I understand it and hopefully a few of you can shed some light on my problem.

1) /' means at the beginning of the matched string, there needs to be a '

2) (.*) means capture any character except \n, 0 or more times

3) (?:(?:4)|(?:5)) means don't capture, but try step 4 and, if it doesn't work, try step 5

4) (?:'\s*,\s*) means don't capture, but there needs to be a ' followed by 0 or more whitespace characters, then a , followed by 0 or more whitespace characters

5) (?:'\)) means don't capture, but there needs to be ')

So it seems that it should return this (and this is what I want):
'+He said, "You're cool."+' ,
But it returns:
'+He said, "You're cool."' , 'Rawr+')

If I change my test code to ('He said, "You're cool."' , 'Rawr' (with no closing parenthesis), it returns what I want. But as soon as I add that last parenthesis, my OR operator seems to do whatever it wants. I want it to test first whether there is a comma and stop there if there is one, and only check for a parenthesis if there is not.

I've tried switching the spots of step 4 and step 5, but still the OR operator seems to always default to the (?:'\)) side. How can I match the shortest amount possible?
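
For reference, the behaviour described above can be reproduced with a short JavaScript sketch. The variable names (input, greedy, greedySwapped and so on) are made up for illustration and are not part of the original question; the captured group is read from index 1 of the match result.

```javascript
// Test code from the question: simulated function-call parameters.
const input = `('He said, "You're cool."' , 'Rawr')`;

// The original pattern, with a greedy (.*) capture.
const greedy = /'(.*)(?:(?:'\s*,\s*)|(?:'\)))/;

// The same pattern with the two alternatives (steps 4 and 5) swapped.
const greedySwapped = /'(.*)(?:(?:'\))|(?:'\s*,\s*))/;

// Captured group for the full input: runs all the way to the last quote.
const greedyCapture = input.match(greedy)[1];

// Swapping the alternatives does not change the result.
const swappedCapture = input.match(greedySwapped)[1];

// Same input with the closing parenthesis removed.
const noParenCapture = input.slice(0, -1).match(greedy)[1];

console.log(greedyCapture);  // He said, "You're cool."' , 'Rawr
console.log(swappedCapture); // He said, "You're cool."' , 'Rawr
console.log(noParenCapture); // He said, "You're cool."
```

The order of the alternatives makes no difference here because the capture group has already consumed the text before either alternative is tried.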

Answer

Bergi · Aug 29, 2012

I don't think your problem is the OR operator, but the greediness of the .*. It first consumes as much as it can (here, the rest of the string) and then backtracks one character at a time until the following expression matches. The first match found in this backtracking process is '+He said, "You're cool."' , 'Rawr+'). Try the lazy (non-greedy) quantifier .*? instead!
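
A minimal sketch of the lazy version, assuming the same test code as in the question. The names text, lazy, first, lazyAll and allParams are illustrative only, and the global-flag variant is an addition to show how each parameter could be collected in turn.

```javascript
// Test code from the question.
const text = `('He said, "You're cool."' , 'Rawr')`;

// Same pattern as in the question, but with a lazy (.*?) capture:
// it grows one character at a time and stops at the first position
// where one of the two alternatives matches.
const lazy = /'(.*?)(?:(?:'\s*,\s*)|(?:'\)))/;

const first = text.match(lazy);
console.log(first[1]); // He said, "You're cool."

// With the g flag, matchAll continues after each match,
// so every parameter is captured in turn.
const lazyAll = new RegExp(lazy.source, "g");
const allParams = [...text.matchAll(lazyAll)].map((m) => m[1]);
console.log(allParams.length); // 2
console.log(allParams[1]);     // Rawr
```

The apostrophe in You're does not end the match, because it is followed by neither a comma nor a closing parenthesis. A parameter whose text itself contains a quote followed by a comma would still be cut short by this pattern.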