I have this regular expression: `/'(.*)(?:(?:'\s*,\s*)|(?:'\)))/`
and this test string: `('He said, "You're cool."' , 'Rawr')`
(The test string simulates arguments being passed to a function.)
I'll explain the regular expression as I understand it, and hopefully some of you can shed light on my problem.
1) `/'` means the matched string must begin with `'`.
2) `(.*)` means capture any character except `\n`, 0 or more times.
3) `(?:(?:4)|(?:5))` means don't capture, but try step 4, and if it doesn't match, try step 5.
4) `(?:'\s*,\s*)` means don't capture, but there must be a `'`, followed by 0 or more whitespace characters, a `,`, and 0 or more whitespace characters.
5) `(?:'\))` means don't capture, but there must be `')`.
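Step 3's reading of the alternation is right as far as it goes: at any single position, the engine tries the alternatives left to right and keeps the first one that succeeds. A minimal sketch with toy patterns, using Python's `re` as a stand-in for the question's engine (JavaScript behaves the same way here):

```python
import re

# At one position, alternatives are tried left to right;
# the first alternative that succeeds is the one used.
first_wins = re.match(r"(?:a|ab)", "ab").group()
print(first_wins)  # a

# Swapping the alternatives changes the result at that same position.
swapped = re.match(r"(?:ab|a)", "ab").group()
print(swapped)  # ab
```

The important words are "at one position": the alternation only chooses among branches that start where the preceding part of the pattern left off, so it cannot pull the match end back if `.*` has already consumed past the comma.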
So it seems it should return this (and this is what I want):
`'` + `He said, "You're cool."` + `' ,`
But it returns:
`'` + `He said, "You're cool."' , 'Rawr` + `')`
If I change my test string to `('He said, "You're cool."' , 'Rawr'` (no closing parenthesis), it returns what I want, but as soon as I add that last parenthesis, my OR operator seems to do whatever it wants. I want it to test first for a comma and stop there if there is one, and only if there isn't, check for a parenthesis.
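Both observations can be reproduced with a minimal sketch in Python's `re` module, used here as a stand-in for the question's engine (the slash delimiters suggest JavaScript, which treats greedy `.*`, non-capturing groups and alternation the same way). The names `GREEDY`, `with_paren` and `without_paren` are illustrative only.

```python
import re

# The pattern from the question, unchanged (greedy .*).
GREEDY = re.compile(r"""'(.*)(?:(?:'\s*,\s*)|(?:'\)))""")

with_paren = """('He said, "You're cool."' , 'Rawr')"""
without_paren = """('He said, "You're cool."' , 'Rawr'"""

# .* first consumes the rest of the string, then gives characters back
# from the end; the first terminator found while backing up wins.
print(GREEDY.search(with_paren).group(1))     # He said, "You're cool."' , 'Rawr
print(GREEDY.search(without_paren).group(1))  # He said, "You're cool."
```

Without the final `)`, backing up from the end never finds `')`, so backtracking keeps going until it reaches the `' , ` after the closing quote, which happens to be the desired result.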
I've tried swapping steps 4 and 5, but the OR operator still seems to always default to the `(?:'\))` side.
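A quick check, again in Python's `re` as a stand-in for the question's engine, that swapping the two alternatives does not change what is captured for the original test string. The names `comma_first` and `paren_first` are hypothetical.

```python
import re

text = """('He said, "You're cool."' , 'Rawr')"""

# Original order: comma branch first, then parenthesis branch.
comma_first = re.compile(r"""'(.*)(?:(?:'\s*,\s*)|(?:'\)))""")
# Swapped order: parenthesis branch first.
paren_first = re.compile(r"""'(.*)(?:(?:'\))|(?:'\s*,\s*))""")

# Both capture the same text, because the greedy .* has already
# settled where the match ends before the alternation is tried.
same = comma_first.search(text).group(1) == paren_first.search(text).group(1)
print(same)  # True
```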
How can I match the shortest amount possible?
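I don't think your problem is the OR operator, but the greediness of `.*`. It first matches the whole rest of the string, then backtracks one character at a time until the following expressions match. Backing up from the end, the first place they match is the `')` at the very end, so you get `'` + `He said, "You're cool."' , 'Rawr` + `')`. The order of the alternatives only matters when both could match at the same position, so swapping them can't make `.*` stop earlier. Try `.*?` instead! The lazy version grows one character at a time from the left, so it stops at the first `'` that is followed by a comma (after optional whitespace) or by `)`.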
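A minimal sketch of the suggested fix, again in Python's `re` as a stand-in (the lazy `*?` quantifier works the same way in JavaScript). Only `.*` has changed to `.*?`; the names are illustrative.

```python
import re

# Same pattern with a lazy quantifier: .*? expands one character at a time,
# so the first terminator reached from the left ends the match.
LAZY = re.compile(r"""'(.*?)(?:(?:'\s*,\s*)|(?:'\)))""")

with_paren = """('He said, "You're cool."' , 'Rawr')"""
without_paren = """('He said, "You're cool."' , 'Rawr'"""

print(LAZY.search(with_paren).group(1))     # He said, "You're cool."
print(LAZY.search(without_paren).group(1))  # He said, "You're cool."

# findall keeps searching after each match, so it also picks up 'Rawr'.
print(LAZY.findall(with_paren))  # ['He said, "You\'re cool."', 'Rawr']
```

The apostrophe in `You're` does not end the match early, because it is followed by `r` rather than a comma or `)`. A real argument that contained `', ` inside its text would still be cut short, so this remains a heuristic rather than a full parser for quoted arguments.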