From this Q&A, I deduced that matching all instances of a given regex not inside quotes is impossible. That is, it can't handle escaped quotes (e.g. "this whole \"match\" should be taken"). If there is a way to do it that I don't know about, that would solve my problem.
If not, however, I'd like to know whether there is an efficient alternative that could be used in JavaScript. I've thought about it a bit, but can't come up with an elegant solution that would work in most, if not all, cases.
Specifically, I just need the alternative to work with the .split() and .replace() methods, but a more general solution would be even better.
For example:
An input string of:
+bar+baz"not+or\"+or+\"this+"foo+bar+
replacing every + that is not inside quotes with # should return:
#bar#baz"not+or\"+or+\"this+"foo#bar#
Actually, you can match all instances of a regex not inside quotes for any string in which each opening quote is closed again. Say, as in your example above, you want to match \+.
The key observation here is that a character is outside quotes if it is followed by an even number of quotes. This can be modeled as a look-ahead assertion:
\+(?=([^"]*"[^"]*")*[^"]*$)
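A quick way to see what this look-ahead does, and where it stops being enough, is to run it through String.prototype.replace with the g flag. The first input below is made up for illustration; the second is the input from the question, where the escaped quotes get counted as real ones.

```javascript
// The look-ahead from above: a "+" counts as outside quotes when the rest of
// the string is zero or more "..." pairs followed by quote-free text.
const evenQuotesPlus = /\+(?=([^"]*"[^"]*")*[^"]*$)/g;

// Without escaped quotes this behaves as intended.
const simple = 'a+"b+c"+d'.replace(evenQuotesPlus, '#');
console.log(simple); // a#"b+c"#d

// With the question's input, \" is counted like any other quote, so the
// text between the two escaped quotes is wrongly treated as outside quotes.
const questionInput = '+bar+baz"not+or\\"+or+\\"this+"foo+bar+';
const naive = questionInput.replace(evenQuotesPlus, '#');
console.log(naive); // #bar#baz"not+or\"#or#\"this+"foo#bar#
```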
Now, you'd like to not count escaped quotes. This gets a little more complicated. Instead of [^"]*, which advances to the next quote, you need to consider backslashes as well and use [^"\\]*. After you arrive at either a backslash or a quote, you need to skip the next character if you stopped at a backslash, or else advance to the next unescaped quote. That looks like:
(\\.|"([^"\\]*\\.)*[^"\\]*")
Combined, you arrive at:
\+(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)
I admit it is a little cryptic. =)
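Here is a sketch of the combined pattern used with .replace() and .split() on the question's input. One adjustment is an assumption on my part rather than something required by the pattern: the groups are written as (?:...) instead of (...). That changes nothing for .replace(), but .split() splices any captured groups into its result array, and captures inside a look-ahead still count, so the capturing version would add extra entries there. The outsideQuotes helper at the end is a hypothetical generalization to other patterns; it only inspects the text after each match, so it assumes the pattern itself never matches a quote or a backslash.

```javascript
// The combined pattern, with non-capturing groups so .split() stays clean.
const plusOutsideQuotes = /\+(?=(?:[^"\\]*(?:\\.|"(?:[^"\\]*\\.)*[^"\\]*"))*[^"]*$)/g;

const input = '+bar+baz"not+or\\"+or+\\"this+"foo+bar+';

const replaced = input.replace(plusOutsideQuotes, '#');
console.log(replaced); // #bar#baz"not+or\"+or+\"this+"foo#bar#

// Leading and trailing "+" produce empty strings at both ends.
const parts = input.split(plusOutsideQuotes);
console.log(parts.length); // 5

// Hypothetical generalization: append the same look-ahead to any pattern.
function outsideQuotes(pattern, flags = 'g') {
  const lookahead = String.raw`(?=(?:[^"\\]*(?:\\.|"(?:[^"\\]*\\.)*[^"\\]*"))*[^"]*$)`;
  return new RegExp(`(?:${pattern.source})${lookahead}`, flags);
}

console.log('a=1 "b=2" c=3'.replace(outsideQuotes(/=/), ':')); // a:1 "b=2" c:3
```

Note that each position triggers its own scan to the end of the string, so this is roughly quadratic in the input length; for very long inputs a single left-to-right pass that tracks the quote state may be the more efficient alternative.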