Regex to match all instances not inside quotes

Azmisov picture Azmisov · Jun 24, 2011 · Viewed 39.6k times · Source

From this q/a, I deduced that matching all instances of a given regex not inside quotes, is impossible. That is, it can't match escaped quotes (ex: "this whole \"match\" should be taken"). If there is a way to do it that I don't know about, that would solve my problem.

If not, however, I'd like to know if there is any efficient alternative that could be used in JavaScript. I've thought about it a bit, but can't come with any elegant solutions that would work in most, if not all, cases.

Specifically, I just need the alternative to work with .split() and .replace() methods, but if it could be more generalized, that would be the best.

For Example:
An input string of:
+bar+baz"not+or\"+or+\"this+"foo+bar+
replacing + with #, not inside quotes, would return:
#bar#baz"not+or\"+or+\"this+"foo#bar#

Answer

Jens picture Jens · Jun 24, 2011

Actually, you can match all instances of a regex not inside quotes for any string, where each opening quote is closed again. Say, as in you example above, you want to match \+.

The key observation here is, that a word is outside quotes if there are an even number of quotes following it. This can be modeled as a look-ahead assertion:

\+(?=([^"]*"[^"]*")*[^"]*$)

Now, you'd like to not count escaped quotes. This gets a little more complicated. Instead of [^"]* , which advanced to the next quote, you need to consider backslashes as well and use [^"\\]*. After you arrive at either a backslash or a quote, you need to ignore the next character if you encounter a backslash, or else advance to the next unescaped quote. That looks like (\\.|"([^"\\]*\\.)*[^"\\]*"). Combined, you arrive at

\+(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)

I admit it is a little cryptic. =)