Regex to replace single backslashes, excluding those followed by certain chars

WastedSpace · Feb 22, 2011

I have a regular expression that removes any backslash from a string unless it is followed by one of these characters: \, / or }.

It should turn this string:

foo\bar\\batz\/hi

Into this:

foobar\\batz\/hi

The problem is that it looks at each backslash on its own. It follows the rule in that it removes the first backslash and keeps the second one, because it is followed by another backslash. But when it gets to the third one, it removes it, because that one isn't followed by another backslash, even though it is the second half of the escaped pair \\.

My current code looks like this: str.replace(/\\(?!\\|\/|\})/g,"")

But the resulting string looks like this: foobar\batz\/hi
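To reproduce the problem, here is the question's regex run on its example string. This is a minimal sketch assuming a JavaScript engine with template literals; String.raw is used only so the input can be written without doubling every backslash.

```javascript
// Input from the question, written with String.raw so each backslash is literal.
const input = String.raw`foo\bar\\batz\/hi`;

// The question's regex: drop a backslash unless the next character is \, / or }.
// Each backslash is tested on its own, so the second backslash of the escaped
// pair "\\" (which is followed by "b") is removed as well.
const naive = input.replace(/\\(?!\\|\/|\})/g, "");

console.log(naive); // foobar\batz\/hi   (wanted: foobar\\batz\/hi)
```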

How do I get it to skip the third backslash? Or do I need some sort of explicit search and replace, e.g. replace '\', but don't replace '\\', '\/' or '\}'?

Please help! :)

EDIT

Sorry, I should have explained: I am using JavaScript, so I don't think I can use negative lookbehinds...
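The explicit search-and-replace idea from the question does work without any lookbehind if each escape pair is consumed as a unit while scanning left to right. A minimal sketch, assuming a replacement callback passed to String.prototype.replace; stripLoneBackslashes is a hypothetical helper name:

```javascript
// Match a backslash plus the following character when that character is \, / or }.
// The callback keeps those two-character escapes and drops a lone backslash.
// Because matching resumes after each match, "\\\f" is read as "\\" then "\f".
function stripLoneBackslashes(str) {
  return str.replace(/\\([\\\/}]?)/g, (match, next) => (next ? match : ""));
}

console.log(stripLoneBackslashes(String.raw`foo\bar\\batz\/hi`)); // foobar\\batz\/hi
```

Since the second backslash of a \\ pair is swallowed together with the first, it is never examined on its own, which is exactly the case the lookahead-only regex gets wrong.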

Answer

Bart Kiers · Feb 22, 2011

You need to watch out for an escaped backslash followed by a single backslash, or more generally, for an odd number of successive backslashes. In that case you need to keep the even number of backslashes intact and remove only the last one (if it is not followed by a / or }).

You can do that with the following regex:

(?<!\\)(?:((\\\\)*)\\)(?![\\/}])

and replace it with:

$1

where group 1 holds the even number of backslashes that precede the last one.

A short explanation:

(?<!\\)          # looking behind, there can't be a '\'
(?:((\\\\)*)\\)  # match an odd number of backslashes and store the even number in group 1
(?![\\/}])       # looking ahead, there can't be a '\', '/' or '}'

In plain English that would read:

match an odd number of backslashes, (?:((\\\\)*)\\), not followed by \, / or }, (?![\\/}]), and not preceded by a backslash, (?<!\\). The lookbehind and the '\' in the lookahead together ensure the match always covers a complete run of backslashes, so the regex can never match just part of an even run.
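The same odd/even rule can be spelled out as a plain loop, which may be easier to check than the regex. This is a hypothetical helper (stripByParity), not part of the answer; it keeps the last backslash of an odd run only when the next character is / or }, as the question requires.

```javascript
// Walk the string, measure each run of consecutive backslashes, and drop the
// last backslash of a run when the run length is odd and the character after
// the run is not "/" or "}".
function stripByParity(str) {
  let out = "";
  let i = 0;
  while (i < str.length) {
    if (str[i] !== "\\") {
      out += str[i];
      i++;
      continue;
    }
    // Find the end of the backslash run that starts at i.
    let j = i;
    while (j < str.length && str[j] === "\\") j++;
    const runLength = j - i;
    const next = str[j]; // undefined when the run ends the string
    const keepAll = runLength % 2 === 0 || next === "/" || next === "}";
    out += "\\".repeat(keepAll ? runLength : runLength - 1);
    i = j;
  }
  return out;
}

console.log(stripByParity(String.raw`baz\\\foo\bar\\batz\/hi`)); // baz\\foobar\\batz\/hi
```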

A demo in Java (remember that every backslash is doubled inside a Java string literal!):

String s = "baz\\\\\\foo\\bar\\\\batz\\/hi";
System.out.println(s);
System.out.println(s.replaceAll("(?<!\\\\)(?:((\\\\\\\\)*)\\\\)(?![\\\\/}])", "$1"));

which will print:

baz\\\foo\bar\\batz\/hi
baz\\foobar\\batz\/hi
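In JavaScript the same regex can be written as a regex literal, which needs only single escaping. Lookbehind assertions were added to JavaScript in ES2018, so this sketch assumes an engine that supports them (current Node.js and major browsers do); older engines reject (?<! with a SyntaxError. The outer non-capturing group is dropped and the inner repetition made non-capturing, so group 1 still holds the even run, and the lookahead uses } from the question.

```javascript
// Lookbehind version: group 1 is the even run of backslash pairs, the final
// lone backslash is matched but not captured, so "$1" drops it.
const lookbehindRe = /(?<!\\)((?:\\\\)*)\\(?![\\\/}])/g;

// String.raw keeps every backslash literally, matching the Java demo's input.
const demo = String.raw`baz\\\foo\bar\\batz\/hi`;
console.log(demo.replace(lookbehindRe, "$1")); // baz\\foobar\\batz\/hi
```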

EDIT

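And a solution that does not need lookbehinds, so it also works in JavaScript, would look like:

(^|[^\\])((\\\\)*)\\(?![\\/}])

and is replaced by:

$1$2

where $1 is either the start of the string or the non-backslash character before the run, and $2 is the even (possibly zero) number of backslashes following it. The ^ alternative is needed because [^\\] must consume a character, so without it a lone backslash at the very start of the string would never be removed.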
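For engines without lookbehind support, the look-behind-free version translates directly to JavaScript. A minimal sketch; stripWithoutLookbehind is a hypothetical name, the (^|[^\\]) alternative lets a lone backslash at the very start of the string be removed too, and the lookahead uses } from the question:

```javascript
// Group 1: start of string or the non-backslash character before the run.
// Group 2: the even (possibly empty) run of backslash pairs.
// The final lone backslash is matched but not captured, so "$1$2" drops it.
const noLookbehindRe = /(^|[^\\])((?:\\\\)*)\\(?![\\\/}])/g;

function stripWithoutLookbehind(str) {
  return str.replace(noLookbehindRe, "$1$2");
}

console.log(stripWithoutLookbehind(String.raw`foo\bar\\batz\/hi`)); // foobar\\batz\/hi
console.log(stripWithoutLookbehind(String.raw`\baz\\\foo`)); // baz\\foo
```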