Replacing double backslashes with single backslash

Vinay thallam picture Vinay thallam · Jun 13, 2012 · Viewed 20.2k times · Source

I have a string "\\u003c", which belongs to UTF-8 charset. I am unable to decode it to unicode because of the presence of double backslashes. How do i get "\u003c" from "\\u003c"? I am using java.

I tried with,

myString.replace("\\\\", "\\");

but could not achieve what i wanted.

This is my code,

String myString = FileUtils.readFileToString(file);
String a = myString.replace("\\\\", "\\");
byte[] utf8 = a.getBytes();

// Convert from UTF-8 to Unicode
a = new String(utf8, "UTF-8");
System.out.println("Converted string is:"+a);

and content of the file is

\u003c

Answer

mtyson picture mtyson · Jun 19, 2016

You can use String#replaceAll:

String str = "\\\\u003c";
str= str.replaceAll("\\\\\\\\", "\\\\");
System.out.println(str);

It looks weird because the first argument is a string defining a regular expression, and \ is a special character both in string literals and in regular expressions. To actually put a \ in our search string, we need to escape it (\\) in the literal. But to actually put a \ in the regular expression, we have to escape it at the regular expression level as well. So to literally get \\ in a string, we need write \\\\ in the string literal; and to get two literal \\ to the regular expression engine, we need to escape those as well, so we end up with \\\\\\\\. That is:

String Literal        String                      Meaning to Regex
−−−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−−
\                     Escape the next character   Would depend on next char
\\                    \                           Escape the next character
\\\\                  \\                          Literal \
\\\\\\\\              \\\\                        Literal \\

In the replacement parameter, even though it's not a regex, it still treats \ and $ specially — and so we have to escape them in the replacement as well. So to get one backslash in the replacement, we need four in that string literal.