I have a string "\\u003c", which belongs to the UTF-8 charset. I am unable to decode it to Unicode because of the double backslashes. How do I get "\u003c" from "\\u003c"? I am using Java.
I tried
myString.replace("\\\\", "\\");
but could not achieve what I wanted.
This is my code:
String myString = FileUtils.readFileToString(file);
String a = myString.replace("\\\\", "\\");
byte[] utf8 = a.getBytes();
// Convert from UTF-8 to Unicode
a = new String(utf8, "UTF-8");
System.out.println("Converted string is:"+a);
and the content of the file is:
\u003c
You can use String#replaceAll:
String str = "\\\\u003c";
str= str.replaceAll("\\\\\\\\", "\\\\");
System.out.println(str);
It looks weird because the first argument is a string defining a regular expression, and \ is a special character both in string literals and in regular expressions. To actually put a \ in our search string, we need to escape it (\\) in the literal. But to actually put a literal \ in the regular expression, we have to escape it at the regular expression level as well. So to get \\ in the string (which the regex engine reads as one literal backslash), we need to write \\\\ in the string literal; and to make the regular expression engine match two literal backslashes, we need to escape each of those as well, so we end up with \\\\\\\\. That is:
String Literal    String                       Meaning to Regex
--------------    -------------------------    -------------------------
\                 Escape the next character    Would depend on next char
\\                \                            Escape the next character
\\\\              \\                           Literal \
\\\\\\\\          \\\\                         Literal \\
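As a quick check of the rows above, the following sketch prints how many characters each literal really contains and tests what it matches when used as a regex. The class name EscapeLevels and the helper matchesAsRegex are made up for illustration; the only library call is the standard java.util.regex.Pattern.matches.

```java
import java.util.regex.Pattern;

public class EscapeLevels {
    // Hypothetical helper: compiles `regex` and reports whether it
    // matches the whole of `input`.
    static boolean matchesAsRegex(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String one = "\\";         // 2 backslashes in source -> 1 char
        String two = "\\\\";       // 4 backslashes in source -> 2 chars
        String four = "\\\\\\\\";  // 8 backslashes in source -> 4 chars

        System.out.println(one.length());   // 1
        System.out.println(two.length());   // 2
        System.out.println(four.length());  // 4

        // The 2-char string, read as a regex, matches one literal backslash.
        System.out.println(matchesAsRegex(two, one));   // true
        // The 4-char string, read as a regex, matches two literal backslashes...
        System.out.println(matchesAsRegex(four, two));  // true
        // ...and therefore does not match a single backslash.
        System.out.println(matchesAsRegex(four, one));  // false
    }
}
```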
In the replacement parameter, even though it's not a regex, replaceAll still treats \ and $ specially, and so we have to escape them in the replacement as well. So to get one backslash in the replacement, we need four in that string literal.
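To make the replacement rules concrete, here is a minimal sketch comparing three ways of collapsing two backslashes into one. The class and method names (BackslashCollapse, viaReplaceAll, viaQuoting, viaReplace) are illustrative only; String.replaceAll, String.replace, Pattern.quote and Matcher.quoteReplacement are standard JDK methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackslashCollapse {
    // Regex version: both the pattern and the replacement need
    // regex-level escaping on top of string-literal escaping.
    static String viaReplaceAll(String s) {
        return s.replaceAll("\\\\\\\\", "\\\\");
    }

    // Same call, but the library adds the regex-level escaping.
    static String viaQuoting(String s) {
        return s.replaceAll(Pattern.quote("\\\\"), Matcher.quoteReplacement("\\"));
    }

    // String.replace takes plain text rather than a regex, so only
    // string-literal escaping is needed.
    static String viaReplace(String s) {
        return s.replace("\\\\", "\\");
    }

    public static void main(String[] args) {
        String input = "\\\\u003c"; // the 7 characters \\u003c
        System.out.println(viaReplaceAll(input)); // \u003c
        System.out.println(viaQuoting(input));    // \u003c
        System.out.println(viaReplace(input));    // \u003c
    }
}
```

When no pattern matching is involved, the plain-text String.replace is usually the easiest of the three to read. Note that all three return a new string, so the result has to be assigned.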