How do I decode a string with escaped unicode?

styfle picture styfle · Oct 25, 2011 · Viewed 145.3k times · Source

I'm not sure what this is called so I'm having trouble searching for it. How can I decode a string with unicode from http\u00253A\u00252F\u00252Fexample.com to http://example.com with JavaScript? I tried unescape, decodeURI, and decodeURIComponent so I guess the only thing left is string replace.

EDIT: The string is not typed, but rather a substring from another piece of code. So to solve the problem you have to start with something like this:

var s = 'http\\u00253A\\u00252F\\u00252Fexample.com';

I hope that shows why unescape() doesn't work.

Answer

Ioannis Karadimas picture Ioannis Karadimas · Oct 25, 2011

UPDATE: Please note that this is a solution that should apply to older browsers or non-browser platforms, and is kept alive for instructional purposes. Please refer to @radicand 's answer below for a more up to date answer.


This is a unicode, escaped string. First the string was escaped, then encoded with unicode. To convert back to normal:

var x = "http\\u00253A\\u00252F\\u00252Fexample.com";
var r = /\\u([\d\w]{4})/gi;
x = x.replace(r, function (match, grp) {
    return String.fromCharCode(parseInt(grp, 16)); } );
console.log(x);  // http%3A%2F%2Fexample.com
x = unescape(x);
console.log(x);  // http://example.com

To explain: I use a regular expression to look for \u0025. However, since I need only a part of this string for my replace operation, I use parentheses to isolate the part I'm going to reuse, 0025. This isolated part is called a group.

The gi part at the end of the expression denotes it should match all instances in the string, not just the first one, and that the matching should be case insensitive. This might look unnecessary given the example, but it adds versatility.

Now, to convert from one string to the next, I need to execute some steps on each group of each match, and I can't do that by simply transforming the string. Helpfully, the String.replace operation can accept a function, which will be executed for each match. The return of that function will replace the match itself in the string.

I use the second parameter this function accepts, which is the group I need to use, and transform it to the equivalent utf-8 sequence, then use the built - in unescape function to decode the string to its proper form.