In answering another question I became aware that my JavaScript/DOM knowledge had become a bit out of date, in that I am still using escape/unescape to encode the contents of URL components, whereas it appears I should now be using encodeURIComponent/decodeURIComponent instead.
What I want to know is what is wrong with escape/unescape? There are some vague suggestions that there is some sort of problem around Unicode characters, but I can't find any definite explanation.
My web experience is fairly biased: almost all of it has been writing big intranet apps tied to Internet Explorer. That has involved a lot of use of escape/unescape, and the apps involved have fully supported Unicode for many years now.
So what are the Unicode problems that escape/unescape are supposed to have? Does anyone have any test cases to demonstrate the problems?
What I want to know is what is wrong with escape/unescape ?
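They're not “wrong” as such, they're just their own special string format which looks a bit like URI-parameter-encoding but actually isn't. In particular, escape() leaves a plus sign unencoded, whereas in form-style URI parameter encoding a plus means a space, and it encodes non-ASCII characters in its own %XX and %uNNNN forms (based on UTF-16 code units) rather than as percent-encoded UTF-8 byte sequences.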
So if you use escape() to create URI parameter values you will get the wrong results for strings containing a plus, or any non-ASCII characters.
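The two cases named above (a plus, and non-ASCII characters) can be checked with a few test cases. This is only a sketch: the sample strings and the parameter name q are made up for illustration, and URLSearchParams stands in for whatever form-style decoder sits on the receiving end.

```javascript
// Sample inputs (illustrative only).
const plus = "a+b c";
const latin1 = "\u00e9";   // e with acute accent, below U+0100
const bmp = "\u20ac";      // euro sign, above U+00FF

// escape(): its own format.
const escaped = {
  plus: escape(plus),      // "a+b%20c"  ('+' is left alone)
  latin1: escape(latin1),  // "%E9"      (one Latin-1 style byte)
  bmp: escape(bmp),        // "%u20AC"   (non-standard %uNNNN form)
};

// encodeURIComponent(): percent-encoded UTF-8.
const encoded = {
  plus: encodeURIComponent(plus),     // "a%2Bb%20c"
  latin1: encodeURIComponent(latin1), // "%C3%A9"
  bmp: encodeURIComponent(bmp),       // "%E2%82%AC"
};

// A form-style decoder reads the unencoded '+' as a space.
const viaEscape = new URLSearchParams("q=" + escaped.plus).get("q"); // "a b c"
const viaEncode = new URLSearchParams("q=" + encoded.plus).get("q"); // "a+b c"

// A UTF-8 based decoder rejects escape()'s non-ASCII output outright.
function decodes(s) {
  try { decodeURIComponent(s); return true; } catch (e) { return false; }
}
const escapeOutputDecodes = [decodes(escaped.latin1), decodes(escaped.bmp)]; // [false, false]

console.log(escaped, encoded, viaEscape, viaEncode, escapeOutputDecodes);
```

So the plus is silently turned into a space, while the non-ASCII cases either fail to decode or, on a server that assumes a different encoding, decode to the wrong characters.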
escape() could be used as an internal JavaScript-only encoding scheme, for example to escape cookie values. However, now that all browsers support encodeURIComponent (which wasn't originally the case), there's no reason to use escape in preference to that.
There is only one modern use for escape/unescape that I know of, and that's as a quick way to implement a UTF-8 encoder/decoder, by leveraging the UTF-8 processing in URIComponent handling:
utf8bytes= unescape(encodeURIComponent(unicodecharacters));
unicodecharacters= decodeURIComponent(escape(utf8bytes));
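As a worked example of the two lines above, here is a minimal sketch that wraps them in helper functions and round-trips one character. The helper names toUtf8ByteString and fromUtf8ByteString are hypothetical, chosen only for this illustration. The "bytes" are held in an ordinary string with one character per byte, each with a code in the range 0 to 255.

```javascript
// Hypothetical helper: text -> string with one char per UTF-8 byte.
function toUtf8ByteString(text) {
  // encodeURIComponent emits %XX for each UTF-8 byte;
  // unescape turns each %XX back into a single char with that code.
  return unescape(encodeURIComponent(text));
}

// Hypothetical helper: the reverse direction.
function fromUtf8ByteString(bytes) {
  // escape emits %XX for each byte-valued char;
  // decodeURIComponent reads those as a UTF-8 sequence.
  return decodeURIComponent(escape(bytes));
}

const original = "\u20ac"; // euro sign
const utf8 = toUtf8ByteString(original);
const byteValues = Array.from(utf8, (ch) => ch.charCodeAt(0)); // [0xE2, 0x82, 0xAC]
const roundTripped = fromUtf8ByteString(utf8);

console.log(byteValues, roundTripped === original);
```

Note that fromUtf8ByteString throws a URIError if its input is not a valid UTF-8 sequence, so callers handling untrusted data would want to catch that.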