URL Escaping Chinese/Japanese Unicode Characters for Internet Explorer

Bear · Nov 25, 2009 · Viewed 9.6k times

I'm trying to URL-escape (percent-encode) non-ASCII characters in several URLs I'm dealing with. I'm working with a Flash application that loads resources such as images and sound clips from these URLs. The filenames can contain non-ASCII characters, for example 日本語.jpg, so I escape them by encoding the characters as UTF-8 and then percent-escaping the resulting bytes, which gives the following:

%E6%97%A5%E6%9C%AC%E8%AA%9E.jpg
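For readers who want to reproduce the escaping step outside Flash, here is a minimal sketch in Python (not the asker's ActionScript). The function name percent_encode_utf8 is made up for illustration. It encodes the name as UTF-8 and percent-escapes every byte that is not an unreserved ASCII character.

```python
from urllib.parse import quote


def percent_encode_utf8(filename: str) -> str:
    """Encode the name as UTF-8, then percent-escape each non-safe byte.

    Hypothetical helper for illustration; not part of the original app.
    """
    # safe="" means "/" is escaped too, so the input is treated as a
    # single path segment (a bare filename), not a whole path.
    return quote(filename, safe="", encoding="utf-8")


if __name__ == "__main__":
    print(percent_encode_utf8("日本語.jpg"))
```

Each of the three characters becomes three UTF-8 bytes, hence nine escapes in total, while the ASCII suffix .jpg passes through unchanged.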

These filenames work fine when I run the app in any browser other than Internet Explorer; I've tried Firefox, Safari and Chrome. But when I launch the app in IE (I tried both 6 and 8) and it tries to load the sound clip, I get Error #2044: Unhandled ioError, and the URL has been corrupted to something like:

æ¥æ¬èª.jpg

Any thoughts on how to fix this? So far I am just test-driving the Flash app with local filesystem URLs. I've also noticed that Internet Explorer can't locate a file given as file:///C:/%E6%97%A5%E6%9C%AC%E8%AA%9E.jpg, though Chrome and Firefox decode it and load the file just fine when it lives at the path

C:\日本語.jpg
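The mapping that Chrome and Firefox appear to perform here (treat the escapes as UTF-8 bytes, then turn the URL path into a Windows path) can be sketched in Python as follows. This is an illustration under that assumption, not a description of any browser's actual code, and file_url_to_windows_path is a hypothetical name.

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote, urlparse


def file_url_to_windows_path(url: str) -> str:
    """Turn a file: URL with UTF-8 percent-escapes into a Windows path."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError("expected a file: URL")
    # Assumption: the escapes are UTF-8 bytes, as the asker generated them.
    decoded = unquote(parsed.path, encoding="utf-8")
    # parsed.path looks like "/C:/name"; drop the slash before the drive.
    # PureWindowsPath works on any OS and switches "/" to "\".
    return str(PureWindowsPath(decoded.lstrip("/")))


if __name__ == "__main__":
    print(file_url_to_windows_path("file:///C:/%E6%97%A5%E6%9C%AC%E8%AA%9E.jpg"))
```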

edit

I think my problem is the same as the one encountered in the following ActionScript code fragment:

import flash.display.Loader;
import flash.net.URLRequest;
...
var ldr:Loader;
var req:URLRequest = new URLRequest("日本語.jpg");
ldr = new Loader();
ldr.load(req);

Using the string 日本語.jpg works in IE, while using the string %E6%97%A5%E6%9C%AC%E8%AA%9E.jpg works in the other browsers. What I need is a single form that works in all browsers. I have tried the %u encoding, and I have tried setting the HTTP header Content-Type: text/html; charset=utf-8, with no luck in either the percent-escaped or the unescaped form.
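To make the candidate spellings concrete, the following Python sketch produces the two escaped forms mentioned above next to the raw name. It then shows a stopgap that simply switches form per browser. The stopgap relies only on the behaviour reported in this question and is not a confirmed fix; percent_u_encode, url_for_browser and the is_internet_explorer flag are hypothetical names, and how the app would detect the browser is left open.

```python
from urllib.parse import quote


def percent_u_encode(text: str) -> str:
    """Non-standard %uXXXX form: one escape per UTF-16 code unit.

    ASCII characters are left as they are in this sketch.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        units = ch.encode("utf-16-be")  # 2 bytes per code unit
        for i in range(0, len(units), 2):
            out.append("%u{:02X}{:02X}".format(units[i], units[i + 1]))
    return "".join(out)


def url_for_browser(filename: str, is_internet_explorer: bool) -> str:
    """Stopgap based on the reported behaviour: raw for IE, escaped elsewhere."""
    if is_internet_explorer:
        return filename
    return quote(filename, safe="")


if __name__ == "__main__":
    name = "日本語.jpg"
    print(percent_u_encode(name))
    print(url_for_browser(name, is_internet_explorer=True))
    print(url_for_browser(name, is_internet_explorer=False))
```

Switching per browser is not the single form asked for; it only encodes the observation that each form works somewhere.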

Answer

Dave Mateer · Nov 25, 2009

Sorry, no solution, but maybe at least some more information about what might be going on here. (You have probably figured this much out already, but maybe it will help another reader find a solution.) The "official" URL encoding specification seems to leave the door wide open as to how escaped URLs like the ones you are generating should be decoded: are the escaped bytes meant to be read as UTF-8 (as Firefox and the others are interpreting them), or as one character per byte in a legacy single-byte encoding (as IE appears to be interpreting them, judging by the corrupted output)? I don't know of any way to force the intended decoding strategy.
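The ambiguity can be reproduced in a few lines of Python. The sketch below decodes the same escaped bytes twice: once as UTF-8, and once as Latin-1, one character per byte. That IE uses exactly Latin-1 is an assumption; it is simply the single-byte reading that reproduces the corrupted string from the question once the invisible C1 control characters are dropped. The helper names are hypothetical.

```python
from urllib.parse import unquote_to_bytes

ESCAPED = "%E6%97%A5%E6%9C%AC%E8%AA%9E.jpg"


def decode_as(escaped: str, encoding: str) -> str:
    """Undo the percent-escapes, then interpret the raw bytes."""
    return unquote_to_bytes(escaped).decode(encoding)


def strip_c1_controls(text: str) -> str:
    """Drop U+0080..U+009F, which usually do not show up when displayed."""
    return "".join(ch for ch in text if not 0x80 <= ord(ch) <= 0x9F)


if __name__ == "__main__":
    # Multi-byte reading: three bytes per character.
    print(decode_as(ESCAPED, "utf-8"))
    # Single-byte reading (assumed Latin-1): one character per byte.
    print(strip_c1_controls(decode_as(ESCAPED, "latin-1")))
```

The bytes are identical in both cases; only the decoder's assumption about the encoding differs, which is why the escaped URL itself cannot settle the question.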

Just a question: what goes wrong if you do not escape them at all and leave the Unicode characters in the URL? I don't have a lot of experience with this, but I thought I remembered reading somewhere that the days of needing to escape Unicode in URLs are behind us. I could be wrong about that...