I have a database which stores video game names with Unicode characters but I can't figure out how to properly escape these Unicode characters when printing them to an HTML response.
For instance, when I print all games with the name like Uncharted, I get this:
Uncharted: Drake's Fortuneâ„¢
Uncharted 2: Among Thievesâ„¢
Uncharted 3: Drake's Deceptionâ„¢
but it should display this:
Uncharted: Drake's Fortune™
Uncharted 2: Among Thieves™
Uncharted 3: Drake's Deception™
I ran a quick JavaScript escape function to see which Unicode character the ™ is and found that it's \u2122.
I don't have a problem fully escaping every character in the string if I can get the ™ character to display correctly. My guess is to somehow find the hex representation of each character in the string and have PHP render the Unicode characters like this:
print "&#x2122;";
Please guide me through the best approach for Unicode-escaping a string so that it is HTML friendly. I've done something similar for JavaScript a while back, but JavaScript has built-in escape and unescape functions.
I'm not aware of any PHP functions with similar functionality, however. I have read about the ord function, but it just returns the ASCII character code for a given character, hence the improper display of the ™ or the ™. I would like this function to be versatile enough to apply to any string containing valid Unicode characters.
It looks like your strings are UTF-8 encoded internally and PHP outputs them correctly, but your browser fails to auto-detect the encoding (it falls back to ISO 8859-1 or some other single-byte encoding).
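To see why this diagnosis fits the output in the question, here is a small sketch that reproduces the garbling. It assumes the mbstring extension is available and that the browser's fallback is Windows-1252, which is how browsers typically treat pages labelled ISO 8859-1; the variable names are only illustrative.

```php
<?php
// UTF-8 encodes U+2122 (™) as the three bytes E2 84 A2.
$tm = "\xE2\x84\xA2";

// Pretend those same three bytes are Windows-1252 text and convert the
// result to UTF-8, which is effectively what the misdetecting browser does.
$misread = mb_convert_encoding($tm, 'UTF-8', 'Windows-1252');

echo bin2hex($tm), "\n";  // e284a2
echo $misread, "\n";      // â„¢
```

Each byte is shown as its own character (E2 as â, 84 as „, A2 as ¢), which matches the broken listing above. The data itself is intact; only its interpretation is wrong.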
The best way is to tell the browser that UTF-8 is being used by sending the corresponding HTTP header:
header("content-type: text/html; charset=UTF-8");
Then you can leave the rest of your code as-is, with no need to convert characters to HTML entities or add other workarounds.
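A minimal sketch of how that might look in a script, assuming the source file is saved as UTF-8 and nothing has been output before the header() call. The $games array and the render_games() helper are hypothetical stand-ins for your database query and output loop.

```php
<?php
// Hypothetical helper: joins the names into a simple HTML fragment.
// The ™ stays as raw UTF-8 bytes; no entity conversion takes place.
function render_games(array $names): string
{
    return implode("<br>\n", $names) . "<br>\n";
}

// Hypothetical sample rows standing in for the database result.
$games = [
    "Uncharted: Drake's Fortune™",
    "Uncharted 2: Among Thieves™",
    "Uncharted 3: Drake's Deception™",
];

// Declare the encoding before any output is sent.
header('Content-Type: text/html; charset=UTF-8');

echo render_games($games);
```

If the names can contain characters such as & or <, you would probably still want to pass them through htmlspecialchars(), which leaves ™ untouched.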
If you want, you can additionally declare the encoding in the generated HTML by using the <meta> tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> for HTML <= 4.01
<meta charset="UTF-8"> for HTML5
The HTTP header has priority over the <meta> tag, but the latter may be useful if the HTML is saved to disk and then read locally.
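Putting both declarations together, a page template could look roughly like the sketch below. It assumes PHP 7 or later (for the scalar type hints) and an HTML5 document; render_page() is a hypothetical helper, not a built-in function.

```php
<?php
// Hypothetical helper: wraps a body fragment in a minimal HTML5 page
// that declares its own encoding via <meta charset>.
function render_page(string $title, string $bodyHtml): string
{
    // htmlspecialchars() escapes only &, <, > and quotes, so ™ passes through.
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html>\n<head>\n"
        . "<meta charset=\"UTF-8\">\n"
        . "<title>{$safeTitle}</title>\n"
        . "</head>\n<body>\n{$bodyHtml}\n</body>\n</html>\n";
}

// The HTTP header covers normal requests; the <meta> tag covers saved copies.
header('Content-Type: text/html; charset=UTF-8');

echo render_page('Uncharted™ games', '<p>Uncharted 2: Among Thieves™</p>');
```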