Print Unicode characters PHP

Cameron Tinker · Jul 9, 2013 · Viewed 33.9k times

I have a database which stores video game names with Unicode characters but I can't figure out how to properly escape these Unicode characters when printing them to an HTML response.

For instance, when I print all games with a name like Uncharted, I get this:

Uncharted: Drake's Fortuneâ„¢
Uncharted 2: Among Thievesâ„¢
Uncharted 3: Drake's Deceptionâ„¢

but it should display this:

Uncharted: Drake's Fortune™
Uncharted 2: Among Thieves™
Uncharted 3: Drake's Deception™

I ran a quick JavaScript escape function to see which Unicode character the ™ is and found that it's \u2122.

I don't have a problem fully escaping every character in the string if I can get the character to display correctly. My guess is to somehow find the hex representation of each character in the string and have PHP render the Unicode characters like this:

print "&#x2122;";

Please guide me through the best approach for Unicode-escaping a string so that it is HTML-friendly. I've done something similar for JavaScript a while back, but JavaScript has built-in functions for escape and unescape.

However, I'm not aware of any PHP functions with similar functionality. I have read about the ord function, but it just returns the ASCII character code for a given character, hence the improper display of the ™. I would like this function to be versatile enough to apply to any string containing valid Unicode characters.
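For reference, the kind of escaping described above can be sketched with PHP's mbstring functions. This is a minimal illustration, not code from the original post, and it assumes PHP 7.2+ with the mbstring extension enabled. Note that ord() reads only the first byte of a string, whereas mb_ord() decodes the full UTF-8 sequence.

```php
<?php
// Sketch, assuming PHP 7.2+ with the mbstring extension.

$trademark = "\u{2122}";   // the trade mark sign as a UTF-8 string

// ord() looks only at the first byte of the string (0xE2 here).
$firstByte = ord($trademark);

// mb_ord() decodes the whole UTF-8 sequence into a code point (0x2122).
$codePoint = mb_ord($trademark, 'UTF-8');

// Replace every character from U+0080 upward with a decimal numeric entity.
// The map is [first code point, last code point, offset, mask].
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$escaped = mb_encode_numericentity("Uncharted 2: Among Thieves\u{2122}", $map, 'UTF-8');

echo $firstByte, "\n", $codePoint, "\n", $escaped, "\n";
```

Here $escaped ends in the entity for code point 8482 (0x2122), which browsers render as ™ regardless of the page encoding.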

Answer

Alex Shesterov · Jul 9, 2013

It looks like your strings are UTF-8 encoded internally and PHP outputs them correctly, but your browser fails to auto-detect the encoding (it falls back to ISO 8859-1 or some other encoding).
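The garbled output can be reproduced directly, which helps confirm the diagnosis. The sketch below assumes PHP 7+ (for the "\u{...}" escape) and the mbstring extension. It uses Windows-1252 as the wrong encoding, since that is what browsers typically apply when a page is treated as ISO 8859-1.

```php
<?php
// Sketch: reproduce the garbled output by misreading UTF-8 bytes as Windows-1252.

$trademark = "\u{2122}";   // U+2122 TRADE MARK SIGN, stored as UTF-8

// UTF-8 encodes U+2122 as the three bytes E2 84 A2.
$bytes = bin2hex($trademark);

// Treat each of those bytes as one Windows-1252 character,
// then re-encode the result as UTF-8 so it can be printed.
$misread = mb_convert_encoding($trademark, 'UTF-8', 'Windows-1252');

echo $bytes, "\n";     // e284a2
echo $misread, "\n";   // â„¢
```

The three characters printed on the last line match the ones shown in the question, so the data itself is intact and only the declared encoding is wrong.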

The best way is to tell the browser that UTF-8 is being used by sending the corresponding HTTP header:

header("content-type: text/html; charset=UTF-8");  

Then you can leave the rest of your code as-is, and you don't have to HTML-encode entities or add other workarounds.
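A minimal sketch of how this might look in a script follows. The render_game_list() helper and the hard-coded titles are hypothetical; in the question the names come from a database. The one hard requirement is that header() runs before any output is sent.

```php
<?php
// Sketch only: titles are hard-coded here instead of being read from a database.

/**
 * Build an HTML list from UTF-8 strings (hypothetical helper).
 * htmlspecialchars() escapes only &, <, > and quotes; "™" passes through unchanged.
 */
function render_game_list(array $names): string
{
    $items = '';
    foreach ($names as $name) {
        $items .= '<li>' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . "</li>\n";
    }
    return "<ul>\n" . $items . "</ul>\n";
}

// Declare the encoding before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

echo render_game_list([
    "Uncharted: Drake's Fortune\u{2122}",
    "Uncharted 2: Among Thieves\u{2122}",
]);
```

htmlspecialchars() is still worth keeping for characters that are special in HTML, but it is unrelated to the encoding problem.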

If you want, you can additionally declare the encoding in the generated HTML by using the <meta> tag:

  • <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> for HTML 4.01 and earlier
  • <meta charset="UTF-8"> for HTML5

The HTTP header takes priority over the <meta> tag, but the latter is useful if the HTML is saved to disk and then opened locally.
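Both declarations can be combined in one page. The following is a sketch under the assumption of an HTML5 document; build_page() is a hypothetical helper, not part of PHP.

```php
<?php
// Sketch: send the HTTP header and also embed the HTML5 <meta> declaration.

/**
 * Wrap a body fragment in a minimal HTML5 page (hypothetical helper).
 * $bodyHtml is inserted as-is, so it must already be safe HTML.
 */
function build_page(string $title, string $bodyHtml): string
{
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

    return <<<HTML
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>{$safeTitle}</title>
</head>
<body>
{$bodyHtml}
</body>
</html>
HTML;
}

// The header covers pages served over HTTP.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

$page = build_page('Uncharted', "<p>Uncharted: Drake's Fortune\u{2122}</p>");
echo $page;
```

If the generated page is later saved as a file, the header is lost, but the <meta> tag inside the markup still tells the browser to decode it as UTF-8.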