Manipulating utf8mb4 data from MySQL with PHP

Yhilan · Oct 23, 2012

This is probably something simple. I swear I've been looking online for the answer and haven't found it. Since my particular case is a little atypical, I finally decided to ask here.

I have a few tables in MySQL that I'm using for a Chinese language program. It needs to be able to support every possible Chinese character, including rare ones that don't have great font support. A sample cell in the table might be this:

東菄鶇䍶𠍀倲𩜍𢘐涷蝀凍鯟𢔅崠埬𧓕䰤

In order to get that to work right in the database, I've had to set the encoding/collation to utf8mb4. So far so good. Unfortunately when I pull the same string into PHP, it gets printed as this:

東菄鶇䍶?倲??涷蝀凍鯟?崠埬?䰤

How can I finally kill off the remaining question marks and get them to show as the Unicode glyphs they should be? I've got the PHP page itself using UTF-8 encoding in the tag and as a meta tag.

Why can't they communicate with each other? What am I doing wrong?

Answer

deceze · Oct 23, 2012

I'd guess that you've set the table to utf8mb4, but your connection encoding is still utf8. You have to set the connection to utf8mb4 as well; otherwise MySQL converts the stored utf8mb4 data to utf8 on the way out. MySQL's utf8 only covers characters of up to three bytes, so it cannot encode "high" Unicode characters (those above U+FFFF), and they come back as question marks. (Yes, that's a MySQL idiosyncrasy.)
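
If you want to check which characters are affected, something like the following might help. It is only a sketch: the function name fourByteChars is made up for illustration, and it assumes the input is valid UTF-8. It lists the characters that take four bytes in UTF-8, which are the ones a utf8 connection cannot carry.

```php
<?php
// Return the characters of a UTF-8 string that need four bytes,
// i.e. code points above U+FFFF. (Hypothetical helper, not a built-in.)
function fourByteChars(string $s): array
{
    // The /u flag makes PCRE treat $s as UTF-8, so this splits per character.
    // Assumes valid UTF-8; preg_split() returns false on malformed input.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

    return array_values(array_filter($chars, function (string $c): bool {
        return strlen($c) === 4; // strlen() counts bytes, not characters
    }));
}

$sample = '東菄鶇䍶𠍀倲𩜍𢘐涷蝀凍鯟𢔅崠埬𧓕䰤';
print_r(fourByteChars($sample));
```

For the sample string from the question, the characters it lists should be the same ones that show up as question marks in the broken output.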

On a raw MySQL connection, it will have to look like this:

SET NAMES 'utf8mb4';
SELECT * FROM `my_table`;

You'll have to adapt that to whichever client you use to connect to MySQL from PHP (mysql, mysqli or PDO).


To really clarify (yes, using the mysql_ extension for simplicity, don't do that at home):

mysql_connect(...);
mysql_select_db(...);
mysql_set_charset('utf8mb4');     // adapt to your mysql connector of choice

$r = mysql_query('SELECT * FROM `my_table`');

var_dump(mysql_fetch_assoc($r));  // data will be UTF8 encoded
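
Since the mysql_ extension was removed in PHP 7, here is roughly how the same steps could look with PDO and mysqli. Treat it as a sketch under assumptions: the helper names (utf8mb4Dsn, connectPdo, connectMysqli) and the credentials in the usage comment are placeholders for illustration, not anything from the question.

```php
<?php
// Build a PDO DSN that requests utf8mb4 for the connection itself.
// (Hypothetical helper name.)
function utf8mb4Dsn(string $host, string $db): string
{
    return "mysql:host={$host};dbname={$db};charset=utf8mb4";
}

// PDO: the charset option in the DSN sets the connection encoding
// (honoured since PHP 5.3.6).
function connectPdo(string $host, string $db, string $user, string $pass): PDO
{
    return new PDO(utf8mb4Dsn($host, $db), $user, $pass);
}

// mysqli: set the connection encoding right after connecting.
function connectMysqli(string $host, string $db, string $user, string $pass): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    $link->set_charset('utf8mb4');
    return $link;
}

// Usage with placeholder credentials (left commented out, needs a live server):
// $pdo = connectPdo('localhost', 'my_database', 'db_user', 'db_password');
// var_dump($pdo->query('SELECT * FROM `my_table`')->fetch(PDO::FETCH_ASSOC));
```

Either way, the point is the same as above: the connection has to be utf8mb4, not just the table.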