UTF-8 characters don't display correctly

Theo Smeets · Apr 20, 2011 · Viewed 24.4k times

This is my PHP code:

<?php
$result = '';
$str = 'Тугайный соловей';
for ($y=0; $y < strlen($str); $y++) {
    $tmp = mb_substr($str, $y, 1);
    $result = $result . $tmp;
}
echo 'result = ' . $result;

The output is:

Тугайный Ñоловей

What can I do? I have to put $result into a MySQL database.

Answer

Capsule · Apr 20, 2011

What's the encoding of your file? It should be UTF-8 too. What's the default charset of your HTTP server? It should be UTF-8 as well.

Characters only display correctly if:

  • the file itself is saved in the right encoding
  • the server tells the browser which encoding the delivered file uses.

When working with databases, you also have to set the right encoding both for your DB fields and for the connection between the MySQL client and the server (see mysql_set_charset()). Setting the fields alone is not enough, because your MySQL client (in this case, PHP) could default to ISO-8859-1 and reinterpret the data. You then end up with UTF-8 DB -> ISO client -> injected into UTF-8 PHP script. No wonder it's messed up at the end :-)
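To make the connection charset step concrete, here is a minimal sketch. It swaps in the mysqli extension, because the old mysql_* functions (including mysql_set_charset()) were removed in PHP 7.0. The helper name connectUtf8() and its parameters are hypothetical, and 'utf8mb4' assumes a MySQL server recent enough to support it.

```php
<?php
// Hypothetical helper: the name and parameters are illustrative only.
function connectUtf8(string $host, string $user, string $password, string $database): mysqli
{
    $mysqli = new mysqli($host, $user, $password, $database);

    // Tell the server which encoding this client sends and expects.
    // Without this, the connection may fall back to a non-UTF-8 default.
    if (!$mysqli->set_charset('utf8mb4')) {
        throw new RuntimeException('Could not set charset: ' . $mysqli->error);
    }

    return $mysqli;
}
```

You would call it once, right after connecting and before running any query, with your own credentials in place of the placeholders.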

How to serve the file with the right charset?

header('Content-type: text/html; charset=utf-8') is one solution, as long as it is called before any output is sent.

An .htaccess file containing AddDefaultCharset UTF-8 is another one.

An HTML meta content-type tag might work too, but it's always better to send this information using HTTP headers.
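As a rough sketch of the header approach, the script below sends the charset header first and only then prints the page. The renderPage() function is a hypothetical helper introduced for illustration, and the script assumes the file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper: builds a small HTML page around the given text.
function renderPage(string $text): string
{
    return '<!DOCTYPE html><html><head><meta charset="utf-8"><title>Test</title></head><body>'
        . htmlspecialchars($text, ENT_QUOTES, 'UTF-8')
        . '</body></html>';
}

// The header must be sent before any output, including stray whitespace.
header('Content-Type: text/html; charset=utf-8');

echo renderPage('Тугайный соловей');
```

The meta tag is kept only as a fallback for when the page is opened without an HTTP response, for example from disk.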

PS: you also have to use mb_strlen() instead of strlen(). strlen() counts bytes, so on a UTF-8 string with multibyte characters it reports more than the real number of characters.
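Applied to the loop from the question, that could look like the sketch below. It assumes the mbstring extension is loaded and the file is saved as UTF-8. Passing 'UTF-8' explicitly is a choice made here so the result does not depend on the configured internal encoding.

```php
<?php
$str = 'Тугайный соловей';

// strlen() counts bytes, mb_strlen() counts characters.
$byteLength = strlen($str);              // 31: 15 two-byte letters plus one space
$charLength = mb_strlen($str, 'UTF-8');  // 16

$result = '';
for ($y = 0; $y < $charLength; $y++) {
    // Take exactly one character at character offset $y.
    $result .= mb_substr($str, $y, 1, 'UTF-8');
}

echo 'result = ' . $result . PHP_EOL;
```

With the loop bounded by $charLength, mb_substr() is never called past the end of the string, and $result ends up identical to $str.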