PHP + MySQL (UTF-8): some characters are still broken

holyknight · Nov 7, 2014

Well, I have a PHP script that takes nicknames from the Steam Web API and inserts them into a MySQL database. Many of them contain rare Russian and Greek characters. I set PHP to UTF-8 in php.ini and in all the PHP files with

mb_internal_encoding('utf-8');

My PDO connector is configured to handle utf8

$connection = new PDO('mysql:host=localhost;dbname=d2bd;mysql:charset=utf8mb4', 'root', '');
$connection->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$connection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$connection->setAttribute(PDO::ATTR_PERSISTENT, true);
$connection->setAttribute(PDO::MYSQL_ATTR_INIT_COMMAND, "SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci'");

My MySQL database is properly configured with utf8mb4:

character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database utf8mb4
character_set_filesystem binary
character_set_results utf8mb4
character_set_server utf8mb4
character_set_system utf8
character_sets_dir C:\xampp\mysql\share\charsets\
collation_connection utf8mb4_unicode_ci
collation_database utf8mb4_unicode_ci
collation_server utf8mb4_unicode_ci
completion_type NO_CHAIN
concurrent_insert AUTO
connect_timeout 10
core_file OFF

In a few words: I take the input from the Web API and encode it with utf8_encode(). Then I insert it into the DB. The problem is that some characters are not encoded correctly, and when I read them back from the database they are all garbled.

Example 1:

1.Input -> Перуанский чертовски

2.Encode -> ÐеÑÑанÑкий ÑеÑÑовÑки

3.Insert into DB

4.Select from DB -> Ð?еÑ?Ñ?анÑкий Ñ?еÑ?Ñ?овÑкÐ

5.Decode

6.Output -> �?е�?�?анский �?е�?�?овск�

Example 2:

1.Input -> $ |/| 1 ↓_ € ♥ J

2.Encode -> $ |/| 1 â_ ⬠⥠J

3.Insert into DB

4.Select from DB -> 1 â??_ â?¬ â?¥ J

5.Decode

6.Output -> 1 �??_ �?� �?� J

Answer

Rizier123 · Nov 7, 2014

Checklist for Problems with character/charset/collation

Including mysql, mysqli, PDO


Content

  1. DISCLAIMER
  2. My inserts into my DB don't work properly! What can I do?
  3. Change Charset and Collation of a Database or Table
  4. Set the encoding of your script files
  5. Set the charset of your page with PHP or meta tag
  6. What's the difference between UTF8 and UTF8mb4?
  7. Answer to this specific Question
  8. Further Information/Additional Links
  9. Side Notes


1. DISCLAIMER

This answer should not only answer this question; it is also meant to be a bit more extensive, so that more people can quickly find a good, bundled answer!

!Important Notice!
If you change something in your database, always make sure you have a backup of your database! Check it 2 times, or 3!

I'm open to improvements and comments, such as error corrections.
In addition, I apologize if the grammar is not perfect :D


If you get stuck on a question like this:

  • Php + Mysql (UTF-8, utf8mb4) some characters are still bug
  • How to convert an entire MySQL database characterset and collation to UTF-8?
  • “Incorrect string value” when trying to insert UTF-8 into MySQL
  • Change MySQL default character set to UTF-8 in my.cnf?
  • Using utf8mb4 with php and mysql
  • PDO + MySQL and broken UTF-8 encoding
  • Error in insertion data in php Mysql
  • PHP PDO: charset, set names?
  • SET NAMES utf8 in MySQL?
  • PHP mysql charset utf8 problems
  • UTF-8 all the way through
  • Manipulating utf8mb4 data from MySQL with PHP
  • ERROR 1115 (42000) : Unknown character set: 'utf8mb4' in mysql

...then my answer may help you!


2. My inserts into my DB don't work properly! What can I do?

If your inserts don't work properly and your inserted data looks something like this in your database, then this could have various reasons!

Examples:

??????????
𫗮𫗮𫗮𫗮
�??_ �?�
â_ ⬠⥠J

Here is a little checklist you can go through to check if everything is as it should be!
(After the checklist there is some extra information for mysql, mysqli and PDO)


Checklist:

  • Make sure the default character set is set on tables, client, server & text fields
    • IF NOT, see Point 3
  • Make sure your database connection's character set is set
    • IF NOT, see Point mysql/PDO
  • Make sure that, if you're displaying data, the charset of the document is set!
    • IF NOT, see Point 5
  • Make sure your script files are saved with the right charset!
    • IF NOT, see Point 4
  • Make sure you set your character set and your charset!
    • IF NOT, see Point mysql/PDO
  • Make sure your forms accept utf8!
    • IF NOT, see Point 5
  • Make sure you have set the connection encoding
    • IF NOT, see Point mysql/PDO
  • Make sure you have set the server character encoding correctly
    • IF NOT, see Point mysql/PDO
  • ...

  • You have to be sure you're using utf8/utf8mb4 everywhere!
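To go through the server-side part of the checklist quickly, the character set variables can be compared against the charset you expect. The following is only a minimal sketch: findCharsetMismatches() is a hypothetical helper name, and the sample values are made up for illustration. It assumes that character_set_filesystem and character_set_system are allowed to differ, as they do in the question's output.

```php
<?php
// Hypothetical helper: report character_set_* variables that differ from the expected charset.
// $variables maps variable names to values, e.g. built from SHOW VARIABLES output.
function findCharsetMismatches(array $variables, $expected = 'utf8mb4')
{
    // Assumption: these two normally differ (binary / utf8) and are not a problem.
    $ignored = array('character_set_filesystem', 'character_set_system');

    $mismatches = array();
    foreach ($variables as $name => $value) {
        // Skip everything that is not a character_set_* variable (e.g. character_sets_dir)
        if (strpos($name, 'character_set_') !== 0 || in_array($name, $ignored, true)) {
            continue;
        }
        if ($value !== $expected) {
            $mismatches[$name] = $value;
        }
    }
    return $mismatches;
}

// Made-up sample values for illustration
$variables = array(
    'character_set_client'     => 'utf8mb4',
    'character_set_connection' => 'latin1',
    'character_set_filesystem' => 'binary',
    'character_set_results'    => 'utf8mb4',
    'character_set_system'     => 'utf8',
);

print_r(findCharsetMismatches($variables));
```

On a live PDO connection the input array could be fetched with $pdo->query("SHOW VARIABLES LIKE 'character_set%'")->fetchAll(PDO::FETCH_KEY_PAIR), assuming $pdo is an open connection.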


mysql:

-mysql_query("SET NAMES 'utf8'"); Run SET NAMES once after connecting, before your other queries. If a MySQL driver doesn't provide its own mechanism to set the charset, you have to use SET NAMES!
-mysql_query("SET CHARACTER SET utf8"); Sets the character set to utf8
-mysql_set_charset('utf8'); Sets your charset to utf8
-The mysql API driver may not support utf8mb4 with older client libraries, and servers before MySQL 5.5.3 don't know it at all (ERROR 1115 (42000))
-character_set_server=utf8 sets the server character set

PDO:

-$dbh->exec("set names utf8"); If you're using PDO you can use this line to SET NAMES
-$dbh = new PDO("mysql:host=$host;dbname=$db;charset=utf8"); This line sets the charset, but you need PHP 5.3.6 or higher
-$dbh = new PDO($dsn, $user, $pass, array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci'")); You can also run SET NAMES with this option, but it has to be passed to the constructor; setting it with setAttribute() after connecting has no effect
-mb_internal_encoding('UTF-8'); Sets the internal encoding of the mbstring functions (it doesn't change the connection charset)
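Put together, a PDO connection with the charset set could look roughly like this. It is a sketch, not the one true way: buildMysqlDsn() and connectUtf8() are hypothetical helper names, and host, database name and credentials are placeholders. Note that the DSN key is plain charset, without a mysql: prefix in front of it.

```php
<?php
// Hypothetical helper: build a MySQL DSN that carries the charset
// (the charset key is honoured from PHP 5.3.6 on).
function buildMysqlDsn($host, $dbname, $charset = 'utf8mb4')
{
    return "mysql:host=$host;dbname=$dbname;charset=$charset";
}

// Hypothetical helper: open the connection with all options passed to the constructor.
function connectUtf8($host, $dbname, $user, $password)
{
    $options = array(
        PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_EMULATE_PREPARES   => false,
        // Connection-time option: only takes effect when given here, not via setAttribute()
        PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci'",
    );
    return new PDO(buildMysqlDsn($host, $dbname), $user, $password, $options);
}

echo buildMysqlDsn('localhost', 'd2bd'), "\n";
```

Calling connectUtf8('localhost', 'd2bd', 'root', '') would then return the PDO object, assuming the pdo_mysql extension is loaded and the server is reachable.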


3. Change Charset and Collation of a Database or Table

If you have to change the charset or collation of a database or table you can use these lines of code:

ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
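If you have many tables, typing one ALTER TABLE per table gets tedious. As a rough sketch, the statements can be generated from a list of table names; buildConvertStatements() is a hypothetical helper and the table names are invented. Review the generated SQL (and take your backup!) before running it.

```php
<?php
// Hypothetical helper: build one ALTER TABLE ... CONVERT TO statement per table name.
function buildConvertStatements(array $tables, $charset, $collation)
{
    $statements = array();
    foreach ($tables as $table) {
        // Quote the identifier with backticks, doubling any backtick inside it
        $quoted = '`' . str_replace('`', '``', $table) . '`';
        $statements[] = "ALTER TABLE $quoted CONVERT TO CHARACTER SET $charset COLLATE $collation;";
    }
    return $statements;
}

// "players" and "nicknames" are made-up table names
$sql = buildConvertStatements(array('players', 'nicknames'), 'utf8', 'utf8_unicode_ci');
echo implode("\n", $sql), "\n";
```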


4. Set the encoding of your script files

You may have to check that your script (PHP) files are saved with the right charset!

For this I would recommend Notepad++!

Once you have opened your file in Notepad++, go to the 'Encoding' menu and change the charset
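If you would rather check a file from PHP than from an editor, something like the following could work. It is a small sketch that assumes the mbstring extension is available; describeEncoding() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: describe the encoding state of a file's raw bytes.
function describeEncoding($bytes)
{
    if (!mb_check_encoding($bytes, 'UTF-8')) {
        return 'not valid UTF-8';
    }
    // EF BB BF is the UTF-8 byte order mark (BOM)
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8 with BOM';
    }
    return 'UTF-8 without BOM';
}

// Check the current script; any other path can be passed to file_get_contents()
echo describeEncoding(file_get_contents(__FILE__)), "\n";
```

For PHP files, "UTF-8 without BOM" is usually the result you want, because a BOM counts as output and can get in the way of header() calls.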


5. Set the charset of your page with php or meta tag

For displaying data in utf8/utf8mb4 you have to be sure your site is set to the right charset!

You can set the charset in 3 ways like this:

//PHP
ini_set("default_charset", "UTF-8");
header('Content-type: text/html; charset=UTF-8');

//HTML
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Also to accept utf8 in your form use:

<form accept-charset="UTF-8">
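When the data is printed into the page, the escaping function should also be told that the string is UTF-8. A minimal sketch, with escapeHtml() as a hypothetical helper name and a made-up nickname built from the heart symbol in Example 2:

```php
<?php
// Hypothetical helper: escape a value for HTML output, treating the string as UTF-8.
function escapeHtml($value)
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Send the charset header only while headers can still be sent
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// "\xE2\x99\xA5" is the UTF-8 byte sequence for the heart symbol (U+2665)
$nickname = "<b>J \xE2\x99\xA5</b>";
echo '<p>', escapeHtml($nickname), "</p>\n";
```

The heart symbol passes through unchanged, only the HTML special characters are escaped.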


6. What's the difference between UTF8 and UTF8mb4?

UTF8:
-utf8 only supports characters of up to 3 bytes
-...(many more)

UTF8MB4:
-utf8mb4 supports characters of up to 4 bytes
-...(many more)
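To see the byte lengths for yourself, you can count the bytes of a few characters in PHP. This is just an illustrative sketch; needsUtf8mb4() is a hypothetical helper name, and the characters are written as byte escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// Hypothetical helper: true when the string contains a character outside the
// Basic Multilingual Plane, i.e. one that takes 4 bytes in UTF-8.
function needsUtf8mb4($text)
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$samples = array(
    'Cyrillic ya (U+044F)'    => "\xD1\x8F",
    'Euro sign (U+20AC)'      => "\xE2\x82\xAC",
    'Grinning face (U+1F600)' => "\xF0\x9F\x98\x80",
);

foreach ($samples as $label => $char) {
    // strlen() counts bytes, not characters
    printf("%s: %d byte(s), needs utf8mb4: %s\n", $label, strlen($char), needsUtf8mb4($char) ? 'yes' : 'no');
}
```

The Russian letters and the symbols in the question's examples all appear to fit into 3 bytes, so for those the difference between utf8 and utf8mb4 is probably not the cause of the problem.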


7. Answer to this specific Question

I think this should work since you're using PDO:
(After you have created a PDO object! If you're using a PHP version lower than 5.3.6)

$dbh->exec("set names utf8");

Otherwise try one of these:

ini_set("default_charset", "UTF-8");
header('Content-type: text/html; charset=UTF-8');

UPDATE:

To change the collation or charset of a database or table use this:

ALTER DATABASE databasename CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
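One more thing worth checking in this specific case: utf8_encode() assumes its input is ISO-8859-1. If the Web API already delivers UTF-8 (an assumption on my side, but step 2 of both examples looks like it), then encoding it again produces exactly this kind of garbage. The sketch below shows the effect and a guarded conversion. ensureUtf8() is a hypothetical helper, and mb_convert_encoding() is used in place of utf8_encode() because utf8_encode() is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input is not already valid UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1 (what utf8_encode() expects).
function ensureUtf8($text)
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

// "\xD0\x9F" is the UTF-8 byte sequence for the Cyrillic letter Pe (U+041F)
$alreadyUtf8 = "\xD0\x9F";

// Treating UTF-8 bytes as ISO-8859-1 encodes each byte again (double encoding)
$doubleEncoded = mb_convert_encoding($alreadyUtf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($alreadyUtf8), "\n";             // d09f
echo bin2hex($doubleEncoded), "\n";           // c390c29f
echo bin2hex(ensureUtf8($alreadyUtf8)), "\n"; // d09f
```

If the input really is UTF-8 already, the simplest fix is to drop the encode and decode steps and insert the string as it is.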


8. Further Information/Additional Links


9. Side Notes

9.1 Error Reporting
If errors are not displayed, use this code snippet:

<?php
    error_reporting(E_ALL);
    ini_set("display_errors", 1);
?>

9.2 Unicode
To avoid mistakes you have to really understand utf8!

9.3 One word to mysql, mysqli and PDO
My personal ranking is:

  1. PDO
  2. mysqli
  3. mysql

I would recommend you use PDO or mysqli, because they have many benefits over mysql!