mysqli not setting charset to utf8mb4

Chazy Chaz picture Chazy Chaz · Apr 27, 2016 · Viewed 8.8k times · Source

Problem found

The problem seems to be that $mysqli->set_charset() is not accepting `utf8mb4' as a valid encoding (just as I "speculated" in the first update). MySQL version is 5.5.41 and PHP version is 5.4.41 (no problem with that).


Sorry for the title, I've been searching/reading about what/where can the problem be and I'm already too confused about this...

I recently started using utf8mb4 in mysql. I'm using utf8mb4 as charset and utf8mb4_unicode_ci as collation for all tables/columns.

So I first I changed:

$mysqli->set_charset('utf8');

to

$mysqli->set_charset('utf8mb4');

made sure my php files are utf8 (I'm using Visual Studio Code so the files are created in UTF-8 by default), and php/html headers are set to utf8:

index.php

header('Content-type: Text/HTML; Charset=UTF-8');

main.php (included at the end of index.php)

<meta http-equiv="Content-Type" content="Text/HTML" />
<meta charset="UTF-8" />

The problem is that for some tables I have to manually insert the data, and this data is stored as is: with special characters, with accents, ñ, etc... And when I display this data in my website I can see that these characters have replaced the special/accented characters.

So my question is: is there any way to store data as is (without replacing/converting special/accented characters) in mysql and be able to display it fine (as is)?

If I revert to $mysqli->set_charset('utf8'); the data is displayed fine... So this keeps me wondering that there should be no problem with storing utf-8 characters as they are and there is some codification problem somewhere...

I'm using sqlyog community (with wine) and I read somewhere that sometimes the gui does not work correctly when you change some db/table configuration and the only way is the old way (running yourself the query), but I didn't tried this yet. I ran queries to set the charset and collation of all tables/columns.

What do you think?

UPDATE

I'm starting to think that mysqli does not accept utf8mb4 as a valid character encoding and uses utf8 from php and not from mysql... I also think mysql fckd up creating utf8mb4 instead of updating the existing utf8 to support 4 bytes....

As I'm testing with mysqli charset utf8, everything is stored as is and displayed as is (with mysql charset and collation set to utf8mb4...).

UPDATE 2

SELECT name, HEX(name) FROM person LIMIT 1

This is what it outputs:

New Person has name Altaïr 416C7461C3AF72

But as I already said, this is using:

$mysqli->set_charset('utf8');

to insert and to select. If I use utf8mb4 instead this is what it gets stored:

Altaïr

But it's displayed ok. What it's not displayed ok is if the name is stored as is, the displayed name will be Alta�r.

So the question is: Why is mysqli/mysql storing ï as ï using utf8mb4? And why is php displaying special characters like ï as when utf8mb4 is set in mysqli?

Can someone please confirm that mysqli::set_charset accepts utf8mb4 as a valid encoding?

UPDATE 3

I have a class function that selects a string from a table "es", for example: Iniciar Sesión (this is what's stored) and if mysqli charset is utf8, what is being selected/displayed is Iniciar Sesión.

This maybe a completely different problem but it's clearly another codification problem. From my understanding, if the tables/columns are utf8mb4 and mysqli is set to utf8, mysql has to encode from utf8 (3bytes) to ut8mb4 (full byte support). So this means that mysqli does not use utf8 from php but from mysql. Is this correct, right?

My application is currently having a rough time with encodings... (but maybe is some server configuration problem...)

UPDATE 4

Can the problem be here? I really have no idea about this kind of configurations:

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8               |
| character_set_connection | utf8               |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8               |
| character_set_server     | latin1             |
| character_set_system     | utf8               |
| collation_connection     | utf8_general_ci    |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | latin1_swedish_ci  |
+--------------------------+--------------------+
10 rows in set (0.00 sec)

UPDATE 4-1/2 (copied from a comment)

CREATE TABLE `es` (
    id int(11) NOT NULL AUTO_INCREMENT, 
    name varchar(30) COLLATE utf8mb4_unicode_ci DEFAULT NULL, 
    text varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL, 
    PRIMARY KEY (id), 
    UNIQUE KEY name (name)
) ENGINE=InnoDB AUTO_INCREMENT=76 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci` 

Answer

Vadim picture Vadim · Apr 27, 2016

The problem might stem from the fact that you're not using utf8mb4 in your MySQL column definition (at least you did not say what encoding you're using).

Here is an example of a MySQL table definition with a column that uses utfmb4:

CREATE TABLE `person` (
  `name` varchar(255) CHARACTER SET utf8mb4
)

UPDATE

Using the following table definition:

CREATE TABLE `person` (
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

and the following PHP script:

<?php
$mysqli = new mysqli('localhost', 'username', 'password', 'database');
$mysqli->set_charset('utf8mb4');

$mysqli->query("INSERT INTO `person` VALUES ('Altaïr Ibn-La\'Ahad')");

$result = $mysqli->query("SELECT * FROM `person` LIMIT 1");

$person = $result->fetch_object();

if($person)
    printf ("New Person has name %s.\n", $person->name);

$result->close();
$mysqli->close();

when I insert "Altaïr Ibn-La'Ahad" into the database, the name is stored as is without changes. The script also prints the name without changes: "New Person has name Altaïr Ibn-La'Ahad."

I hope this helps you resolve your problem. Let me know if it does or doesn't.