MySQL: strange LENGTH() behaviour on utf8 string

Alain Tiemblo · Apr 29, 2013

I am writing unit tests for query generators, and I ran into trouble with the LENGTH function.

I run two queries, one after the other:

SHOW VARIABLES LIKE '%character%'

It returns the following result:

array(8) {
  [0] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_client"
    'Value' =>
    string(4) "utf8"
  }
  [1] =>
  array(2) {
    'Variable_name' =>
    string(24) "character_set_connection"
    'Value' =>
    string(4) "utf8"
  }
  [2] =>
  array(2) {
    'Variable_name' =>
    string(22) "character_set_database"
    'Value' =>
    string(6) "latin1"
  }
  [3] =>
  array(2) {
    'Variable_name' =>
    string(24) "character_set_filesystem"
    'Value' =>
    string(6) "binary"
  }
  [4] =>
  array(2) {
    'Variable_name' =>
    string(21) "character_set_results"
    'Value' =>
    string(4) "utf8"
  }
  [5] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_server"
    'Value' =>
    string(4) "utf8"
  }
  [6] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_system"
    'Value' =>
    string(4) "utf8"
  }
  [7] =>
  array(2) {
    'Variable_name' =>
    string(18) "character_sets_dir"
    'Value' =>
    string(26) "/usr/share/mysql/charsets/"
  }
}

My second query is:

SELECT LENGTH('重庆') as len

It returns 6 instead of 2.

What is wrong here? My charset parameters look good.

Answer

Alain Tiemblo · Apr 29, 2013

I found my answer in the MySQL documentation:

The LENGTH function counts bytes. In UTF-8, each of these two characters is encoded as 3 bytes, hence 6:

mysql> SELECT LENGTH('重庆') ;
+------------------+
| LENGTH('重庆')   |
+------------------+
|                6 |
+------------------+
1 row in set (0.00 sec)

The CHAR_LENGTH function counts characters:

mysql> SELECT CHAR_LENGTH('重庆') ;
+-----------------------+
| CHAR_LENGTH('重庆')   |
+-----------------------+
|                     2 |
+-----------------------+
1 row in set (0.00 sec)
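
To see both counts side by side, together with the raw bytes behind them, a query along these lines may help. It is only a sketch: it assumes the connection character set is utf8 (or utf8mb4), as in the SHOW VARIABLES output from the question, and the column aliases are made up for illustration.

```sql
-- Assumption: the client/connection charset is utf8 or utf8mb4,
-- so the literal reaches the server as UTF-8 bytes.
-- The aliases byte_len, char_len and utf8_bytes are illustrative names.
SELECT
    LENGTH('重庆')      AS byte_len,    -- counts bytes: 6
    CHAR_LENGTH('重庆') AS char_len,    -- counts characters: 2
    HEX('重庆')         AS utf8_bytes;  -- E9878DE5BA86 (3 bytes per character)
```

With a different connection charset the byte count can differ, so CHAR_LENGTH is the safer choice whenever the number of characters is what matters.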