Possible Duplicate:
What's the difference between utf8_general_ci and utf8_unicode_ci
I've got two options for unicode that look promising for a mysql database.
utf8_general_ci unicode (multilingual), case-insensitive
utf8_unicode_ci unicode (multilingual), case-insensitive
Can you please explain what is the difference between utf8_general_ci and utf8_unicode_ci? What are the effects of choosing one over the other when designing a database?
utf8_general_ci
is a very simple — and on Unicode, very broken — collation, one that gives incorrect results on general Unicode text. What it does is:
This does not work correctly on Unicode, because it does not understand Unicode casing. Unicode casing alone is much more complicated than an ASCII-minded approach can handle. For example:
There are many other subtleties.
utf8_unicode_ci
uses the standard Unicode Collation Algorithm, supports so called expansions and ligatures, for example:
German letter ß (U+00DF LETTER SHARP S) is sorted near "ss"
Letter Œ (U+0152 LATIN CAPITAL LIGATURE OE) is sorted near "OE". utf8_general_ci
does not support expansions/ligatures, it sorts
all these letters as single characters, and sometimes in a wrong order.
utf8_unicode_ci
is generally more accurate for all scripts.
For example, on Cyrillic block:
utf8_unicode_ci
is fine for all these languages:
Russian, Bulgarian, Belarusian, Macedonian, Serbian, and Ukrainian.
While utf8_general_ci is fine only for Russian and Bulgarian subset of Cyrillic.
Extra letters used in Belarusian, Macedonian, Serbian, and Ukrainian
are sorted not well. The cost of utf8_unicode_ci
is that it is a little bit
slower than utf8_general_ci
. But that’s the price you pay for correctness. Either you can have a fast answer that’s wrong, or a very slightly slower answer that’s right. Your choice.
It is very difficult to ever justify giving wrong answers, so it’s best to assume that utf8_general_ci
doesn’t exist and to always use utf8_unicode_ci
. Well, unless you want wrong answers.
Source: http://forums.mysql.com/read.php?103,187048,188748#msg-188748