utf8_bin vs. utf_unicode_ci

user1432124 picture user1432124 · Jun 7, 2012 · Viewed 42.3k times · Source

My table Website

Website_Name//column name
Google
Facebook
Twitter
Orkut
Frype
Skype
Yahoo
Wikipedia

I i use utf8_bin collation then my query to search wikipedia in Website is

Select Website_Name from Website where lower(Website_Name)='wikipedia'

And if i use utf8_unicode_ci then my select query to search wikipedia in Website is

Select Website_Name from Website where Website_Name='wikipedia'

Now I want to know which collation is best depending upon the following queries

Answer

Delan Azabani picture Delan Azabani · Jun 7, 2012

It depends on what you need.

The utf8_bin collation compares strings based purely on their Unicode code point values. If all of the code points have the same values, then the strings are equal. However, this falls apart when you have strings with different composition for combining marks (composed vs. decomposed) or characters that are canonically equivalent but don't have the same code point value. In some cases, using utf8_bin will result in strings not matching when you expect them to. Theoretically, utf8_bin is the fastest because no Unicode normalization is applied to the strings, but it may not be what you want.

utf8_general_ci applies Unicode normalization using language-specific rules and compares strings case-insensitively. utf8_general_cs does the same, but compares strings case-sensitively.