I'm wondering if there is a "best" choice for collation in MySQL for a general website where you aren't 100% sure of what will be entered? I understand that all the encodings should be the same, such as MySQL, Apache, the HTML and anything inside PHP.
In the past I have set PHP to output in "UTF-8", but which collation does this match in MySQL? I'm thinking it's one of the UTF-8 ones, but I have used utf8_unicode_ci
, utf8_general_ci
, and utf8_bin
before.
The main difference is sorting accuracy (when comparing characters in the language) and performance. The only special one is utf8_bin which is for comparing characters in binary format.
utf8_general_ci
is somewhat faster than utf8_unicode_ci
, but less accurate (for sorting). The specific language utf8 encoding (such as utf8_swedish_ci
) contain additional language rules that make them the most accurate to sort for those languages. Most of the time I use utf8_unicode_ci
(I prefer accuracy to small performance improvements), unless I have a good reason to prefer a specific language.
You can read more on specific unicode character sets on the MySQL manual - http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html