I am new to multilingual data, and I confess that I have never worked with it before. I am currently working on a multilingual site, but I do not know which languages will be used.
Which collation/character set of MySQL should I use to achieve this?
Should I use some Unicode type of character set?
Of course, these languages are nothing exotic; they will be among the ones most commonly used.
You should use a Unicode collation. You can set it as the default on your system, or on each field of your tables. The following Unicode collations are available, and these are their differences:
utf8_general_ci is a very simple collation. It just removes all accents, then converts to upper case, and uses the code of the resulting "base letter" to compare.
utf8_unicode_ci uses the default Unicode collation element table.
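As a rough sketch of the "system default or per field" options mentioned above, the statements below set a collation at the database, table, and column level. The names site_db, articles, title, and slug are hypothetical placeholders, and the example assumes a MySQL version where the utf8 character set is available (newer versions also offer utf8mb4 with matching utf8mb4_* collations, which may be preferable there).

```sql
-- Hypothetical database: tables created in it inherit this default.
CREATE DATABASE site_db
  CHARACTER SET utf8
  COLLATE utf8_unicode_ci;

USE site_db;

-- Hypothetical table: the table default applies to every column
-- that does not declare its own character set or collation.
CREATE TABLE articles (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  -- Per-field override: only this column uses utf8_general_ci.
  slug  VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL
) DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

-- Change the collation of an existing column.
ALTER TABLE articles
  MODIFY title VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;
```

A column-level setting takes precedence over the table default, which in turn takes precedence over the database default, so it is usually enough to set the database default and override only where needed.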
The main differences are:
utf8_general_ci does not support expansions/ligatures; it sorts all these letters as single characters, and sometimes in the wrong order.
The disadvantage of utf8_unicode_ci is that it is a little slower than utf8_general_ci.
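The expansion difference can be checked directly with a comparison. This is a minimal sketch that assumes the connection itself uses UTF-8 (hence the SET NAMES statement). According to the MySQL manual, ß compares equal to s under utf8_general_ci and equal to ss under utf8_unicode_ci; the column aliases are only illustrative.

```sql
-- Assumption: the client sends UTF-8 encoded text.
SET NAMES utf8;

-- utf8_general_ci compares character by character, with no expansions.
SELECT 'ß' = 's'  COLLATE utf8_general_ci AS general_eq_s,
       'ß' = 'ss' COLLATE utf8_general_ci AS general_eq_ss;

-- utf8_unicode_ci supports expansions, so ß is treated as ss.
SELECT 'ß' = 's'  COLLATE utf8_unicode_ci AS unicode_eq_s,
       'ß' = 'ss' COLLATE utf8_unicode_ci AS unicode_eq_ss;
```

Running both queries against your own server is a quick way to see which behavior your data would get before committing to a collation.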
So unless you know exactly which languages/characters you are going to use, I recommend utf8_unicode_ci, which has broader coverage.
Extracted from MySQL forums.