Our column is currently collated to latin1_swedish_ci and special Unicode characters are, obviously, getting stripped out. We want to be able to accept chars such as U+272A ✪, U+2764 ❤ (see this Wikipedia article), etc. I'm leaning towards utf8_unicode_ci. Would this collation handle these and other characters? I don't care about speed as this column isn't an index.
MySQL Version: 5.5.28-1
The collation is the least of your worries; what you need to think about is the character set for the column/table/database. The collation (the rules governing how data is compared and sorted) is just a corollary of that.
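MySQL supports several Unicode character sets, utf8 and utf8mb4 being the most interesting. utf8 supports Unicode characters in the BMP (Basic Multilingual Plane), i.e. a subset of all of Unicode. utf8mb4, available since MySQL 5.5.3, supports all of Unicode. Both of your examples, U+272A and U+2764, are in the BMP, so either character set can store them; characters outside the BMP, such as most emoji, require utf8mb4.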
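To see which character set a given character needs, here is a rough sketch in Python. It relies on the fact that MySQL's utf8 stores at most three bytes per character while utf8mb4 stores up to four. The helper name smallest_mysql_charset is made up for illustration, and Python's cp1252 codec is used only as an approximation of MySQL's latin1, so treat the result as a guide rather than as what the server will do in every edge case.

```python
# Hypothetical helper for illustration; not part of MySQL or any driver.
def smallest_mysql_charset(ch: str) -> str:
    """Return the narrowest of latin1 / utf8 / utf8mb4 able to store one character."""
    try:
        # Assumption: MySQL's latin1 is approximated by Python's cp1252 codec.
        ch.encode("cp1252")
        return "latin1"
    except UnicodeEncodeError:
        pass
    # MySQL's utf8 holds at most 3 bytes per character, which covers the BMP.
    # Anything that needs 4 bytes in UTF-8 lies outside the BMP.
    return "utf8" if len(ch.encode("utf-8")) <= 3 else "utf8mb4"


for ch in ("\u00e9", "\u272a", "\u2764", "\U0001F600"):
    print(f"U+{ord(ch):04X} -> {smallest_mysql_charset(ch)}")
```

The two characters from the question come out as utf8, while the emoji U+1F600 comes out as utf8mb4.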
The collation to be used with any of the Unicode encodings is most likely xxx_general_ci or xxx_unicode_ci. The former is a general sorting and comparison algorithm independent of language. The latter is a more complete language-independent algorithm supporting more Unicode features (e.g. treating "ß" and "ss" as equivalent), and is therefore also slower.
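As a loose analogy for the "ß" versus "ss" point, the Python sketch below compares two strings using a simple per-character lowercase key and then using full Unicode case folding, which expands "ß" to "ss". This is Python's own case folding, not MySQL's collation code, and the function names are made up; it only illustrates the idea that a more complete comparison can treat one character as equivalent to two.

```python
# Illustrative only: Python case folding stands in for collation rules here.
def simple_key(s: str) -> str:
    """One-to-one lowercase mapping; 'ß' stays 'ß'."""
    return s.lower()


def folded_key(s: str) -> str:
    """Full Unicode case folding; 'ß' expands to 'ss'."""
    return s.casefold()


a, b = "straße", "STRASSE"
print(simple_key(a) == simple_key(b))  # False
print(folded_key(a) == folded_key(b))  # True
```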
See https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-sets.html.