I discovered (the hard way) that MySQL's UTF8 character set is only 3 bytes. A bit of research shows I can fix this by changing the tables to utilize the utf8mb4
collation and get the full 4 bytes UTF should be.
I've done so. My database, tables and columns have all been ALTER
ed to utilize this charset. However, I still receive this message if I have data that has unicode code points larger than U+FFFF:
Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='"
I discovered I have the following settings:
> show variables like '%collation%';
collation_connection utf8_general_ci
collation_database utf8mb4_general_ci
collation_server utf8mb4_general_ci
The collation_server
was set by making changes to my.cnf
. My question, is how do I change the connection one? I currently connect to the database using SQL Alchemy and pymysql like this:
connect_string = 'mysql+pymysql://{}:{}@{}:{}/{}?charset=utf8'.format(DB_USER, DB_PASS, DB_HOST, DB_PORT, DATABASE)
engine = create_engine(connect_string, convert_unicode=True, echo=False)
session = sessionmaker()
session.configure(bind=engine)
What can I do to change from utf8_general_ci
to utf8mb4_general_ci
when connecting via SQL Alchemy?
Change the connect_string
to use charset=utf8mb4
:
connect_string = 'mysql+pymysql://{}:{}@{}:{}/{}?charset=utf8mb4'.format(DB_USER, DB_PASS, DB_HOST, DB_PORT, DATABASE)