Using SQLAlchemy and pymysql, how can I set the connection to utilize utf8mb4?

Andy picture Andy · Jul 8, 2015 · Viewed 17.5k times · Source

I discovered (the hard way) that MySQL's UTF8 character set is only 3 bytes. A bit of research shows I can fix this by changing the tables to utilize the utf8mb4 collation and get the full 4 bytes UTF should be.

I've done so. My database, tables and columns have all been ALTERed to utilize this charset. However, I still receive this message if I have data that has unicode code points larger than U+FFFF:

Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='"

I discovered I have the following settings:

> show variables like '%collation%';

collation_connection  utf8_general_ci
collation_database    utf8mb4_general_ci
collation_server      utf8mb4_general_ci

The collation_server was set by making changes to my.cnf. My question, is how do I change the connection one? I currently connect to the database using SQL Alchemy and pymysql like this:

connect_string = 'mysql+pymysql://{}:{}@{}:{}/{}?charset=utf8'.format(DB_USER, DB_PASS, DB_HOST, DB_PORT, DATABASE)
engine = create_engine(connect_string, convert_unicode=True, echo=False)
session = sessionmaker()
session.configure(bind=engine)

What can I do to change from utf8_general_ci to utf8mb4_general_ci when connecting via SQL Alchemy?

Answer

doru picture doru · Jul 8, 2015

Change the connect_string to use charset=utf8mb4:

connect_string = 'mysql+pymysql://{}:{}@{}:{}/{}?charset=utf8mb4'.format(DB_USER, DB_PASS, DB_HOST, DB_PORT, DATABASE)