Using unicodedata.normalize in Python 2.7

dpitch40 · Oct 18, 2012 · Viewed 12.3k times

Once again, I am very confused by a Unicode question. I can't figure out how to use unicodedata.normalize to convert non-ASCII characters the way I expect. For instance, I want to convert the string

u"Cœur"

to

u"Coeur"

I am pretty sure that unicodedata.normalize is the way to do this, but I can't get it to work. It just leaves the string unchanged.

>>> import unicodedata
>>> s = u"Cœur"
>>> unicodedata.normalize('NFKD', s) == s
True

What am I doing wrong?

Answer

jfs · Oct 18, 2012

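unicodedata.normalize leaves œ (U+0153, LATIN SMALL LIGATURE OE) unchanged because that character has no decomposition mapping in the Unicode database, so NFKD has nothing to split it into. You could try Unidecode, which transliterates to ASCII instead: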

# -*- coding: utf-8 -*-
from unidecode import unidecode # $ pip install unidecode

print(unidecode(u"Cœur"))
# -> Coeur
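
If adding a third-party package is not an option, a stdlib-only approach might look like the sketch below. It first checks with unicodedata.decomposition that œ really has no decomposition (while é does), then replaces such characters from a small hand-written table before running NFKD and dropping combining marks. The names EXTRA_MAP and fold_accents are hypothetical, and the table is only an illustrative subset, not a complete list; characters that are neither in the table nor decomposable pass through unchanged.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Hypothetical, deliberately incomplete table of characters that NFKD
# leaves alone because they have no decomposition mapping.
EXTRA_MAP = {
    u"\u0153": u"oe",  # œ
    u"\u0152": u"OE",  # Œ
    u"\u00e6": u"ae",  # æ
    u"\u00c6": u"AE",  # Æ
    u"\u00df": u"ss",  # ß
}


def fold_accents(text):
    """Replace characters from EXTRA_MAP, then strip combining marks."""
    # Step 1: substitute the characters NFKD cannot decompose.
    replaced = u"".join(EXTRA_MAP.get(ch, ch) for ch in text)
    # Step 2: decompose accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", replaced)
    # Step 3: drop the combining marks, keeping the base letters.
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))


# œ has an empty decomposition, which is why normalize() leaves it unchanged.
print(unicodedata.decomposition(u"\u0153") == "")  # True
# é decomposes into "e" plus a combining acute accent.
print(unicodedata.decomposition(u"\u00e9"))        # 0065 0301
print(fold_accents(u"C\u0153ur"))                  # Coeur
```

Unidecode covers far more characters than a table like this, so the sketch is mainly useful when only a handful of known ligatures need handling.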