Python and character normalization

Hellnar · Nov 12, 2010

Hello, I retrieve UTF-8 text data from a foreign source. It contains special characters such as u"ıöüç", and I want to normalize them to their plain English (ASCII) equivalents, e.g. "ıöüç" -> "iouc". What would be the best way to achieve this?

Answer

Constantin · Nov 12, 2010

I recommend using the Unidecode module:

>>> from unidecode import unidecode
>>> unidecode(u'ıöüç')
'iouc'

Note how you feed it a unicode string and it outputs a byte string (this is the Python 2 behaviour; on Python 3 the result is an ordinary str). Either way, the output is guaranteed to be ASCII.
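
For comparison, here is a minimal standard-library sketch of the usual alternative, assuming Python 3. The helper name `strip_accents` is made up for illustration and is not part of Unidecode. It decomposes each character with NFKD and then drops whatever is not ASCII. That handles accented letters, but a character with no decomposition, such as the dotless "ı", is silently discarded rather than transliterated, which is one reason to prefer Unidecode for this input.

```python
import unicodedata


def strip_accents(text):
    """Hypothetical helper: remove diacritics using only the standard library.

    NFKD splits a character such as 'ö' into 'o' plus a combining mark;
    encoding to ASCII with errors="ignore" then drops the combining mark
    together with any other non-ASCII character.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents("öüç"))   # ouc
print(strip_accents("ıöüç"))  # ouc (the dotless ı has no decomposition and is lost)
```

If losing characters like "ı" is not acceptable, a transliteration table such as the one Unidecode ships with is the safer choice.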