How to remove accents in Python 3.5 and get a string, with unicodedata or another solution?

Sulot · Oct 25, 2015 · Viewed 15.4k times

I am trying to build a string to use with the Google Geocoding API. I've checked a lot of threads, but I am still facing a problem and I don't understand how to solve it.

I need addresse1 to be a string without any special characters. For example, addresse1 could be: "32 rue d'Athènes Paris France".

addresse1= collect.replace(' ','+').replace('\n','') 
addresse1=unicodedata.normalize('NFKD', addresse1).encode('utf-8','ignore') 

Here I got a string without any accents... Oh no... It is not a string but a bytes object. So I did what was suggested and decoded it:

addresse1=addresse1.decode('utf-8') 

But then addresse1 is exactly the same as at the beginning... What do I have to do? What am I doing wrong? Or what don't I understand about Unicode? Or is there a better solution?

Thanks,

Stéphane.

Answer

Ignacio Vazquez-Abrams · Oct 25, 2015

With the third-party package unidecode:

3>> import unidecode
3>> unidecode.unidecode("32 rue d'Athènes Paris France")
"32 rue d'Athenes Paris France"
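
For a standard-library-only route, the attempt in the question is probably one argument away from working. NFKD normalization splits "è" into a plain "e" followed by a combining accent, but UTF-8 can represent that combining accent, so encoding with 'utf-8' and 'ignore' discards nothing and decoding returns the decomposed (visually identical) text. Encoding to ASCII instead drops the combining marks. A minimal sketch, where strip_accents is a hypothetical helper name and not part of unicodedata:

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed (stdlib only)."""
    # NFKD splits "è" into "e" followed by a combining grave accent (U+0300).
    decomposed = unicodedata.normalize('NFKD', text)
    # ASCII cannot represent the combining mark, so 'ignore' drops it;
    # UTF-8 can represent it, which is why the original attempt kept the accent.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


addresse1 = strip_accents("32 rue d'Athènes Paris France")
print(addresse1)  # 32 rue d'Athenes Paris France
```

One caveat with this approach: characters that have no decomposition, such as "ø" or "ß", are removed entirely rather than replaced, so unidecode is likely the safer choice when the addresses are not limited to French-style accents.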