Replace HTML entities with the corresponding UTF-8 characters in Python 2.6

Alexandru · Apr 8, 2009

I have HTML text like this:

&lt;xml ... &gt;

and I want to convert it to something readable:

<xml ...>

Any easy (and fast) way to do it in Python?

Answer

vartec · Apr 8, 2009

Python 2.7

Official documentation for HTMLParser: Python 2.7

>>> import HTMLParser
>>> pars = HTMLParser.HTMLParser()
>>> pars.unescape('&copy; &euro;')
u'\xa9 \u20ac'
>>> print _
© €

Python 3

Official documentation for html.unescape: Python 3

In Python 3.4 and later, use html.unescape(). The HTMLParser.unescape() method was deprecated in Python 3.4 and removed in Python 3.9.

>>> import html
>>> html.unescape('&copy; &euro;')
'© €'
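
If the same code has to run on both Python 2 and Python 3, one possible approach is to pick the unescape function at import time. This is only a sketch: `decode_entities` is a hypothetical helper name, not part of either standard library, and the fallback branch assumes the Python 2 `HTMLParser` module shown above.

```python
try:
    # Python 3.4+: html.unescape is a plain module-level function.
    from html import unescape
except ImportError:
    # Python 2 fallback: use the unescape method of an HTMLParser instance.
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape


def decode_entities(text):
    """Hypothetical helper: replace HTML entities in text with their characters."""
    return unescape(text)


# The escaped markup from the question becomes readable again.
print(decode_entities('&lt;xml ... &gt;'))
```

Applied to the text from the question, this prints `<xml ... >`.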