In Python 2.7, how do you convert a Latin-1 string to UTF-8?
For example, I'm trying to convert é to UTF-8:
>>> "é"
'\xe9'
>>> u"é"
u'\xe9'
>>> u"é".encode('utf-8')
'\xc3\xa9'
>>> print u"é".encode('utf-8')
é
The letter is é, which is LATIN SMALL LETTER E WITH ACUTE (U+00E9).
The UTF-8 byte encoding for it is: c3a9
The Latin-1 byte encoding is: e9
How do I get the UTF-8 encoded version of a Latin-1 string? Could someone give an example of how to convert the é?
To decode a byte sequence from Latin-1 to Unicode, use the .decode() method:
>>> '\xe9'.decode('latin1')
u'\xe9'
Python uses \xab escapes for Unicode code points up to \u00ff, so u'\xe9' and u'\u00e9' are the same string:
>>> '\xe9'.decode('latin1') == u'\u00e9'
True
The above Latin-1 character can be encoded to UTF-8 as:
>>> '\xe9'.decode('latin1').encode('utf8')
'\xc3\xa9'