UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1

Question 1

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1

python unicode utf-8

Markum · May 12, 2012 · Viewed 332.5k times · Source

Answer

Answer

This is to do with the encoding of your terminal not being set to UTF-8. Here is my terminal

$ echo $LANG
en_GB.UTF-8
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
(｡･ω･｡)ﾉ
>>>

On my terminal the example works with the above, but if I get rid of the LANG setting then it won't work

$ unset LANG
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
>>>

Consult the docs for your linux variant to discover how to make this change permanent.

Question 2

I'm having a few issues trying to encode a string to UTF-8. I've tried numerous things, including using string.encode('utf-8') and unicode(string), but I get the error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1: ordinal not in range(128)

This is my string:

(｡･ω･｡)ﾉ

I don't see what's going wrong, any idea?

Edit: The problem is that printing the string as it is does not show properly. Also, this error when I try to convert it:

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1

Answer

Related questions