I'm really confused. I tried to encode, but the error says it can't decode:
>>> "你好".encode("utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
I know I can avoid the error by putting the "u" prefix on the string. I'm just wondering why the error says "can't decode" when I called encode. What is Python doing under the hood?
"你好".encode('utf-8')

encode converts a unicode object to a string object. But here you have invoked it on a string object (because you don't have the u prefix). So Python has to convert the string to a unicode object first. It does the equivalent of
"你好".decode().encode('utf-8')
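But the decode fails because the string isn't valid ASCII: decode() with no argument uses the default encoding, which is ascii in Python 2, and the string's first byte, 0xe4, is outside the ASCII range 0-127. That's why you get a complaint about not being able to decode.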
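Python 3 no longer converts between bytes and text implicitly, so the hidden step can't happen there. The sketch below reproduces it by hand to show that the failure comes from the decode step, not the encode step. `py2_style_encode` is a hypothetical helper written for illustration, and the sketch assumes the source bytes are UTF-8 (consistent with the 0xe4 in the traceback).

```python
# Python 3 sketch: simulates what Python 2 does implicitly when .encode()
# is called on a byte string. py2_style_encode is a hypothetical helper.


def py2_style_encode(raw: bytes, target: str = "utf-8") -> bytes:
    """Mimic Python 2's str.encode(target) on a byte string."""
    # Step 1 (implicit in Python 2): decode with the default codec, ascii.
    text = raw.decode("ascii")
    # Step 2: the encode you actually asked for.
    return text.encode(target)


# The bytes Python 2 would hold for "你好", assuming a UTF-8 source/terminal.
raw = "你好".encode("utf-8")
print(raw)  # b'\xe4\xbd\xa0\xe5\xa5\xbd'

try:
    py2_style_encode(raw)
except UnicodeDecodeError as exc:
    # Same failure as the traceback: it is raised by the decode step.
    print(exc)  # 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

# Decoding explicitly with the codec the bytes are really in avoids the error.
fixed = raw.decode("utf-8").encode("utf-8")
print(fixed == raw)  # True
```

In Python 2 the same fix is to decode explicitly with the right codec, `"你好".decode("utf-8").encode("utf-8")`, or to start from a unicode literal, `u"你好".encode("utf-8")`, so no implicit ascii decode is needed.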