I don't understand encode and decode in Python (2.7.3)

Narcisse Doudieu Siewe · Jul 22, 2012 · Viewed 22.8k times

I tried to understand encode and decode in Python on my own, but nothing is really clear to me.

  1. str.encode([encoding[, errors]])
  2. str.decode([encoding[, errors]])

First, I don't understand why these two functions need the "encoding" parameter.

What is the output of each function, and what is its encoding? What is the "encoding" parameter used for in each function? I also don't really understand the definition of "bytes string".

I have an important question: is there a way to convert from one encoding to another? I have read some text on ASN.1 about "octet string", so I wondered whether it is the same as "bytes string".

Thanks for your help.

Answer

lvc · Jul 22, 2012

It's a little more complex in Python 2 than in Python 3, since Python 2 conflates the concepts of 'string' and 'bytestring' quite a bit, but see The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets. Essentially, what you need to understand is that 'string' and 'character' are abstract concepts that can't be directly represented by a computer. A bytestring is a raw stream of bytes straight from disk (or one that can be written straight to disk). encode goes from abstract to concrete (you give it, preferably, a unicode string, and it gives you back a byte string); decode goes the opposite way.
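To make the two directions concrete, here is a minimal sketch. The variable names and the choice of UTF-8 are mine, purely for illustration. It uses the u'' and b'' prefixes so that it should run unchanged on Python 2.7 and on Python 3.3+.

```python
# Abstract text: a unicode string of 4 characters (the last one is e-acute).
text = u'caf\xe9'

# encode: abstract -> concrete. UTF-8 is an arbitrary choice for this sketch.
data = text.encode('utf-8')

# The same text now exists as 5 bytes, because e-acute takes 2 bytes in UTF-8.
print(len(text))   # 4
print(len(data))   # 5

# decode: concrete -> abstract. You must name the encoding the bytes are in.
roundtrip = data.decode('utf-8')
print(roundtrip == text)   # True
```

Note that in Python 2, calling encode on a plain str (which is already a bytestring) first decodes it implicitly as ASCII, which is one reason it is safer to only ever encode unicode objects and decode byte strings.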

The encoding is the rule that says, for example in UTF-8, that 'a' should be represented by the byte 0x61 and 'α' by the two-byte sequence 0xce 0xb1.
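As a rough illustration of why the encoding parameter matters, the sketch below encodes the same character under three different rules and gets three different byte sequences. It also shows one way to go from one encoding to another: decode to unicode first, then encode again. The encodings picked here (UTF-8, ISO-8859-7, UTF-16-BE) and the variable names are just my choices for the example.

```python
alpha = u'\u03b1'  # GREEK SMALL LETTER ALPHA, written as an escape

# One abstract character, three different concrete representations.
utf8_bytes = alpha.encode('utf-8')        # two bytes: 0xce 0xb1
greek_bytes = alpha.encode('iso-8859-7')  # one byte:  0xe1
utf16_bytes = alpha.encode('utf-16-be')   # two bytes: 0x03 0xb1

print(list(bytearray(utf8_bytes)))    # [206, 177]
print(list(bytearray(greek_bytes)))   # [225]
print(list(bytearray(utf16_bytes)))   # [3, 177]

# 'a' is a single byte, 0x61, in UTF-8 (and in ASCII).
print(u'a'.encode('utf-8') == b'\x61')   # True

# Converting between encodings: bytes -> unicode -> bytes.
transcoded = utf8_bytes.decode('utf-8').encode('iso-8859-7')
print(transcoded == greek_bytes)   # True
```

Decoding with the wrong encoding either raises UnicodeDecodeError or silently produces the wrong characters, which is why the bytes alone are not enough: you also need to know which rule produced them.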