In some Python source code I came across, I saw a small b before a string, as in:
b"abcdef"
I know about the u prefix signifying a Unicode string, and the r prefix for a raw string literal.
What does the b stand for, and in what kind of source code is it useful? It seems to behave exactly like a plain string without any prefix.
The b prefix signifies a bytes string literal.
If you see it used in Python 3 source code, the expression creates a bytes object, not a regular Unicode str object. If you see it echoed in your Python shell or as part of a list, dict or other container contents, then you see a bytes object represented using this notation.
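As a quick illustration of the difference, here is a minimal sketch for Python 3; the variable names are made up for the example:

```python
# The same characters, once as a bytes literal and once as a str literal.
literal = b"abcdef"
text = "abcdef"

print(type(literal))  # <class 'bytes'>
print(type(text))     # <class 'str'>

# Inside a container, the bytes object is echoed with the b prefix.
container = [literal, text]
print(container)      # [b'abcdef', 'abcdef']
```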
bytes objects basically contain a sequence of integers in the range 0-255, but when represented, Python displays these bytes as ASCII codepoints to make it easier to read their contents. Any bytes outside the printable range of ASCII characters are shown as escape sequences (e.g. \n, \x82, etc.). Conversely, you can use both ASCII characters and escape sequences to define byte values; for ASCII characters their numeric value is used (e.g. b'A' == b'\x41').
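To see that representation at work, here is a small sketch (Python 3 assumed, and the name data is just for illustration) that mixes printable ASCII with escape sequences:

```python
# Two printable ASCII characters, a newline, and a byte outside ASCII.
data = b"Hi\n\x82"

# The underlying integer values of each byte.
print(list(data))  # [72, 105, 10, 130]

# The representation uses ASCII where it can and escapes elsewhere.
print(repr(data))  # b'Hi\n\x82'

# An ASCII character and its hex escape define the same byte.
same = b"A" == b"\x41"
print(same)        # True
```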
Because a bytes object consists of a sequence of integers, you can construct a bytes object from any other sequence of integers with values in the 0-255 range, like a list:
bytes([72, 101, 108, 108, 111])
Indexing gives you back the integers, but slicing produces a new bytes value. If the above example is assigned to value, then value[0] gives you 72, but value[:1] is b'H', as 72 is the ASCII code point for the capital letter H.
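Put together as a runnable sketch for Python 3 (the names first_item and first_slice are only there for the example):

```python
# Build a bytes object from a list of integers in the 0-255 range.
value = bytes([72, 101, 108, 108, 111])
print(value)        # b'Hello'

# Indexing returns an int; slicing returns a new bytes object.
first_item = value[0]
first_slice = value[:1]
print(first_item)   # 72
print(first_slice)  # b'H'

# Iterating also yields integers, not one-byte strings.
print(list(value))  # [72, 101, 108, 108, 111]
```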
bytes objects model binary data, including encoded text. If your bytes value does contain text, you need to decode it first, using the correct codec. If the data is encoded as UTF-8, for example, you can obtain a Unicode str value with:
strvalue = bytesvalue.decode('utf-8')
Conversely, to go from text in a str object to bytes you need to encode. You need to decide on an encoding to use; the default is UTF-8, but what you need depends heavily on your use case:
bytesvalue = strvalue.encode('utf-8')
You can also use the constructor, bytes(strvalue, encoding), to do the same.
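A round trip might look like the following sketch, assuming Python 3 and a sample string chosen only because it contains one non-ASCII character:

```python
# 'café' written with an escape so the source file stays pure ASCII.
strvalue = "caf\u00e9"

# Encoding turns text into bytes; the é becomes two bytes in UTF-8.
bytesvalue = strvalue.encode("utf-8")
print(bytesvalue)                         # b'caf\xc3\xa9'
print(len(strvalue), len(bytesvalue))     # 4 5

# Decoding with the same codec recovers the original text.
roundtrip = bytesvalue.decode("utf-8")
print(roundtrip == strvalue)              # True

# The bytes constructor with an explicit encoding gives the same result.
via_constructor = bytes(strvalue, "utf-8")
print(via_constructor == bytesvalue)      # True
```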
Both the decoding and encoding methods take an extra errors argument to specify how errors should be handled; the default, 'strict', raises an exception.
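As a rough sketch of what that argument does (Python 3 assumed; the sample data is invented, using a Latin-1 encoded byte that is not valid UTF-8):

```python
# b'\xe9' is 'é' in Latin-1, but on its own it is not valid UTF-8.
raw = b"caf\xe9"

# The default handler is strict and raises on invalid input.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)  # True

# 'replace' substitutes U+FFFD, 'ignore' drops the offending byte.
replaced = raw.decode("utf-8", errors="replace")
ignored = raw.decode("utf-8", errors="ignore")
print(ascii(replaced))  # 'caf\ufffd'
print(ignored)          # caf

# Encoding accepts the same argument; here 'replace' yields a '?'.
ascii_bytes = "caf\u00e9".encode("ascii", errors="replace")
print(ascii_bytes)      # b'caf?'
```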
Python 2 versions 2.6 and 2.7 also support the b'..' string literal syntax, to make it easier to write code that works on both Python 2 and 3. In Python 2, such a literal simply creates a regular str object.
bytes objects are immutable, just like str objects are. Use a bytearray() object if you need to have a mutable bytes value.
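The difference can be sketched like this in Python 3 (the names frozen and buffer are hypothetical):

```python
frozen = b"hello"

# Item assignment on an immutable bytes object raises TypeError.
try:
    frozen[0] = 72
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)  # True

# A bytearray holds the same data but can be changed in place.
buffer = bytearray(frozen)
buffer[0] = 72            # replace 'h' with 'H'
buffer.extend(b"!")
print(buffer)             # bytearray(b'Hello!')

# Convert back to an immutable bytes object when you are done.
result = bytes(buffer)
print(result)             # b'Hello!'
```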