The following question arose because I was trying to use bytes strings as dictionary keys, and bytes values that I understood to be equal weren't being treated as equal.
Why doesn't the following Python code compare equal? Aren't these two equivalent representations of the same binary data (example knowingly chosen to avoid endianness)?
b'0b11111111' == b'0xff'
I know the following evaluates true, demonstrating the equivalence:
int(b'0b11111111', 2) == int(b'0xff', 16)
But why does Python force me to know the representation? Is it related to endianness? Is there some easy way to force these to compare equal other than converting them all to, e.g., hex literals? Can anyone suggest a transparent and clear method to move between all representations in a (somewhat) platform-independent way (or am I asking too much)?
Edit:
Given the comments below, say I want to actually index a dictionary using 8 bits in the form b'0b11111111', then why does Python expand it to ten bytes and how do I prevent that?
This is a small piece of a large tree data structure, and expanding my indexing by a factor of 10 (ten bytes instead of one) seems like a huge waste of memory.
Bytes can represent any number of things. Python cannot and will not guess at what your bytes might encode.
For example, int(b'0b11111111', 34) is also a valid interpretation, but that interpretation is not equal to hex FF.
The number of interpretations, in fact, is endless. The bytes could represent a series of ASCII codepoints, or image colors, or musical notes.
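To make that concrete, here is a small sketch of four different readings of the same ten bytes. The variable names are only illustrative, and none of these readings is more "correct" than the others:

```python
raw = b'0b11111111'

# Reading 1: a binary integer literal (base 2 accepts the 0b prefix).
as_base2 = int(raw, 2)

# Reading 2: a base-34 number, where 'b' is simply the digit 11.
as_base34 = int(raw, 34)

# Reading 3: ASCII text.
as_text = raw.decode('ascii')

# Reading 4: one big-endian unsigned integer built from all ten bytes.
as_big_int = int.from_bytes(raw, 'big')

print(as_base2, as_base34, as_text, as_big_int)
```

Only the first reading yields 255; the others give entirely different values from the same bytes.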
Until you explicitly apply an interpretation, a bytes object is just a sequence of integer values in the range 0-255. Its textual representation shows each byte as an ASCII character where that byte is printable ASCII:
>>> list(bytes(b'0b11111111'))
[48, 98, 49, 49, 49, 49, 49, 49, 49, 49]
>>> list(bytes(b'0xff'))
[48, 120, 102, 102]
Those byte sequences are not equal.
If you want to interpret these sequences explicitly as integer literals, then use ast.literal_eval() to interpret the decoded text values; always normalise before comparing:
>>> import ast
>>> ast.literal_eval(b'0b11111111'.decode('utf8'))
255
>>> ast.literal_eval(b'0xff'.decode('utf8'))
255
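As for the dictionary keys in the edit: b'0b11111111' is ten bytes of ASCII text, not eight bits, so nothing is being expanded. If the goal is a compact key, one possible approach is to parse each literal once and store the value as a single byte. The following is only a sketch; normalise_key is a hypothetical helper, not something from the standard library, and it assumes every key fits in the range 0-255:

```python
import ast

def normalise_key(raw: bytes) -> bytes:
    # Hypothetical helper: parse the ASCII integer literal, then pack the
    # resulting value into a single byte.
    value = ast.literal_eval(raw.decode('ascii'))
    if not isinstance(value, int):
        raise ValueError(f'not an integer literal: {raw!r}')
    return value.to_bytes(1, 'big')  # raises OverflowError outside 0-255

table = {}
table[normalise_key(b'0b11111111')] = 'leaf'

# Both spellings now map to the same one-byte key.
print(table[normalise_key(b'0xff')])
print(bytes([0b11111111]) == b'\xff')
```

Using the plain int (255) as the key would work just as well; the one-byte form is only worth it if the keys must remain bytes objects.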