How Does One Read Bytes from File in Python

jjnguy · Sep 29, 2008

Similar to this question, I am trying to read an ID3v2 tag header and am having trouble figuring out how to get individual bytes in Python.

I first read all ten bytes into a string. I then want to parse out the individual pieces of information.

I can grab the two version number chars in the string, but then I have no idea how to take those two chars and get an integer out of them.

The struct package seems to be what I want, but I can't get it to work.

Here is my code so far (I am very new to Python btw, so take it easy on me):

def __init__(self, ten_byte_string):
        self.whole_string = ten_byte_string
        self.file_identifier = self.whole_string[:3]
        self.major_version = struct.pack('x', self.whole_string[3:4]) #this 
        self.minor_version = struct.pack('x', self.whole_string[4:5]) # and this
        self.flags = self.whole_string[5:6]
        self.len = self.whole_string[6:10]

Printing out any value except the file identifier obviously gives garbage, because the values are not formatted correctly.

Answer

Brian · Sep 29, 2008

If you have a string of 2 bytes that you want to interpret as a 16-bit integer, you can unpack it like this:

>>> import struct
>>> s = b'\0\x02'
>>> struct.unpack('>H', s)
(2,)

Note that the > means big-endian (the most significant byte of the integer comes first). This is the byte order ID3 tags use.
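
To see what the byte-order prefix changes, the same two bytes can be unpacked both ways. This is a small sketch rather than part of the original answer; the variable names are illustrative, and a bytes literal is used so that it also runs on Python 3.

```python
import struct

data = b'\x00\x02'  # two raw bytes: 0x00 followed by 0x02

# '>' = big-endian: the first byte is the most significant one
(big,) = struct.unpack('>H', data)

# '<' = little-endian: the first byte is the least significant one
(little,) = struct.unpack('<H', data)

print(big, little)  # on Python 3 prints: 2 512
```

unpack always returns a tuple, even for a single value, which is why the left-hand side is written as a one-element tuple.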

For other integer sizes, use different format codes, e.g. "i" for a signed 32-bit integer. See help(struct) for details.

You can also unpack several elements at once, e.g. two unsigned shorts followed by a signed 32-bit value:

>>> a,b,c = struct.unpack('>HHi', some_string)
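
One way to check a multi-value format is to pack known values and unpack them again. In this sketch the sample values are made up, and some_string is built with struct.pack instead of being read from a file.

```python
import struct

# Build the input: two unsigned shorts followed by one signed 32-bit int,
# all big-endian. The values 1, 2, -3 are arbitrary sample data.
some_string = struct.pack('>HHi', 1, 2, -3)

# Unpack with the same format string to get the values back
a, b, c = struct.unpack('>HHi', some_string)

# struct.calcsize reports how many bytes a format string consumes
size = struct.calcsize('>HHi')
```

The input passed to unpack must be exactly calcsize(format) bytes long (8 here), otherwise struct.error is raised.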

Going by your code, you are looking for (in order):

  • a 3 char string
  • 2 single byte values (major and minor version)
  • a 1 byte flags variable
  • a 32 bit length quantity

The format string for this is '>3sBBBI', so the call would be:

ident, major, minor, flags, len = struct.unpack('>3sBBBI', ten_byte_string)
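
The call above can be tried on a hand-built header, as in the sketch below. The sample bytes are hypothetical, not taken from a real file. One addition that goes beyond the answer: the ID3v2 specification stores the tag size as a "synchsafe" integer, where only the low 7 bits of each of the four bytes are used, so the raw 32-bit value needs one more decoding step. The helper name synchsafe_to_int is made up for this example.

```python
import struct

# Hypothetical sample header: "ID3", version 3.0, flags 0, size bytes 00 00 02 01
ten_byte_string = b'ID3\x03\x00\x00\x00\x00\x02\x01'

# 3-char identifier, three single bytes, one big-endian unsigned 32-bit int
ident, major, minor, flags, raw_len = struct.unpack('>3sBBBI', ten_byte_string)

def synchsafe_to_int(value):
    """Decode a 32-bit synchsafe integer (7 useful bits per byte)."""
    return (
        ((value >> 24) & 0x7F) << 21
        | ((value >> 16) & 0x7F) << 14
        | ((value >> 8) & 0x7F) << 7
        | (value & 0x7F)
    )

# Size of the tag body as the ID3v2 spec defines it
tag_size = synchsafe_to_int(raw_len)
```

For these sample bytes the plain 32-bit reading is 513, while the synchsafe decoding gives 257, so the two only agree when every size byte is below 0x80 and all but the last are zero.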