Is there an elegant way to use struct and namedtuple instead of this?

0xC0000022L · Jul 12, 2012

I'm reading a binary file made up of records that in C would look like this:

typedef struct _rec_t
{
  char text[20];
  unsigned char index[3];
} rec_t;

Now I'm able to parse this into a tuple with 23 distinct values, but would prefer if I could use namedtuple to combine the first 20 bytes into text and the three remaining bytes into index. How can I achieve that? Basically instead of one tuple of 23 values I'd prefer to have two tuples of 20 and 3 values respectively and access these using a "natural name", i.e. by means of namedtuple.

I am currently using the format "20c3B" for struct.unpack_from().

Note: There are many consecutive records in the string when I call parse_text.


My code (stripped down to the relevant parts):

#!/usr/bin/env python
import sys
import os
import struct
from collections import namedtuple

def parse_text(data):
    fmt = "20c3B"
    l = len(data)
    sz = struct.calcsize(fmt)
    num = l/sz
    if not num:
        print "ERROR: no records found."
        return
    print "Size of record %d - number %d" % (sz, num)
    #rec = namedtuple('rec', 'text index')
    empty = struct.unpack_from(fmt, data)
    # Loop through elements
    # ...

def main():
    if len(sys.argv) < 2:
        print "ERROR: need to give file with texts as argument."
        sys.exit(1)
    s = os.path.getsize(sys.argv[1])
    f = open(sys.argv[1], "rb")  # binary mode: the file holds raw records
    try:
        data = f.read(s)
        parse_text(data)
    finally:
        f.close()

if __name__ == "__main__":
    main()

Answer

Samy Vilar · Jul 13, 2012

According to the docs: http://docs.python.org/library/struct.html

Unpacked fields can be named by assigning them to variables or by wrapping the result in a named tuple:

>>> from struct import unpack
>>> record = 'raymond   \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

So in your case:

>>> import struct
>>> from collections import namedtuple
>>> data = "1"*23
>>> fmt = "20c3B"
>>> Rec = namedtuple('Rec', 'text index') 
>>> fields = struct.unpack_from(fmt, data)  # unpack once, then slice
>>> r = Rec(text=fields[:20], index=fields[20:])
>>> r
Rec(text=('1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'), index=(49, 49, 49))
>>>
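
Since the question mentions many consecutive records in one string, here is a minimal sketch of applying the same slicing to each record in turn. The names parse_records and REC_STRUCT and the sample bytes are made up for illustration, and the code is written for Python 3, where data is a bytes object and each "c" field comes back as a one-byte bytes value.

```python
import struct
from collections import namedtuple

Rec = namedtuple("Rec", "text index")

# Precompiled format: 20 single chars + 3 unsigned bytes = 23 bytes, no padding.
REC_STRUCT = struct.Struct("20c3B")


def parse_records(data):
    """Return one Rec per complete 23-byte record found in data (hypothetical helper)."""
    records = []
    count = len(data) // REC_STRUCT.size  # ignore any trailing partial record
    for i in range(count):
        fields = REC_STRUCT.unpack_from(data, i * REC_STRUCT.size)
        # First 20 values form the text, the remaining 3 form the index.
        records.append(Rec(text=fields[:20], index=fields[20:]))
    return records


# Made-up sample: two records back to back.
sample = b"A" * 20 + b"\x01\x02\x03" + b"B" * 20 + b"\x04\x05\x06"
recs = parse_records(sample)
```

Each element of recs can then be read by name, for example recs[1].index.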

Slicing the unpacked values may be a problem. If the format were fmt = "20si", or some other format in which each field comes back as a single value rather than as a run of individual bytes, we wouldn't need to slice at all:

>>> import struct
>>> from collections import namedtuple
>>> data = "1"*24
>>> fmt = "20si"
>>> Rec = namedtuple('Rec', 'text index') 
>>> r = Rec._make(struct.unpack_from(fmt, data))
>>> r
Rec(text='11111111111111111111', index=825307441)
>>>
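
Note that "20si" is 24 bytes wide, so it does not match the 23-byte record from the question. A format that might fit that layout is "20s3B": the text arrives as one 20-byte string and only the three index bytes need collecting. This is a sketch under that assumption, written for Python 3; unpack_rec, REC_STRUCT and the sample data are hypothetical names and values chosen for illustration.

```python
import struct
from collections import namedtuple

Rec = namedtuple("Rec", "text index")

# 20-byte string + 3 unsigned bytes = 23 bytes, matching rec_t with no padding.
REC_STRUCT = struct.Struct("20s3B")


def unpack_rec(data, offset=0):
    """Unpack one record starting at offset (hypothetical helper)."""
    fields = REC_STRUCT.unpack_from(data, offset)
    # fields[0] is the whole text; fields[1:] are the three index bytes.
    return Rec(text=fields[0], index=fields[1:])


# Made-up sample: "hello" padded with NUL bytes to 20, then three index bytes.
sample = b"hello".ljust(20, b"\x00") + b"\x07\x08\x09"
rec = unpack_rec(sample)
```

If the text field is NUL-padded, as it is in this sample, rec.text.rstrip(b"\x00") would strip the padding. Whether real records are padded that way is an assumption to check against the file.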