I'm reading a binary file made up of records that in C would look like this:
typedef _rec_t
{
char text[20];
unsigned char index[3];
} rec_t;
Now I'm able to parse this into a tuple with 23 distinct values, but would prefer if I could use namedtuple
to combine the first 20 bytes into text
and the three remaining bytes into index
. How can I achieve that? Basically instead of one tuple of 23 values I'd prefer to have two tuples of 20 and 3 values respectively and access these using a "natural name", i.e. by means of namedtuple
.
I am currently using the format "20c3B"
for struct.unpack_from()
.
Note: There are many consecutive records in the string when I call parse_text
.
My code (stripped down to the relevant parts):
#!/usr/bin/env python
import sys
import os
import struct
from collections import namedtuple
def parse_text(data):
fmt = "20c3B"
l = len(data)
sz = struct.calcsize(fmt)
num = l/sz
if not num:
print "ERROR: no records found."
return
print "Size of record %d - number %d" % (sz, num)
#rec = namedtuple('rec', 'text index')
empty = struct.unpack_from(fmt, data)
# Loop through elements
# ...
def main():
if len(sys.argv) < 2:
print "ERROR: need to give file with texts as argument."
sys.exit(1)
s = os.path.getsize(sys.argv[1])
f = open(sys.argv[1])
try:
data = f.read(s)
parse_text(data)
finally:
f.close()
if __name__ == "__main__":
main()
According to the docs: http://docs.python.org/library/struct.html
Unpacked fields can be named by assigning them to variables or by wrapping the result in a named tuple:
>>> record = 'raymond \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)
>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name='raymond ', serialnum=4658, school=264, gradelevel=8)
so in your case
>>> import struct
>>> from collections import namedtuple
>>> data = "1"*23
>>> fmt = "20c3B"
>>> Rec = namedtuple('Rec', 'text index')
>>> r = Rec._make([struct.unpack_from(fmt, data)[0:20], struct.unpack_from(fmt, data)[20:]])
>>> r
Rec(text=('1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'), index=(49, 49, 49))
>>>
slicing the unpack variables maybe a problem, if the format was fmt = "20si"
or something standard where we don't return sequential bytes, we wouldn't need to do this.
>>> import struct
>>> from collections import namedtuple
>>> data = "1"*24
>>> fmt = "20si"
>>> Rec = namedtuple('Rec', 'text index')
>>> r = Rec._make(struct.unpack_from(fmt, data))
>>> r
Rec(text='11111111111111111111', index=825307441)
>>>