Reading a binary file in Python

Brian · Nov 11, 2011 · Viewed 9.4k times

I have to read a binary file in Python. The file is written by a Fortran 90 program like this:

open(unit=10,file=filename,form='unformatted')
write(10)table%n1,table%n2
write(10)table%nH
write(10)table%T2
write(10)table%cool
write(10)table%heat
write(10)table%cool_com
write(10)table%heat_com
write(10)table%metal
write(10)table%cool_prime
write(10)table%heat_prime
write(10)table%cool_com_prime
write(10)table%heat_com_prime
write(10)table%metal_prime
write(10)table%mu
if (if_species_abundances) write(10)table%n_spec
close(10)

I can easily read this binary file with the following IDL code:

n1=161L
n2=101L
openr,1,file,/f77_unformatted
readu,1,n1,n2
print,n1,n2
spec=dblarr(n1,n2,6)
metal=dblarr(n1,n2)
cool=dblarr(n1,n2)
heat=dblarr(n1,n2)
metal_prime=dblarr(n1,n2)
cool_prime=dblarr(n1,n2)
heat_prime=dblarr(n1,n2)
mu  =dblarr(n1,n2)
n   =dblarr(n1)
T   =dblarr(n2)
Teq =dblarr(n1)
readu,1,n
readu,1,T
readu,1,Teq
readu,1,cool
readu,1,heat
readu,1,metal
readu,1,cool_prime
readu,1,heat_prime
readu,1,metal_prime
readu,1,mu
readu,1,spec
print,spec
close,1

What I want to do is read this binary file with Python, but I have run into some problems. First of all, here is my attempt to read the file:

import struct

file = 'name_of_my_file'
with open(file, mode='rb') as lines:
    c = lines.read()  # read the whole file into one bytes object

I try to read the first two variables:

dummy, n1, n2, dummy = struct.unpack('iiii',c[:16])

But as you can see, I had to add two dummy variables because, somehow, the Fortran program writes the integer 8 in those positions.

The problem now is reading the remaining bytes: I don't get the same result as the IDL program.

Here is my attempt to read the array n:

double = 8
end = 16+n1*double
nH = struct.unpack('d'*n1,c[16:end])

However, when I print this array I get nonsense values. I can read the file with the IDL code above, so I know what to expect. So my question is: how can I read this file when I don't know its exact structure? Why is it so simple to read with IDL? I need to read this data set with Python.

Answer

Nate · Nov 11, 2011

What you're looking for is the struct module.

This module lets you unpack binary data from byte strings.

You supply a format string and the bytes read from your file, and it returns the decoded values as Python objects.

For example, using your variables:

import struct

with open('name_of_my_file', 'rb') as f:
    content = f.read()  # if this is too much data, you can supply a size to read()

# Unpack the first four doubles (4 * 8 = 32 bytes)
n, T, Teq, cool = struct.unpack("dddd", content[:32])

This will make n, T, Teq, and cool hold the first four doubles in your binary file. Of course, this is just a demonstration. Your example looks like it wants lists of doubles. Conveniently, struct.unpack returns a tuple, which should work fine in your case (if not, you can convert it with list()). Keep in mind that struct.unpack requires the buffer to be exactly the size the format string describes, otherwise you'll get a struct.error. So either slice your input, or read only the number of bytes you'll use, as mentioned in the comment above.
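
To illustrate the size requirement, here is a minimal sketch that uses bytes built in memory rather than a real file. The packed values are made up for the demonstration; struct.calcsize and struct.unpack_from are standard struct functions.

```python
import struct

# Three doubles packed into 24 bytes, standing in for data read from a file.
content = struct.pack("ddd", 1.0, 2.0, 3.0)

# calcsize reports how many bytes a format string expects.
fmt = "dd"
size = struct.calcsize(fmt)

# Passing a buffer whose length differs from that size raises struct.error.
try:
    struct.unpack(fmt, content)
    mismatch_detected = False
except struct.error:
    mismatch_detected = True

# Slicing the buffer to the exact size works.
first_two = struct.unpack(fmt, content[:size])

# unpack_from starts at a byte offset and ignores any trailing bytes.
last_two = struct.unpack_from(fmt, content, offset=8)
```

Using unpack_from with an explicit offset avoids copying slices, which can matter when the whole file has been read into memory.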

For example,

n_content = f.read(8 * number_of_ns)  # 8, because doubles are 8 bytes
n = struct.unpack("d" * number_of_ns, n_content)
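
One thing the snippets above do not account for is the extra integer 8 seen in the question. Fortran sequential unformatted files usually wrap every write statement in a pair of record-length markers, which is most likely what those 8s are: two 4-byte integers make an 8-byte record. The sketch below assumes 4-byte markers in native byte order, which is common but depends on the compiler. The helper names read_record, read_doubles and write_record are hypothetical, and the demo data is synthetic.

```python
import io
import struct

def read_record(f, marker_format="i"):
    """Read one Fortran unformatted record and return its payload bytes.

    Assumes the record is framed by identical leading and trailing
    length markers (4-byte native integers by default).
    """
    marker_size = struct.calcsize(marker_format)
    head = f.read(marker_size)
    if len(head) != marker_size:
        raise EOFError("no more records")
    (length,) = struct.unpack(marker_format, head)
    payload = f.read(length)
    (tail,) = struct.unpack(marker_format, f.read(marker_size))
    if tail != length:
        raise ValueError("record markers disagree; wrong marker size or byte order?")
    return payload

def read_doubles(f):
    """Read one record and decode it as a tuple of 8-byte doubles."""
    payload = read_record(f)
    count = len(payload) // 8
    return struct.unpack("%dd" % count, payload)

def write_record(f, payload):
    """Write a payload framed the same way, only to build demo data."""
    marker = struct.pack("i", len(payload))
    f.write(marker + payload + marker)

# Synthetic stand-in for the real file: one record of two integers,
# then one record of three doubles.
demo = io.BytesIO()
write_record(demo, struct.pack("ii", 161, 101))
write_record(demo, struct.pack("3d", 0.5, 1.5, 2.5))
demo.seek(0)

n1, n2 = struct.unpack("ii", read_record(demo))
nH = read_doubles(demo)
```

With the real file you would open it in 'rb' mode and call read_doubles once per write statement, in the same order as the Fortran code. Two-dimensional arrays come back flattened in Fortran (column-major) order. If SciPy is available, scipy.io.FortranFile is worth a look as an alternative for files of this kind.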