How to read a v7.3 mat file via h5py?

Eastsun picture Eastsun · Oct 11, 2013 · Viewed 17.4k times · Source

I have a struct array created by matlab and stored in v7.3 format mat file:

struArray = struct('name', {'one', 'two', 'three'}, 
                   'id', {1,2,3}, 
                   'data', {[1:10], [3:9], [0]})
save('test.mat', 'struArray', '-v7.3')

Now I want to read this file via python using h5py:

data = h5py.File('test.mat')
struArray = data['/struArray']

I have no idea how to get the struct data one by one from struArray:

for index in range(<the size of struArray>):
    elem = <the index th struct in struArray>
    name = <the name of elem>
    id = <the id of elem>
    data = <the data of elem>

Answer

nbedou picture nbedou · Aug 9, 2017

Matlab 7.3 file format is not extremely easy to work with h5py. It relies on HDF5 reference, cf. h5py documentation on references.

>>> import h5py
>>> f = h5py.File('test.mat')
>>> list(f.keys())
['#refs#', 'struArray']
>>> struArray = f['struArray']
>>> struArray['name'][0, 0]  # this is the HDF5 reference
<HDF5 object reference>
>>> f[struArray['name'][0, 0]].value  # this is the actual data
array([[111],
       [110],
       [101]], dtype=uint16)

To read struArray(i).id:

>>> f[struArray['id'][0, 0]][0, 0]
1.0
>>> f[struArray['id'][1, 0]][0, 0]
2.0
>>> f[struArray['id'][2, 0]][0, 0]
3.0

Notice that Matlab stores a number as an array of size (1, 1), hence the final [0, 0] to get the number.

To read struArray(i).data:

>>> f[struArray['data'][0, 0]].value
array([[  1.],
       [  2.],
       [  3.],
       [  4.],
       [  5.],
       [  6.],
       [  7.],
       [  8.],
       [  9.],
       [ 10.]])

To read struArray(i).name, it is necessary to convert the array of integers to string:

>>> f[struArray['name'][0, 0]].value.tobytes()[::2].decode()
'one'
>>> f[struArray['name'][1, 0]].value.tobytes()[::2].decode()
'two'
>>> f[struArray['name'][2, 0]].value.tobytes()[::2].decode()
'three'