Saving arrays of different sizes with h5py

Jose Javier Gonzalez Ortiz · May 13, 2016

I am trying to store about 3000 numpy arrays using the HDF5 data format. The arrays vary in length from 5306 to 121999 np.float64 values.

I am getting the error Object dtype dtype('O') has no native HDF5 equivalent because, due to the irregular shape of the data, numpy falls back to the general object dtype.

My idea was to pad all the arrays to length 121999 and store the original sizes in another dataset.

However, this seems quite wasteful of space. Is there a better way?
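To make the padding idea concrete, here is a minimal sketch using NumPy only. The three short arrays are hypothetical stand-ins for the real data, and NaN is just one possible fill value. The resulting `padded` and `lengths` arrays both have regular dtypes, so they could each be written as an ordinary HDF5 dataset; `wasted` shows how much of the padded block is filler.

```python
import numpy as np

# Hypothetical stand-ins for the real arrays (lengths 3, 5 and 2).
arrays = [np.arange(n, dtype=np.float64) for n in (3, 5, 2)]

# Record the true length of each array so the padding can be undone later.
lengths = np.array([len(arr) for arr in arrays], dtype=np.int64)

# Allocate one rectangular block, pre-filled with NaN as the pad value.
padded = np.full((len(arrays), lengths.max()), np.nan, dtype=np.float64)
for row, arr in zip(padded, arrays):
    row[: len(arr)] = arr  # each row is a view, so this writes into padded

# Fraction of the rectangular block that holds filler rather than data.
wasted = 1.0 - lengths.sum() / padded.size

# Recover the original ragged arrays by trimming each row to its length.
restored = [row[:n] for row, n in zip(padded, lengths)]
```

With lengths spread between 5306 and 121999, the same calculation on the real data would show how much of the file is padding.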

EDIT: To clarify, I want to store 3126 arrays of dtype = np.float64. I have them stored in a list, and when h5py converts the list it produces an array of dtype = object because the arrays have different lengths. To illustrate:

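import numpy as np

a = np.array([0.1, 0.2, 0.3], dtype=np.float64)
b = np.array([0.1, 0.2, 0.3, 0.4, 0.5], dtype=np.float64)
c = np.array([0.1, 0.2], dtype=np.float64)

# h5py performs an equivalent conversion internally. Recent NumPy versions
# require an explicit dtype=object for ragged input like this.
arrs = np.array([a, b, c], dtype=object)
print(arrs.dtype)     # object
print(arrs[0].dtype)  # float64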

Answer

hpaulj · May 13, 2016

Looks like you tried something like:

In [364]: f=h5py.File('test.hdf5','w')    
In [365]: grp=f.create_group('alist')

In [366]: grp.create_dataset('alist',data=[a,b,c])
...
TypeError: Object dtype dtype('O') has no native HDF5 equivalent

But if instead you save the arrays as separate datasets it works:

In [367]: adict=dict(a=a,b=b,c=c)

In [368]: for k,v in adict.items():
   .....:     grp.create_dataset(k,data=v)
   .....:

In [369]: grp
Out[369]: <HDF5 group "/alist" (3 members)>

In [370]: grp['a'][:]
Out[370]: array([ 0.1,  0.2,  0.3])

and to access all the datasets in the group:

In [389]: [i[:] for i in grp.values()]
Out[389]: 
[array([ 0.1,  0.2,  0.3]),
 array([ 0.1,  0.2,  0.3,  0.4,  0.5]),
 array([ 0.1,  0.2])]
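The same approach can be sketched for a plain list of arrays rather than a dict. This is an illustration under assumptions, not part of the original answer: the file location, the sample data, and the zero-padded dataset names (`00000`, `00001`, ...) are all hypothetical choices. The arrays are read back by name so that the result does not depend on the order in which the group iterates over its members.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical ragged data standing in for the 3126 float64 arrays.
arrays = [np.arange(n, dtype=np.float64) for n in (3, 5, 2)]

# Temporary location chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "ragged.hdf5")

# Write one dataset per array; the zero-padded index serves as its name.
with h5py.File(path, "w") as f:
    grp = f.create_group("alist")
    for i, arr in enumerate(arrays):
        grp.create_dataset(f"{i:05d}", data=arr)

# Read the arrays back by name, in their original list order.
with h5py.File(path, "r") as f:
    grp = f["alist"]
    restored = [grp[f"{i:05d}"][:] for i in range(len(grp))]
```

Each dataset keeps its own length and its float64 dtype, so no padding or separate size bookkeeping is needed.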