How to store a dictionary in an HDF5 dataset

theta · May 11, 2013 · Viewed 40.2k times

I have a dictionary where each key is a datetime object and each value is a tuple of integers:

>>> list(d.items())[0]
(datetime.datetime(2012, 4, 5, 23, 30), (14, 1014, 6, 3, 0))

I want to store it in an HDF5 dataset, but if I try to dump the dictionary directly, h5py raises an error:

TypeError: Object dtype dtype('object') has no native HDF5 equivalent

What would be "the best" way to transform this dictionary so that I can store it in an HDF5 dataset?

Specifically, I don't want to just dump the dictionary into a numpy array, as that would complicate retrieving data by datetime query.

Answer

theta · May 11, 2013

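I found two ways to do this:

I) Convert each datetime object to a string and use it as the dataset name

import h5py
import numpy as np
h = h5py.File('myfile.hdf5', 'a')
for k, v in d.items():
    # int32 rather than int8: values such as 1014 do not fit in 8 bits
    h.create_dataset(k.strftime('%Y-%m-%dT%H:%M:%SZ'), data=np.array(v, dtype=np.int32))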

The data can then be accessed by querying the key strings (the dataset names). For example:

for ds in h.keys():
    if '2012-04' in ds:
        print(h[ds][()])  # .value was removed in h5py 3.0
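
To show option I end to end, here is a minimal sketch of a round trip: write the dictionary, then rebuild it by parsing the dataset names back into datetime objects, so a query can compare real dates instead of substrings. The sample dictionary, the temporary file location and the `FMT`, `restored` and `april` names are assumptions made for illustration only.

```python
import datetime
import os
import tempfile

import h5py
import numpy as np

# Hypothetical sample data in the same shape as the dictionary in the question.
d = {
    datetime.datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0),
    datetime.datetime(2012, 5, 1, 0, 0): (15, 1009, 4, 2, 1),
}

FMT = '%Y-%m-%dT%H:%M:%SZ'  # same format string as in option I
path = os.path.join(tempfile.mkdtemp(), 'myfile.hdf5')  # throwaway location

# Write one dataset per dictionary entry, named after the formatted key.
with h5py.File(path, 'w') as h:
    for k, v in d.items():
        h.create_dataset(k.strftime(FMT), data=np.array(v, dtype=np.int32))

# Read everything back, turning names into datetimes and arrays into tuples.
with h5py.File(path, 'r') as h:
    restored = {
        datetime.datetime.strptime(name, FMT): tuple(int(x) for x in h[name][()])
        for name in h.keys()
    }

# Range query on real datetime objects: everything in April 2012.
start = datetime.datetime(2012, 4, 1)
end = datetime.datetime(2012, 5, 1)
april = {k: v for k, v in restored.items() if start <= k < end}
print(april)
```

This loads every dataset before filtering, which is fine for a small file; for a large one it would probably be better to parse and filter the names first and read only the matching datasets.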

II) Convert each datetime object to a path of nested subgroups

h = h5py.File('myfile.hdf5', 'a')
for k, v in d.items():
    # int32 rather than int8: values such as 1014 do not fit in 8 bits
    h.create_dataset(k.strftime('%Y/%m/%d/%H:%M'), data=np.array(v, dtype=np.int32))

Note the forward slashes in the strftime format string, which create the corresponding subgroups in the HDF5 file. Data can be accessed directly, e.g. h['2012']['04']['05']['23:30'][()], by iterating with the iterators h5py provides, or by applying custom functions through visititems().
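
As a rough sketch of the visititems() route for option II, the snippet below collects every dataset stored under one month. The sample dictionary, the temporary file location and the `collect` callback and `found` dictionary are hypothetical names introduced here for illustration.

```python
import datetime
import os
import tempfile

import h5py
import numpy as np

# Hypothetical sample data in the same shape as the dictionary in the question.
d = {
    datetime.datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0),
    datetime.datetime(2012, 5, 1, 0, 0): (15, 1009, 4, 2, 1),
}

path = os.path.join(tempfile.mkdtemp(), 'myfile.hdf5')  # throwaway location

# Slashes in the dataset name make h5py create the intermediate groups.
with h5py.File(path, 'w') as h:
    for k, v in d.items():
        h.create_dataset(k.strftime('%Y/%m/%d/%H:%M'),
                         data=np.array(v, dtype=np.int32))

found = {}

def collect(name, obj):
    # Called for every group and dataset below the starting group;
    # `name` is the path relative to that starting group.
    if isinstance(obj, h5py.Dataset):
        found[name] = tuple(int(x) for x in obj[()])

with h5py.File(path, 'r') as h:
    # Direct access with a full path, equivalent to chained indexing.
    direct = [int(x) for x in h['2012/04/05/23:30'][()]]
    # Walk only the April 2012 subtree.
    h['2012/04'].visititems(collect)

print(direct)
print(found)
```

Because the callback returns None, the walk continues through the whole subtree; returning any other value from it would stop the traversal early.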

For simplicity I chose the first option.