I have a dictionary whose keys are datetime objects and whose values are tuples of integers:
>>> d.items()[0]
(datetime.datetime(2012, 4, 5, 23, 30), (14, 1014, 6, 3, 0))
I want to store it in an HDF5 dataset, but if I try to dump the dictionary directly, h5py raises an error:
TypeError: Object dtype dtype('object') has no native HDF5 equivalent
What would be "the best" way to transform this dictionary so that I can store it in an HDF5 dataset?
Specifically, I don't want to just dump the dictionary into a numpy array, as that would complicate retrieving data by datetime query.
I found two ways to do this:
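I) transform the datetime object to a string and use it as the dataset name
    import h5py
    import numpy as np
    h = h5py.File('myfile.hdf5', 'a')  # explicit mode; newer h5py versions default to read-only
    for k, v in d.items():
        # int32 rather than int8: values such as 1014 do not fit in int8
        h.create_dataset(k.strftime('%Y-%m-%dT%H:%M:%SZ'), data=np.array(v, dtype=np.int32))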
where data can be accessed by querying the key strings (dataset names). For example:
    for ds in h.keys():
        if '2012-04' in ds:
            # [()] reads the whole dataset; the older .value attribute is gone in h5py 3
            print(h[ds][()])
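To make option I easier to try out, here is a minimal self-contained sketch. The sample dictionary, the helper name select_month and the file name sketch.hdf5 are made up for illustration, and the file is opened with the in-memory core driver so nothing is written to disk. It assumes a reasonably recent h5py and uses int32 so that values like 1014 fit.

```python
import datetime

import h5py
import numpy as np

# Hypothetical sample data in the same shape as the dictionary in the question.
d = {
    datetime.datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0),
    datetime.datetime(2012, 4, 6, 0, 0): (13, 1015, 5, 2, 0),
    datetime.datetime(2012, 5, 1, 12, 0): (18, 1009, 3, 1, 0),
}

FMT = '%Y-%m-%dT%H:%M:%SZ'


def select_month(h5file, year, month):
    """Return {datetime: tuple} for every dataset whose name starts with YYYY-MM."""
    prefix = '%04d-%02d' % (year, month)
    return {
        datetime.datetime.strptime(name, FMT): tuple(int(x) for x in h5file[name][()])
        for name in h5file.keys()
        if name.startswith(prefix)
    }


# In-memory file: driver='core' with backing_store=False never touches the disk.
with h5py.File('sketch.hdf5', 'w', driver='core', backing_store=False) as h:
    for k, v in d.items():
        h.create_dataset(k.strftime(FMT), data=np.array(v, dtype=np.int32))
    april = select_month(h, 2012, 4)
```

Matching on the start of the name with startswith is a bit stricter than the substring test, since '2012-04' could in principle also appear elsewhere in a name.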
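II) transform the datetime object to dataset subgroups
    h = h5py.File('myfile.hdf5', 'a')  # explicit mode; newer h5py versions default to read-only
    for k, v in d.items():
        # int32 rather than int8: values such as 1014 do not fit in int8
        h.create_dataset(k.strftime('%Y/%m/%d/%H:%M'), data=np.array(v, dtype=np.int32))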
Notice the forward slashes in the strftime string, which create the appropriate subgroups in the HDF5 file. Data can be accessed directly, like h['2012']['04']['05']['23:30'][()], by iterating with the iterators h5py provides, or even with custom functions through visititems().
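A rough sketch of option II with visititems() might look like the following. The sample data, the callback name collect and the file name sketch2.hdf5 are illustrative assumptions, and the in-memory core driver is used again so no file is created. The callback receives each object's path relative to the group that visititems() was called on.

```python
import datetime

import h5py
import numpy as np

# Hypothetical sample data in the same shape as the dictionary in the question.
d = {
    datetime.datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0),
    datetime.datetime(2012, 4, 6, 0, 0): (13, 1015, 5, 2, 0),
    datetime.datetime(2012, 5, 1, 12, 0): (18, 1009, 3, 1, 0),
}

found = {}


def collect(name, obj):
    # Called for every group and dataset below the starting group.
    # Only datasets carry data; returning None keeps the traversal going.
    if isinstance(obj, h5py.Dataset):
        found[name] = tuple(int(x) for x in obj[()])


with h5py.File('sketch2.hdf5', 'w', driver='core', backing_store=False) as h:
    for k, v in d.items():
        # The slashes make h5py create the intermediate year/month/day groups.
        h.create_dataset(k.strftime('%Y/%m/%d/%H:%M'), data=np.array(v, dtype=np.int32))

    # Everything recorded in April 2012.
    h['2012/04'].visititems(collect)

    # Direct access with a full path; equivalent to h['2012']['04']['05']['23:30'].
    direct = tuple(int(x) for x in h['2012/04/05/23:30'][()])
```

With this layout a month or day query is just a group lookup, at the cost of creating many small groups and datasets.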
For simplicity I chose the first option.