Is it possible in h5py to create a dataset that consists of lists of strings? I tried to create a nested variable-length datatype, but this results in a segmentation fault in my Python interpreter.
import h5py

def create_dataset(h5py_file):
    data = [['I', 'am', 'a', 'sentence'], ['another', 'sentence']]
    string_dt = h5py.special_dtype(vlen=str)
    # Nested variable-length type (vlen of vlen strings); this attempt segfaults.
    nested_dt = h5py.special_dtype(vlen=string_dt)
    h5py_file.create_dataset("sentences", data=data, dtype=nested_dt)
If you don't intend to edit the HDF5 file later (and potentially store longer strings), you can also simply use fixed-length byte strings:
h5py_file.create_dataset("sentences", data=np.array(data, dtype='S'))
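As far as I know, np.array(data, dtype='S') needs every inner list to have the same length, so for ragged sentences like the ones in the question another option is to avoid the nested type altogether. The sketch below stores all words in one flat variable-length string dataset and keeps the number of words per sentence in a second dataset. The function names write_sentences and read_sentences, and the group layout ("words", "lengths"), are my own assumptions for illustration, not part of h5py:

```python
import h5py
import numpy as np


def write_sentences(h5_file, sentences, name="sentences"):
    """Store ragged lists of strings as a flat word array plus per-sentence lengths.

    Hypothetical helper: the layout (a group with "words" and "lengths") is an
    assumption made for this example.
    """
    group = h5_file.create_group(name)
    words = [word for sentence in sentences for word in sentence]
    lengths = [len(sentence) for sentence in sentences]
    # A single level of variable-length strings, as in the question's string_dt.
    string_dt = h5py.special_dtype(vlen=str)
    group.create_dataset("words", data=np.array(words, dtype=object), dtype=string_dt)
    group.create_dataset("lengths", data=np.array(lengths, dtype=np.int64))
    return group


def read_sentences(h5_file, name="sentences"):
    """Rebuild the list of sentences from the flat layout written above."""
    group = h5_file[name]
    raw_words = group["words"][()]
    lengths = group["lengths"][()]
    sentences = []
    start = 0
    for n in lengths:
        chunk = raw_words[start:start + n]
        # Depending on the h5py version, strings come back as bytes or str.
        sentences.append(
            [w.decode("utf-8") if isinstance(w, bytes) else w for w in chunk]
        )
        start += n
    return sentences


if __name__ == "__main__":
    data = [["I", "am", "a", "sentence"], ["another", "sentence"]]
    # In-memory file so the demo leaves nothing on disk.
    with h5py.File("sentences_demo.h5", "w", driver="core", backing_store=False) as f:
        write_sentences(f, data)
        print(read_sentences(f))
```

The trade-off is that a sentence is no longer a single dataset element: you need the lengths to slice the words back apart, but both datasets use types that h5py handles without nesting.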