Storing a list of strings to a HDF5 Dataset from Python

gman · Apr 22, 2014 · Viewed 23.6k times

I am trying to store a variable-length list of strings in an HDF5 dataset. The code for this is:

import h5py
h5File = h5py.File('xxx.h5', 'w')
strList = ['asas', 'asas', 'asas']
# This is the call that raises the TypeError
h5File.create_dataset('xxx', (len(strList), 1), 'S10', strList)
h5File.flush()
h5File.close()

I am getting the error "TypeError: No conversion path for dtype: dtype('<U3')".
How can I solve this problem?

Answer

SlightlyCuban · Apr 22, 2014

You're passing in Unicode strings (the default for string literals in Python 3), but specifying the datatype as fixed-length ASCII ('S10'). According to the h5py wiki, h5py does not currently support this conversion.

You'll need to encode the strings in a format h5py handles:

# Encode each str to bytes so the data matches the 'S10' byte-string dtype
asciiList = [n.encode("ascii", "ignore") for n in strList]
h5File.create_dataset('xxx', (len(asciiList), 1), 'S10', asciiList)

Note: not everything encoded in UTF-8 can be encoded in ASCII!
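
To make that caveat concrete, here is a small sketch using only the standard library. The sample string "café" is an illustrative assumption and not part of the original question. It shows that the "ignore" handler silently drops non-ASCII characters, that the default strict handler raises instead, and that UTF-8 keeps every character but may use more than one byte per character.

```python
# Illustrative input (not from the original post): one plain-ASCII string
# and one containing a non-ASCII character.
str_list = ["asas", "caf\u00e9"]  # the second entry is "café"

# "ignore" silently drops characters that have no ASCII representation.
ignored = [s.encode("ascii", "ignore") for s in str_list]
print(ignored)  # [b'asas', b'caf']

# The default "strict" handler raises instead of losing data.
try:
    str_list[1].encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)  # True

# UTF-8 keeps every character, but "é" takes two bytes.
utf8 = str_list[1].encode("utf-8")
print(utf8, len(utf8))  # b'caf\xc3\xa9' 5
```

If losing characters is not acceptable, encoding to UTF-8 is one option. Keep in mind that a fixed-length type such as 'S10' counts bytes, not characters.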