How to store NumPy arrays as TFRecord?

csbk · Dec 18, 2017 · Viewed 11k times

I am trying to create a dataset in TFRecord format from NumPy arrays that store 2D and 3D coordinates.

The 2D coordinates are a NumPy array of shape (2, 10) and type float64. The 3D coordinates are a NumPy array of shape (3, 10) and type float64.

This is my code:

import tensorflow as tf


def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))


train_filename = 'train.tfrecords'  # address to save the TFRecords file
writer = tf.python_io.TFRecordWriter(train_filename)


for c in range(1000):
    # Get the 2D and 3D coordinates and store them in c2d and c3d.

    feature = {'train/coord2d': _floats_feature(c2d),
               'train/coord3d': _floats_feature(c3d)}
    sample = tf.train.Example(features=tf.train.Features(feature=feature))
    writer.write(sample.SerializeToString())

writer.close()

When I run this, I get the following error:

    feature = {'train/coord2d': _floats_feature(c2d),
  File "genData.py", line 19, in _floats_feature
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\google\protobuf\internal\python_message.py", line 510, in init
    copy.extend(field_value)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\google\protobuf\internal\containers.py", line 275, in extend
    new_values = [self._type_checker.CheckValue(elem) for elem in elem_seq_iter]
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\google\protobuf\internal\containers.py", line 275, in <listcomp>
    new_values = [self._type_checker.CheckValue(elem) for elem in elem_seq_iter]
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\google\protobuf\internal\type_checkers.py", line 109, in CheckValue
    raise TypeError(message)
TypeError: array([-163.685,  240.818, -114.05 , -518.554,  107.968,  427.184,
    157.418, -161.798,   87.102,  406.318]) has type <class 'numpy.ndarray'>, but expected one of: ((<class 'numbers.Real'>,),)

I don't know how to fix this. Should I store the features as int64 or bytes? I have no clue how to go about this, since I am completely new to TensorFlow. Any help would be great! Thanks.

Answer

mrry · Dec 18, 2017

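The tf.train.Feature class only supports lists (or 1-D arrays) when using the float_list argument. Iterating over a 2-D array yields its rows, which are themselves ndarrays rather than numbers, and that is why the type check raises the TypeError. Depending on your data, you might try one of the following approaches: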

  1. Flatten the data in your array before passing it to tf.train.Feature:

    def _floats_feature(value):
      return tf.train.Feature(float_list=tf.train.FloatList(value=value.reshape(-1)))
    

    Note that you might need to add another feature to indicate how this data should be reshaped when you parse it again (and you could use an int64_list feature for that purpose).

  2. Split the multidimensional feature into multiple 1-D features. For example, if c2d contains an N * 2 array of x- and y-coordinates, you could split that feature into separate train/coord2d/x and train/coord2d/y features, each containing the x- and y-coordinate data, respectively.
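Both approaches can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the c2d values are placeholder data with the shape from the question, and the 'train/coord2d/shape' key and the _int64_feature helper are names assumed here for illustration. To stay independent of the TensorFlow version, the round trip reads the record back with the protobuf method Example.FromString instead of a TensorFlow parsing op. Note that a FloatList stores 32-bit floats, so float64 inputs lose precision.

```python
import numpy as np
import tensorflow as tf


def _floats_feature(value):
    # Flatten to 1-D so that FloatList receives scalars, not row arrays.
    return tf.train.Feature(
        float_list=tf.train.FloatList(value=np.asarray(value).reshape(-1)))


def _int64_feature(value):
    # Hypothetical helper used to store the original array shape.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))


# Placeholder data with the shape from the question (values are made up).
c2d = np.arange(20, dtype=np.float64).reshape(2, 10)

# Approach 1: flattened values plus a separate shape feature.
# The '/shape' key name is an assumption, not a TensorFlow convention.
flat_example = tf.train.Example(features=tf.train.Features(feature={
    'train/coord2d': _floats_feature(c2d),
    'train/coord2d/shape': _int64_feature(c2d.shape),
}))

# Approach 2: one 1-D feature per coordinate axis. With shape (2, 10),
# row 0 is assumed to hold x and row 1 to hold y.
split_example = tf.train.Example(features=tf.train.Features(feature={
    'train/coord2d/x': _floats_feature(c2d[0]),
    'train/coord2d/y': _floats_feature(c2d[1]),
}))

# Round trip for approach 1: serialize, parse, and restore the shape.
serialized = flat_example.SerializeToString()
parsed = tf.train.Example.FromString(serialized)
features = parsed.features.feature
shape = list(features['train/coord2d/shape'].int64_list.value)
restored = np.array(features['train/coord2d'].float_list.value).reshape(shape)
```

The serialized string is what would be passed to writer.write(...) in the loop from the question. Storing the shape alongside the data costs a few extra bytes per record, but it means the reader does not have to hard-code the array dimensions.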