How to input a list of lists with different sizes in tf.data.Dataset

Escachator picture Escachator · Nov 30, 2017 · Viewed 11.4k times · Source

I have a long list of lists of integers (representing sentences, each one of different sizes) that I want to feed using the tf.data library. Each list (of the lists of list) has different length, and I get an error, which I can reproduce here:

t = [[4,2], [3,4,5]]
dataset = tf.data.Dataset.from_tensor_slices(t)

The error I get is:

ValueError: Argument must be a dense tensor: [[4, 2], [3, 4, 5]] - got shape [2], but wanted [2, 2].

Is there a way to do this?

EDIT 1: Just to be clear, I don't want to pad the input list of lists (it's a list of sentences containing over a million elements, with varying lengths) I want to use the tf.data library to feed, in a proper way, a list of lists with varying length.

Answer

mrry picture mrry · Dec 8, 2017

You can use tf.data.Dataset.from_generator() to convert any iterable Python object (like a list of lists) into a Dataset:

t = [[4, 2], [3, 4, 5]]

dataset = tf.data.Dataset.from_generator(lambda: t, tf.int32, output_shapes=[None])

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
  print(sess.run(next_element))  # ==> '[4, 2]'
  print(sess.run(next_element))  # ==> '[3, 4, 5]'