I have a dict of "metadata" arrays for my dataset, of the form
{'m1': array_1, 'm2': array_2, ...}.
Each of the arrays has shape (N, ...), where N is the number of samples.
The question:
Is it possible to create a tf.data.Dataset that outputs a dictionary {'m1': sub_array_1, 'm2': sub_array_2, ...}
on each call of the dataset iterator's get_next()? Here, sub_array_i should contain the ith metadata for one batch, so it should have shape (batch_sz, ...).
What I tried so far is using tf.data.Dataset.from_generator(), like this:
N = 100
# dictionary of arrays:
metadata = {'m1': np.zeros(shape=(N,2)), 'm2': np.ones(shape=(N,3,5))}
num_samples = N
def meta_dict_gen():
    for i in range(num_samples):
        ls = {}
        for key, val in metadata.items():
            ls[key] = val[i]
        yield ls
dataset = tf.data.Dataset.from_generator(meta_dict_gen, output_types=(dict))
The problem seems to be output_types=(dict): the code above throws
TypeError: Expected DataType for argument 'Tout' not <class 'dict'>.
I'm using tensorflow 1.8 and python 3.6.
It is actually possible to do what you intend; you just have to be specific about the contents of the dict:
import tensorflow as tf
import numpy as np
N = 100
# dictionary of arrays:
metadata = {'m1': np.zeros(shape=(N,2)), 'm2': np.ones(shape=(N,3,5))}
num_samples = N
def meta_dict_gen():
    for i in range(num_samples):
        ls = {}
        for key, val in metadata.items():
            ls[key] = val[i]
        yield ls

dataset = tf.data.Dataset.from_generator(
    meta_dict_gen,
    output_types={k: tf.float32 for k in metadata},
    output_shapes={'m1': (2,), 'm2': (3, 5)})

# avoid shadowing the built-in iter()
iterator = dataset.make_one_shot_iterator()
next_elem = iterator.get_next()
print(next_elem)
Output:
{'m1': <tf.Tensor 'IteratorGetNext:0' shape=(2,) dtype=float32>,
'm2': <tf.Tensor 'IteratorGetNext:1' shape=(3, 5) dtype=float32>}
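Note that the above yields one sample per get_next() call. To get the (batch_sz, ...) shapes asked for in the question, chain .batch() onto the dataset; batching a dataset of dicts produces a dict of batched tensors. A minimal sketch (written for TF 2.x eager execution for brevity; under TF 1.8 you would keep the make_one_shot_iterator()/get_next()/Session pattern shown above):

```python
import numpy as np
import tensorflow as tf

N = 100
metadata = {'m1': np.zeros(shape=(N, 2)), 'm2': np.ones(shape=(N, 3, 5))}

def meta_dict_gen():
    for i in range(N):
        # yield one dict per sample; keys match output_types/output_shapes
        yield {key: val[i] for key, val in metadata.items()}

dataset = tf.data.Dataset.from_generator(
    meta_dict_gen,
    output_types={k: tf.float32 for k in metadata},
    output_shapes={'m1': (2,), 'm2': (3, 5)})

# .batch() stacks batch_sz consecutive samples along a new leading axis
batched = dataset.batch(4)
batch = next(iter(batched))
print(batch['m1'].shape)  # (4, 2)
print(batch['m2'].shape)  # (4, 3, 5)
```

Since the arrays already fit in memory here, tf.data.Dataset.from_tensor_slices(metadata) would build the same per-sample dict dataset without a generator at all.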