TensorFlow - Read all examples from a TFRecords at once?

golmschenk picture golmschenk · May 11, 2016 · Viewed 38.8k times · Source

How do you read all examples from a TFRecords at once?

I've been using tf.parse_single_example to read out individual examples using code similar to that given in the method read_and_decode in the example of the fully_connected_reader. However, I want to run the network against my entire validation dataset at once, and so would like to load them in their entirety instead.

I'm not entirely sure, but the documentation seems to suggest I can use tf.parse_example instead of tf.parse_single_example to load the entire TFRecords file at once. I can't seem to get this to work though. I'm guessing it has to do with how I specify the features, but I'm not sure how in the feature specification to state that there are multiple examples.

In other words, my attempt of using something similar to:

reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
features = tf.parse_example(serialized_example, features={
    'image_raw': tf.FixedLenFeature([], tf.string),
    'label': tf.FixedLenFeature([], tf.int64),
})

isn't working, and I assume it's because the features aren't expecting multiple examples at once (but again, I'm not sure). [This results in an error of ValueError: Shape () must have rank 1]

Is this the proper way to read all the records at once? And if so, what do I need to change to actually read the records? Thank you much!

Answer

Andrew Pierno picture Andrew Pierno · May 15, 2016

Just for clarity, I have a few thousand images in a single .tfrecords file, they're 720 by 720 rgb png files. The labels are one of 0,1,2,3.

I also tried using the parse_example and couldn't make it work but this solution works with the parse_single_example.

The downside is that right now I have to know how many items are in each .tf record, which is kind of a bummer. If I find a better way, I'll update the answer. Also, be careful going out of bounds of the number of records in the .tfrecords file, it will start over at the first record if you loop past the last record

The trick was to have the queue runner use a coordinator.

I left some code in here to save the images as they're being read in so that you can verify the image is correct.

from PIL import Image
import numpy as np
import tensorflow as tf

def read_and_decode(filename_queue):
 reader = tf.TFRecordReader()
 _, serialized_example = reader.read(filename_queue)
 features = tf.parse_single_example(
  serialized_example,
  # Defaults are not specified since both keys are required.
  features={
      'image_raw': tf.FixedLenFeature([], tf.string),
      'label': tf.FixedLenFeature([], tf.int64),
      'height': tf.FixedLenFeature([], tf.int64),
      'width': tf.FixedLenFeature([], tf.int64),
      'depth': tf.FixedLenFeature([], tf.int64)
  })
 image = tf.decode_raw(features['image_raw'], tf.uint8)
 label = tf.cast(features['label'], tf.int32)
 height = tf.cast(features['height'], tf.int32)
 width = tf.cast(features['width'], tf.int32)
 depth = tf.cast(features['depth'], tf.int32)
 return image, label, height, width, depth


def get_all_records(FILE):
 with tf.Session() as sess:
   filename_queue = tf.train.string_input_producer([ FILE ])
   image, label, height, width, depth = read_and_decode(filename_queue)
   image = tf.reshape(image, tf.pack([height, width, 3]))
   image.set_shape([720,720,3])
   init_op = tf.initialize_all_variables()
   sess.run(init_op)
   coord = tf.train.Coordinator()
   threads = tf.train.start_queue_runners(coord=coord)
   for i in range(2053):
     example, l = sess.run([image, label])
     img = Image.fromarray(example, 'RGB')
     img.save( "output/" + str(i) + '-train.png')

     print (example,l)
   coord.request_stop()
   coord.join(threads)

get_all_records('/path/to/train-0.tfrecords')