Accessing Data of a Theano Shared Variable

Question 1

python-2.7 machine-learning computer-vision theano mnist

moeabdol · Oct 21, 2013 · Viewed 7.1k times · Source

Answer

First, train_set_x, train_set_y (before the cast), and train_set are all separate copies of the same training set. So I suppose you simplified your example too much: you say that train_set_x is the input and train_set_y is the corresponding label, but that doesn't make sense with the code as posted.

The answer to your question depends on the contents of mnist.pkl.gz. Where did you get it? From the Deep Learning Tutorial (DLT)? For my answer, I'll suppose train_set is a 2D numpy ndarray, which means you are using a different mnist.pkl.gz file than the one from the DLT.
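For comparison, the DLT file is, as far as I know, a pickled tuple of three (inputs, labels) pairs for the training, validation, and test splits, so train_set there would be a pair rather than a single 2D array. The sketch below builds a tiny synthetic stand-in with that layout and unpacks it. The helper name make_fake_dlt_pickle, the split sizes, and the random data are made up for illustration, and it uses pickle rather than cPickle so it also runs on Python 3.

```python
import gzip
import io
import pickle

import numpy as np


def make_fake_dlt_pickle(n_train=5, n_valid=2, n_test=2, seed=0):
    """Return an in-memory gzipped pickle laid out like the assumed DLT file.

    The data is random; only the layout (three (inputs, labels) pairs)
    is meant to mirror the tutorial file.
    """
    rng = np.random.RandomState(seed)

    def split(n):
        x = rng.rand(n, 784).astype('float32')          # n flattened 28x28 images
        y = rng.randint(0, 10, size=n).astype('int64')  # n digit labels
        return (x, y)

    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb') as f:
        pickle.dump((split(n_train), split(n_valid), split(n_test)), f, protocol=2)
    buf.seek(0)
    return buf


# Load it the same way a real mnist.pkl.gz would be loaded.
with gzip.GzipFile(fileobj=make_fake_dlt_pickle(), mode='rb') as f:
    train_set, valid_set, test_set = pickle.load(f)

# Each split is an (inputs, labels) pair, not one array.
train_x, train_y = train_set
print(train_x.shape)
print(train_y.shape)
```

If your file has this layout, pass train_x and train_y (not train_set itself) to theano.shared.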

With that supposition, you can call train_set_x.get_value(), which returns a copy of the ndarray held in the shared variable. If you don't want a copy, you can call train_set_x.get_value(borrow=True) instead. If the shared variable is on the GPU, this still copies the data from the GPU to the CPU, but it won't copy the data if it is already on the CPU.

train_set_y is a Theano graph, not a Theano shared variable, so you can't call get_value() on it. You need to compile and run the graph that gives train_set_y. If you want to evaluate it only once, you can call train_set_y.eval() as a shortcut to compile and run it, since it does not take any input except a shared variable.

So you can do this:

for x, y in zip(train_set_x.get_value(), train_set_y.eval()):
    print x, y
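To eventually plot the images, each 784-element row has to be reshaped back into a 28 x 28 array. The sketch below does that with plain NumPy on a synthetic image, so it runs without Theano. The helper names row_to_image and ascii_preview are made up for illustration, and the images and labels arrays stand in for train_set_x.get_value() and train_set_y.eval().

```python
import numpy as np


def row_to_image(row, side=28):
    """Reshape a flat vector of side*side pixels into a (side, side) array."""
    row = np.asarray(row)
    if row.size != side * side:
        raise ValueError("expected %d pixels, got %d" % (side * side, row.size))
    return row.reshape(side, side)


def ascii_preview(image, threshold=0.5):
    """Render pixels above the threshold as '#' and the rest as '.'."""
    return "\n".join(
        "".join("#" if p > threshold else "." for p in r) for r in image
    )


# Synthetic stand-in data: one image containing a vertical bar, labelled 1.
picture = np.zeros((28, 28), dtype='float32')
picture[:, 14] = 1.0
images = picture.reshape(1, 784)
labels = np.array([1], dtype='int32')

for x, y in zip(images, labels):
    img = row_to_image(x)
    print("label: %d" % y)
    print(ascii_preview(img))
```

With matplotlib installed, the 28 x 28 array returned by row_to_image can be passed to matplotlib.pyplot.imshow(img, cmap='gray') instead of the text preview.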

Question 2

I've successfully loaded the MNIST dataset into Theano shared variables as follows:

import cPickle
import gzip

import numpy
import theano
import theano.tensor

# Read MNIST dataset from gzipped file as binary
f = gzip.open('mnist.pkl.gz', 'rb')
# Store dataset into variable
train_set = cPickle.load(f)
# Close zipped file
f.close()
# Store data in Theano shared variable
train_set_x = theano.shared(numpy.asarray(train_set, dtype=theano.config.floatX)) # Data
train_set_y = theano.shared(numpy.asarray(train_set, dtype=theano.config.floatX)) # Labels
# Cast labels into int
train_set_y = theano.tensor.cast(train_set_y, 'int32')

My question is: how do I access the data in both train_set_x and train_set_y? Each image in the dataset is 28 * 28 pixels, that is, a vector of length 784 whose elements are floats between 0.0 and 1.0 inclusive. The labels are cast to int because each one is the label associated with an image vector and is a value between 0 and 9. I want to be able to loop over the images in the train_set_x matrix and the labels in train_set_y to view the data of each image and its label separately, and eventually plot the images on screen.
