How to create a 2-layer neural network using TensorFlow and Python on MNIST data

Tai Christian · Jul 1, 2016 · Viewed 9k times

I'm a newbie in machine learning, and I am following TensorFlow's tutorial to create some simple neural networks that learn the MNIST data.

I have built a single-layer network (following the tutorial), and the accuracy was about 0.92, which is OK for me. But then I added one more layer, and the accuracy dropped to 0.113, which is very bad.

Below is how the two layers are connected:

import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784])  # flattened 28x28 input images

#layer 1
W1 = tf.Variable(tf.zeros([784, 100]))
b1 = tf.Variable(tf.zeros([100]))
y1 = tf.nn.softmax(tf.matmul(x, W1) + b1)

#layer 2
W2 = tf.Variable(tf.zeros([100, 10]))
b2 = tf.Variable(tf.zeros([10]))
y2 = tf.nn.softmax(tf.matmul(y1, W2) + b2)

#output
y = y2
y_ = tf.placeholder(tf.float32, [None, 10])  # one-hot target labels

Is my structure fine? What is the reason it performs so badly? How should I modify my network?

Answer

nessuno · Jul 1, 2016

The input of the second layer is the softmax of the output of the first layer. You don't want to do that.

You're forcing the sum of these values to be 1. If some value of tf.matmul(x, W1) + b1 is about 0 (and some certainly are), the softmax operation squashes that value toward 0. Result: you're killing the gradient, and nothing can flow through those neurons.
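The squashing effect can be checked numerically. Here is a minimal NumPy sketch (the logit values are hypothetical, chosen to mimic one dominant pre-activation next to several near-zero ones):

```python
import numpy as np

def softmax(z):
    # numerically stable softmax for a 1-D array
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical pre-activations: one large logit, the rest near 0.
logits = np.array([10.0, 0.0, 0.0, 0.0])
p = softmax(logits)
# The large logit grabs almost all the probability mass,
# and the near-zero logits are squashed to almost-zero outputs.
print(p)

# The diagonal of the softmax Jacobian is p_i * (1 - p_i);
# for the squashed units it is tiny, so almost no gradient
# flows back through them.
grad_diag = p * (1 - p)
print(grad_diag)
```

Feeding such saturated values into the next layer is what starves the first layer of gradient.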

If you remove the softmax between the layers (but keep the softmax on the output layer if you want to interpret the values as probabilities), your network will work fine.

Tl;dr:

import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784])

#layer 1
W1 = tf.Variable(tf.zeros([784, 100]))
b1 = tf.Variable(tf.zeros([100]))
y1 = tf.matmul(x, W1) + b1  # softmax removed

#layer 2
W2 = tf.Variable(tf.zeros([100, 10]))
b2 = tf.Variable(tf.zeros([10]))
y2 = tf.nn.softmax(tf.matmul(y1, W2) + b2)

#output
y = y2
y_ = tf.placeholder(tf.float32, [None, 10])
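To see the corrected architecture actually learn, here is a self-contained NumPy sketch of the same forward pass and its gradient-descent training on hypothetical toy data (a small 3-class stand-in for MNIST). Note it uses a small random weight initialization, which is an assumption of this sketch rather than the zeros above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy stand-in for MNIST: 3 Gaussian clusters in 4-D.
n, d, h, k = 300, 4, 10, 3
centers = 3.0 * rng.normal(size=(k, d))
labels = rng.integers(0, k, size=n)
X = centers[labels] + rng.normal(size=(n, d))
Y = np.eye(k)[labels]                       # one-hot targets

def softmax(z):
    # row-wise, numerically stable softmax
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Small random init (an assumption of this sketch).
W1 = 0.1 * rng.normal(size=(d, h)); b1 = np.zeros(h)
W2 = 0.1 * rng.normal(size=(h, k)); b2 = np.zeros(k)

lr = 0.1
for _ in range(500):
    y1 = X @ W1 + b1                        # layer 1: no softmax here
    p = softmax(y1 @ W2 + b2)               # softmax only at the output
    g_out = (p - Y) / n                     # dCE/dlogits for softmax + cross-entropy
    gW2, gb2 = y1.T @ g_out, g_out.sum(0)
    g_hid = g_out @ W2.T                    # backprop through the linear layer
    gW1, gb1 = X.T @ g_hid, g_hid.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

accuracy = (p.argmax(axis=1) == labels).mean()
print(accuracy)
```

With the inter-layer softmax re-inserted before `y1 @ W2`, the hidden activations saturate and training stalls, mirroring the 0.113 accuracy the question reports.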