Siamese Neural Network in TensorFlow

BiBi · Apr 25, 2016 · Viewed 9k times · Source

I'm trying to implement a Siamese Neural Network in TensorFlow (see Yann LeCun's paper), but I can't find any working example on the Internet.


The architecture I'm trying to build would consist of two LSTMs sharing weights and only connected at the end of the network.

My question is: how to build two different neural networks sharing their weights (tied weights) in TensorFlow and how to connect them at the end?

Thanks :)

Edit: I implemented a simple working example of a siamese network on MNIST here.

Answer

Olivier Moindrot · Apr 25, 2016

Update with tf.layers

If you use the tf.layers module to build your network, you can simply use the argument reuse=True for the second part of the Siamese network:

import tensorflow as tf

x = tf.ones((1, 3))
y1 = tf.layers.dense(x, 4, name='h1')
y2 = tf.layers.dense(x, 4, name='h1', reuse=True)  # reuses the 'h1' weights

# y1 and y2 will evaluate to the same values
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(y1))
print(sess.run(y2))  # both prints will return the same values

Old answer with tf.get_variable

You can try using the function tf.get_variable(). (See the tutorial)

Implement the first network using a variable scope with reuse=False:

with tf.variable_scope('Inference', reuse=False):
    weights_1 = tf.get_variable('weights', shape=[1, 1],
                              initializer=...)
    output_1 = weights_1 * input_1

Then implement the second network with the same code, this time using reuse=True:

with tf.variable_scope('Inference', reuse=True):
    weights_2 = tf.get_variable('weights')
    output_2 = weights_2 * input_2

The first implementation will create and initialize every variable of the LSTM, whereas the second implementation will use tf.get_variable() to get the same variables used in the first network. That way, variables will be shared.

Then you just have to use whatever loss you want (e.g. you can use the L2 distance between the two siamese networks), and the gradients will backpropagate through both networks, updating the shared variables with the sum of the gradients.