How to update model parameters with accumulated gradients?

weixsong · Feb 10, 2017 · Viewed 7.6k times

I'm using TensorFlow to build a deep learning model, and I'm new to TensorFlow.

For some reason, my model is limited to a small batch size, and this small batch size leads to high variance during training.

So I want to use a trick to simulate a larger batch size. My idea is to store the gradients of each mini-batch, for example over 64 mini-batches, sum them, and then use the mean gradient of these 64 mini-batches of training data to update the model's parameters.

This means that for the first 63 mini-batches the parameters are not updated, and after the 64th mini-batch the model's parameters are updated only once.
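To illustrate the idea with a rough standalone NumPy sketch (made-up data, not TensorFlow code): for a loss that is a mean over examples and equal-sized mini-batches, the mean of the mini-batch gradients is the same as the gradient of one large batch containing all the examples:

import numpy as np

np.random.seed(0)
w = np.array([1.0, -2.0])              # current parameters
X = np.random.randn(64 * 8, 2)         # 64 mini-batches of size 8
y = np.random.randn(64 * 8)

def grad(w, Xb, yb):
    # gradient of the mean squared error 0.5 * mean((Xb @ w - yb) ** 2) w.r.t. w
    return Xb.T @ (Xb @ w - yb) / len(yb)

accumulated = np.zeros_like(w)
for Xb, yb in zip(np.split(X, 64), np.split(y, 64)):
    accumulated += grad(w, Xb, yb)     # store / sum the mini-batch gradients

mean_grad = accumulated / 64           # mean gradient over the 64 mini-batches
print(np.allclose(mean_grad, grad(w, X, y)))  # True: same as one large batch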

But since TensorFlow is graph-based, does anyone know how to implement this?

Thanks very much.

Answer

weixsong · Feb 14, 2017

I found a solution here: https://github.com/tensorflow/tensorflow/issues/3994#event-766328647

import tensorflow as tf

opt = tf.train.AdamOptimizer()
tvs = tf.trainable_variables()
# one non-trainable accumulator variable per trainable variable
accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
# ops that reset the accumulators to zero
zero_ops = [av.assign(tf.zeros_like(av)) for av in accum_vars]
# gradients of the loss (here: rmse) w.r.t. the trainable variables
gvs = opt.compute_gradients(rmse, tvs)
# ops that add the current mini-batch gradients to the accumulators
accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]
# op that applies the accumulated gradients to the variables
train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])
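Note that this applies the sum of the gradients. If you want the mean instead, as described in the question, one possible sketch (assuming n_minibatches is a Python constant known when the graph is built) is to divide the accumulators before applying them:

# sketch: apply the mean of the accumulated gradients instead of their sum
train_step = opt.apply_gradients(
    [(accum_vars[i] / n_minibatches, gv[1]) for i, gv in enumerate(gvs)])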

In the training loop:

while True:
    sess.run(zero_ops)                                        # reset the gradient accumulators
    for i in range(n_minibatches):
        sess.run(accum_ops, feed_dict={X: Xs[i], y: ys[i]})   # accumulate gradients
    sess.run(train_step)                                      # apply the accumulated gradients once
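One small tidy-up (just a sketch, the behaviour should be unchanged) is to bundle each list of ops with tf.group so that every sess.run call fetches a single op:

# sketch: group the op lists so each sess.run fetches a single op
zero_op = tf.group(*zero_ops)
accum_op = tf.group(*accum_ops)

while True:
    sess.run(zero_op)
    for i in range(n_minibatches):
        sess.run(accum_op, feed_dict={X: Xs[i], y: ys[i]})
    sess.run(train_step)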

Even so, this code does not seem very clean. Does anyone know how to improve it?