Is this a good learning rate for the Adam method?

John · Mar 23, 2017 · Viewed 27.5k times

I am training my model and got the result below. Is this a good learning rate? If not, is it too high or too low? This is my result:

[training loss curve]

lr_policy: "step"
gamma: 0.1
stepsize: 10000
power: 0.75
# lr for unnormalized softmax
base_lr: 0.001
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 100000
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "snapshot/train"
type:"Adam"

This is the reference:

With low learning rates the improvements will be linear. With high learning rates they will start to look more exponential. Higher learning rates will decay the loss faster, but they get stuck at worse values of loss.

[figure: loss curves for different learning rates]
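
To see the trade-off the quote describes, here is a small illustrative sketch (not from the question): plain SGD with noisy gradients on a one-dimensional quadratic. The higher learning rate drives the loss down faster at first but settles at a higher noise floor, while lower rates improve more slowly but reach lower final loss.

import random

def noisy_sgd_final_loss(lr, steps=2000, noise=0.5, seed=0):
    """Run SGD on f(w) = 0.5 * w^2 with noisy gradients; return the final loss."""
    random.seed(seed)
    w = 5.0
    for _ in range(steps):
        grad = w + noise * random.gauss(0, 1)  # true gradient of 0.5*w^2 is w
        w -= lr * grad
    return 0.5 * w * w

for lr in (0.5, 0.05, 0.005):
    print("lr=%s: final loss ~ %.4f" % (lr, noisy_sgd_final_loss(lr)))
# Typically the largest lr plateaus at the highest loss (its noise floor scales with lr),
# while smaller rates converge more slowly but to lower values.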

Answer

Thomas Pinetz · Mar 23, 2017

The learning rate looks a bit high. The curve decreases too fast for my taste and flattens out very soon. I would try 0.0005 or 0.0001 as the base learning rate if I wanted to squeeze out additional performance; one way to apply that is sketched below. You can stop after a few epochs anyway if you see that it does not work.
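
As a sketch (assuming Caffe's Python bindings are installed; solver.prototxt and the output filename are placeholders), you can lower base_lr programmatically through the protobuf text format instead of editing the file by hand:

# Sketch: lower base_lr in a Caffe solver file via the protobuf text format.
# Assumes pycaffe is available; 'solver.prototxt' is a placeholder path.
from caffe.proto import caffe_pb2
from google.protobuf import text_format

solver = caffe_pb2.SolverParameter()
with open('solver.prototxt') as f:
    text_format.Merge(f.read(), solver)

solver.base_lr = 0.0005  # or 0.0001, as suggested above

with open('solver_lower_lr.prototxt', 'w') as f:
    f.write(text_format.MessageToString(solver))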

The question you have to ask yourself, though, is how much performance you need and how close you are to the performance required. You are probably training the network for a specific purpose, and often you can get more performance out of it by increasing its capacity rather than by fine-tuning a learning rate that is already decent, if not perfect.