Could someone please explain to me how to update the bias during backpropagation?
I've read quite a few books, but can't find anything on updating the bias!
I understand that bias is an extra input of 1 with a weight attached to it (for each neuron). There must be a formula.
Following the notation of Rojas 1996, chapter 7, backpropagation computes partial derivatives of the error function E (aka cost, aka loss)

∂E/∂w[i,j] = delta[j] * o[i]

where w[i,j] is the weight of the connection between neurons i and j, with j being one layer higher in the network than i, and o[i] is the output (activation) of i (in the case of the "input layer", that's just the value of feature i in the training sample under consideration). How to determine delta is given in any textbook and depends on the activation function, so I won't repeat it here.
These values can then be used in weight updates, e.g.
// update rule for vanilla online gradient descent
w[i,j] -= gamma * o[i] * delta[j]
where gamma is the learning rate.
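For concreteness, here is a minimal NumPy sketch of that update for a single layer, assuming delta has already been computed by backpropagation; the names o, delta, W, and gamma are just placeholders matching the notation above, and the numbers are made up.

import numpy as np

# toy sizes: 3 neurons in layer i, 2 neurons in layer j
o = np.array([0.5, 0.1, 0.9])     # o[i]: activations of the lower layer
delta = np.array([0.2, -0.4])     # delta[j]: assumed already computed by backprop
W = np.zeros((3, 2))              # W[i, j]: weight from neuron i to neuron j
gamma = 0.1                       # learning rate

# dE/dW[i, j] = delta[j] * o[i] for all i, j at once (an outer product),
# followed by the vanilla online gradient-descent step
W -= gamma * np.outer(o, delta)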
The rule for bias weights is very similar, except that there's no input from a previous layer. Instead, bias is (conceptually) caused by input from a neuron with a fixed activation of 1. So, the update rule for bias weights is
bias[j] -= gamma_bias * 1 * delta[j]
where bias[j] is the weight of the bias on neuron j, the multiplication with 1 can obviously be omitted, and gamma_bias may be set to gamma or to a different value. If I recall correctly, lower values are preferred, though I'm not sure about the theoretical justification of that.