Should there be one bias per layer or one bias for each node?

Nate_Hirsch · Jan 25, 2016

I am looking to implement a generic neural network with 1 input layer consisting of input nodes, 1 output layer consisting of output nodes, and N hidden layers consisting of hidden nodes. Nodes are organized into layers, with the rule that nodes in the same layer cannot be connected.

I mostly understand the concept of the bias, but I have a question.

Should there be one bias value per layer (shared by all nodes in that layer) or should each node (except nodes in the input layer) have their own bias value?

I have a feeling it could be done either way, and I would like to understand the trade-offs of each approach, as well as which implementation is most commonly used.

Answer

Dennis Soemers · Jan 13, 2018

Intuitive View

To answer this question properly, we should first establish exactly what we mean by "bias value" as used in the question. Neural networks are typically viewed intuitively (and explained to beginners) as a network of nodes (neurons) and weighted, directed connections between nodes. In this view, biases are very frequently drawn as additional "input" nodes, which always have an activation level of exactly 1.0. This value of 1.0 may be what some people think of when they hear "bias value". Such a bias node has connections to other nodes, with trainable weights, and other people may think of those weights as the "bias values". Since the question was tagged with the bias-neuron tag, I'll answer under the assumption that we use the first definition, i.e., bias value = 1.0 for some bias node / neuron.
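To make the two readings concrete, here is a minimal Python sketch (the names `weighted_sum`, `inputs`, `weights`, and `bias_weight` are hypothetical, chosen only for illustration). It shows that a bias node with a fixed activation of 1.0 and a trainable connection weight contributes exactly that weight to a node's weighted input sum, the same as adding the weight as a separate term.

```python
# Minimal sketch: a bias node is an extra input whose activation is fixed at 1.0.
# All names here are illustrative, not taken from any library.

def weighted_sum(inputs, weights):
    """Return the sum of input activations times their connection weights."""
    return sum(x * w for x, w in zip(inputs, weights))

inputs = [0.5, -2.0]   # activations of two ordinary input nodes
weights = [0.8, 0.1]   # weights of their connections to one hidden node
bias_weight = 0.3      # trainable weight on the connection from the bias node

# View 1: the bias node is an extra input with activation exactly 1.0.
with_bias_node = weighted_sum(inputs + [1.0], weights + [bias_weight])

# View 2: the bias weight is added directly as an additive term.
with_bias_term = weighted_sum(inputs, weights) + bias_weight

print(with_bias_node, with_bias_term)
```

Either way, the trainable quantity is the connection weight; the 1.0 itself never changes.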

From this point of view, it does not matter mathematically how many bias nodes/values we put in our network, as long as we connect them to the correct nodes. You could think of the entire network as having only a single bias node with a value of 1.0 that does not belong to any particular layer and has connections to all nodes other than the input nodes. That may be difficult to draw, though. If you want to make a drawing of your neural network, it may be more convenient to place a separate bias node (each with a value of 1.0) in every layer except the output layer, and connect each of those bias nodes to all the nodes in the layer directly after it. Mathematically, these two interpretations are equivalent, since in both cases every non-input node has an incoming weighted connection from a node that always has an activation level of 1.0.
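The equivalence of the two drawings can be checked numerically. The sketch below assumes a hypothetical 2-2-1 network with sigmoid activations and made-up weights. One forward pass uses a single bias activation of 1.0 shared by every non-input node; the other appends a separate 1.0-valued bias node to the input layer and to the hidden layer. Because the bias weights are the same in both, the outputs agree.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 2-2-1 network. w_hidden[j][i]: weight from input i to hidden node j.
# w_out[j]: weight from hidden node j to the single output node.
w_hidden = [[0.5, -0.4], [0.3, 0.9]]
w_out = [1.2, -0.7]
# One bias weight per non-input node (two hidden nodes, one output node).
b_hidden = [0.1, -0.2]
b_out = 0.05

def forward_single_bias_node(x):
    # One bias node outside any layer: the same 1.0 activation feeds every non-input node.
    bias_activation = 1.0
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b * bias_activation)
         for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * hj for w, hj in zip(w_out, h)) + b_out * bias_activation)

def forward_bias_node_per_layer(x):
    # A separate bias node (activation 1.0) appended to the input layer...
    x_aug = x + [1.0]
    h = [sigmoid(sum(w * xi for w, xi in zip(row + [b], x_aug)))
         for row, b in zip(w_hidden, b_hidden)]
    # ...and another appended to the hidden layer.
    h_aug = h + [1.0]
    return sigmoid(sum(w * hj for w, hj in zip(w_out + [b_out], h_aug)))

print(forward_single_bias_node([1.0, -1.0]), forward_bias_node_per_layer([1.0, -1.0]))
```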

Programming View

When neural networks are programmed, there typically aren't any explicit node "objects" at all (at least in efficient implementations); there are generally just matrices for the weights. From this point of view, there is no longer any choice. We'll (almost) always want one "bias weight" (a weight multiplied by a constant activation level of 1.0) going to every non-input node, and we have to make sure all those weights appear in the correct spots in our weight matrices.
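In matrix form, this usually means each layer stores a weight matrix plus a vector with one bias weight per node in that layer, or, equivalently, folds those bias weights into an extra column of the matrix that multiplies a constant 1.0 appended to the layer's input. The sketch below uses plain Python lists rather than a numerical library, and the layer size and values are hypothetical.

```python
# Minimal sketch of the "programming view" with plain Python lists.
# W has one row per node in this layer and one column per node in the previous layer;
# b holds one bias weight per node in this layer.

def matvec(W, v):
    """Matrix-vector product W @ v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def layer_forward(W, b, v):
    """Pre-activations W @ v + b, with one bias weight per node."""
    return [z + bias for z, bias in zip(matvec(W, v), b)]

def layer_forward_augmented(W, b, v):
    """Same computation with each bias weight stored as an extra column of W,
    multiplied by a constant activation of 1.0 appended to the input."""
    W_aug = [row + [bias] for row, bias in zip(W, b)]
    return matvec(W_aug, v + [1.0])

# Hypothetical layer with 3 inputs and 2 nodes: W is 2x3, b has 2 entries.
W = [[0.2, -0.5, 1.0],
     [0.7, 0.1, -0.3]]
b = [0.05, -0.4]
v = [1.0, 2.0, -1.0]

print(layer_forward(W, b, v))
print(layer_forward_augmented(W, b, v))
```

For a layer with n nodes, `b` has n entries. Forcing all of those entries to share one value would give the one-bias-per-layer variant from the question, which amounts to a constraint on this per-node scheme rather than a different way of wiring the network.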