Estimating the number of neurons and number of layers of an artificial neural network

ladi picture ladi · Jul 27, 2010 · Viewed 39.9k times · Source

I am looking for a method on how to calculate the number of layers and the number of neurons per layer. As input I only have the size of the input vector, the size of the output vector and the size of the training set.

Usually the best net is determined by trying different net topologies and selecting the one with the least error. Unfortunately I cannot do that.

Answer

Nate Kohl picture Nate Kohl · Jul 27, 2010

This is a really hard problem.

The more internal structure a network has, the better that network will be at representing complex solutions. On the other hand, too much internal structure is slower, may cause training to diverge, or lead to overfitting -- which would prevent your network from generalizing well to new data.

People have traditionally approached this problem in several different ways:

  1. Try different configurations, see what works best. You can divide your training set into two pieces -- one for training, one for evaluation -- and then train and evaluate different approaches. Unfortunately it sounds like in your case this experimental approach isn't available.

  2. Use a rule of thumb. A lot of people have come up with a lot of guesses as to what works best. Concerning the number of neurons in the hidden layer, people have speculated that (for example) it should (a) be between the input and output layer size, (b) set to something near (inputs+outputs) * 2/3, or (c) never larger than twice the size of the input layer.

    The problem with rules of thumb is that they don't always take into account vital pieces of information, like how "difficult" the problem is, what the size of the training and testing sets are, etc. Consequently, these rules are often used as rough starting points for the "let's-try-a-bunch-of-things-and-see-what-works-best" approach.

  3. Use an algorithm that dynamically adjusts the network configuration. Algorithms like Cascade Correlation start with a minimal network, then add hidden nodes during training. This can make your experimental setup a bit simpler, and (in theory) can result in better performance (because you won't accidentally use an inappropriate number of hidden nodes).

There's a lot of research on this subject -- so if you're really interested, there is a lot to read. Check out the citations on this summary, in particular: