Activation function after pooling layer or convolutional layer?

malioboro · Feb 22, 2016

The theory from these links shows that the order of operations in a convolutional network is: Convolutional Layer - Non-linear Activation - Pooling Layer.

  1. Neural networks and deep learning (equation (125))
  2. Deep learning book (page 304, 1st paragraph)
  3. Lenet (the equation)
  4. The source in this headline

But in the actual implementations from those same sources, the order is: Convolutional Layer - Pooling Layer - Non-linear Activation

  1. network3.py
  2. The source code, LeNetConvPoolLayer class

I've also tried exploring the conv2d operation's syntax, but there is no activation function in it; it's only a convolution with a flipped kernel. Can someone explain why this is?

Answer

eickenberg · Feb 22, 2016

Well, max-pooling and monotonically increasing non-linearities commute: for any monotonically non-decreasing function f, f(max(x1, ..., xn)) = max(f(x1), ..., f(xn)), so MaxPool(Relu(x)) = Relu(MaxPool(x)) for any input. The result is the same either way. Given that, it is technically cheaper to subsample first through max-pooling and then apply the non-linearity, since the activation then runs on fewer values (which matters if it is costly, such as the sigmoid). In practice it is often done the other way round; it doesn't seem to change performance much.
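You can check the commutation numerically. Below is a minimal NumPy sketch (the helpers `relu` and `max_pool_2x2` are my own names for illustration, not from the tutorials in the question):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def max_pool_2x2(x):
    # Non-overlapping 2x2 max-pooling on an (H, W) array with even H and W.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))

a = max_pool_2x2(relu(x))  # activation first, then pooling
b = relu(max_pool_2x2(x))  # pooling first, then activation

print(np.allclose(a, b))   # True: both orders give identical results

# Pooling first means the non-linearity touches 4x fewer elements:
print(relu(x).size, max_pool_2x2(x).size)  # 64 vs 16
```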

As for conv2d, it does flip the kernel: that is exactly the mathematical definition of convolution (many other frameworks instead compute cross-correlation, i.e. convolution without the flip). Either way it is a linear operation, so you have to add the non-linearity yourself in the next step, e.g. theano.tensor.nnet.relu.
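To make both points concrete, here is a small sketch using scipy.signal as a stand-in for Theano's conv2d (an assumption for illustration; the numerics are the same): true convolution equals cross-correlation with a kernel flipped along both axes, and the non-linearity is a separate step you apply afterwards.

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

rng = np.random.default_rng(0)
image = rng.normal(size=(5, 5))
kernel = rng.normal(size=(3, 3))

# True convolution (the definition conv2d implements) equals
# cross-correlation with the kernel flipped along both axes.
conv = convolve2d(image, kernel, mode="valid")
corr_flipped = correlate2d(image, kernel[::-1, ::-1], mode="valid")
print(np.allclose(conv, corr_flipped))  # True

# The convolution itself is linear; the non-linearity is added separately.
activated = np.maximum(conv, 0)  # ReLU applied after the convolution
```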