In one of the tutorials I am working on (link given below), the author outlines the baseline neural network structure as:
Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function and a weight constraint of max norm set to 3.
model.add(Conv2D(32, (3, 3), input_shape=(3, 32, 32), padding='same', activation='relu', kernel_constraint=maxnorm(3)))
What does a weight constraint of max norm mean, and what does it do to the Conv layer? (We are using Keras.)
Thank you!
What does a weight constraint of max_norm do?
maxnorm(m) will, if the L2 norm of your weights exceeds m, scale your whole weight matrix by a factor that reduces the norm to m.
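As a rough numeric sketch (plain NumPy; the weight values and the limit m are made up for illustration), this is the rescaling being described:

import numpy as np

# Made-up weight vector with L2 norm 7.0 and a max-norm limit of 3.0.
w = np.array([2.0, 3.0, 6.0])
m = 3.0

norm = np.sqrt(np.sum(np.square(w)))   # 7.0
if norm > m:
    w = w * (m / norm)                 # rescale so the norm becomes m

print(np.linalg.norm(w))               # 3.0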
As you can find in the Keras code (the source now lives in TensorFlow), in class MaxNorm(Constraint):
def __call__(self, w):
    # L2 norm of the weights along the configured axis
    norms = K.sqrt(K.sum(K.square(w), axis=self.axis, keepdims=True))
    # cap the norm at max_value
    desired = K.clip(norms, 0, self.max_value)
    # rescale the weights so the capped norm is respected
    w *= (desired / (K.epsilon() + norms))
    return w
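To see this in action, here is a small sketch (tf.keras in eager mode; the weight values are made up) that applies the MaxNorm constraint object directly to a tensor whose norm exceeds the limit:

import numpy as np
import tensorflow as tf
from tensorflow.keras.constraints import MaxNorm

# Illustrative weight column with L2 norm 5.0.
w = tf.constant([[3.0], [4.0]])

# Constraint objects are callable; this rescales w so its norm is at most 3.
constrained = MaxNorm(max_value=3)(w)

print(np.linalg.norm(constrained.numpy()))  # ~3.0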
Additionally, maxnorm has an axis argument, along which the norm is calculated. In your example you don't specify an axis, so the norm is calculated over the whole weight matrix. If, for example, you want to constrain the norm of every convolutional filter, and assuming that you are using tf dimension ordering, the weight matrix will have the shape (rows, cols, input_depth, output_depth). Calculating the norm over axis = [0, 1, 2] will constrain each filter to the given norm.
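For example (a sketch assuming tf.keras with channels_last ordering; the filter count and kernel size are taken from your snippet, the axis setting is the addition), constraining each filter separately would look like:

from tensorflow.keras.layers import Conv2D
from tensorflow.keras.constraints import MaxNorm

# Per-filter max-norm: the norm is taken over (rows, cols, input_depth)
# for every output filter separately.
conv = Conv2D(32, (3, 3), padding='same', activation='relu',
              kernel_constraint=MaxNorm(3, axis=[0, 1, 2]))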
Why do it?
Constraining the weight matrix directly is another kind of regularization. If you use a simple L2 regularization term, you penalize high weights via your loss function. With this constraint, you regularize directly.
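As a side-by-side sketch (tf.keras; the layer sizes and regularization strength are illustrative), the two approaches look like this:

from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2
from tensorflow.keras.constraints import MaxNorm

# L2 regularization: adds a penalty on large weights to the loss.
penalized = Dense(64, activation='relu', kernel_regularizer=l2(0.01))

# Max-norm constraint: rescales the weights after each update so their
# norm never exceeds the given value.
constrained = Dense(64, activation='relu', kernel_constraint=MaxNorm(3))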
As also linked in the Keras code, this seems to work especially well in combination with a dropout layer. For more info, see chapter 5.1 in this paper.