I am a little confused about how I should use/insert the "BatchNorm" layer in my models.
I see several different approaches, for instance:
"BatchNorm" + "Scale" (no parameter sharing)
The "BatchNorm" layer is followed immediately by a "Scale" layer:
layer {
  bottom: "res2a_branch1"
  top: "res2a_branch1"
  name: "bn2a_branch1"
  type: "BatchNorm"
  batch_norm_param {
    use_global_stats: true
  }
}
layer {
  bottom: "res2a_branch1"
  top: "res2a_branch1"
  name: "scale2a_branch1"
  type: "Scale"
  scale_param {
    bias_term: true
  }
}
"BatchNorm"
In the cifar10 example provided with caffe, "BatchNorm" is used without any "Scale" following it:
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
batch_norm_param for TRAIN and TEST
batch_norm_param: use_global_stats is changed between the TRAIN and TEST phases:
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}
How should one use the "BatchNorm" layer in caffe?
If you follow the original paper, batch normalization should be followed by Scale and Bias layers (the bias can be included via the Scale layer, although this makes the bias parameters inaccessible). use_global_stats should also change from false during training to true during testing/deployment, which is the default behavior when the field is left unset. Note that the first example you give is a prototxt for deployment, so it is correct for it to be set to true there.
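As a rough illustration of the recommendation above, a single definition usable in both phases might look like the sketch below. The layer and blob names ("conv1", "bn1", "scale1") are hypothetical placeholders and not taken from any particular model. It assumes a Caffe version in which an unset use_global_stats is resolved from the current phase.

```prototxt
# Hypothetical names: "conv1" stands for whatever blob is being normalized.
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # use_global_stats is deliberately left unset, so batch statistics are
  # used in the TRAIN phase and the stored running averages in the TEST phase.
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param {
    bias_term: true  # learn a per-channel shift together with the scale
  }
}
```

Depending on the Caffe version, the three param { lr_mult: 0 } blocks from the cifar10 example may still need to be added to the "BatchNorm" layer.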
I'm not sure about the shared parameters.
I made a pull request to improve the documentation on batch normalization, but then closed it because I wanted to modify it, and I never got back to it.
Note that I think lr_mult: 0 for "BatchNorm" is no longer required (perhaps not allowed?), although I'm not finding the corresponding PR now.