How To Determine the 'filters' Parameter in the Keras Conv2D Function

Joe · Jan 13, 2018 · Viewed 14.2k times

I'm just beginning my ML journey and have done a few tutorials. One thing that's not clear to me is how the 'filters' parameter of Keras Conv2D is chosen.

Most sources I've read simply set the parameter to 32 without explanation. Is this just a rule of thumb, or do the dimensions of the input images play a part? For example, the images in CIFAR-10 are 32x32.

Specifically:

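from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, Dropout

# x_train is assumed to hold the CIFAR-10 training images, shape (num_images, 32, 32, 3)
model = Sequential()
filters = 32
model.add(Conv2D(filters, (3, 3), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(filters, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))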

The next block sets the filters parameter to filters*2, i.e. 64. Again, how is this calculated?

Tx.

Joe

Answer

Marcin Możejko · Jan 13, 2018

Actually, there is no good answer to your question. Most architectures are carefully designed and fine-tuned over many experiments. I can share some of the rules of thumb one should apply when designing one's own architecture:

  1. Avoid a dimension collapse in the first layer. Assume each first-layer filter has an (n, n) spatial shape and the input is an RGB image. In this case, it is good practice to set the number of filters to more than n * n * 3 (27 for a 3x3 filter), as this is the dimensionality of the input to a single filter. If you set a smaller number, useful information about the image may be lost because the initialization drops informative dimensions. Of course, this is not a general rule: for texture recognition, for example, where image complexity is lower, a small number of filters might actually help.

  2. Think more about volume than about the number of filters. When setting the number of filters, it's important to think about the change in feature map volume between consecutive layers rather than the change in the number of filters. E.g. in VGG, even though the number of filters doubles after a pooling layer, the actual feature map volume decreases by a factor of 2, because pooling shrinks the spatial size of the feature map by a factor of 4. Decreasing the volume by a factor of more than 3 should usually be considered bad practice. Most modern architectures use a volume drop factor between 1 and 2. Still, this is not a general rule: in a narrow hierarchy, for example, a greater volume drop might actually help.

  3. Avoid bottlenecking. As one may read in this milestone paper, bottlenecking might seriously harm your training process. It occurs when the volume drop is too severe. Of course, a large drop can still be achieved, but then you should use intelligent downsampling, as used e.g. in Inception versions above v2.

  4. Check 1x1 convolutions. It is believed that filter activations are highly correlated. One may take advantage of this by using 1x1 convolutions, namely convolutions with a filter size of 1. This makes it possible, for example, to drop volume with them instead of with pooling or intelligent downsampling (see example here). You could e.g. build twice as many filters and then cut 25% of them by using 1x1 convs as the following layer.
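
The arithmetic behind hints 1, 2 and 4 can be sketched in plain Python. This is only an illustration under assumptions that are not part of the answer: a 3x3 kernel on an RGB input, a 32x32 feature map as in CIFAR-10, and 'same' padding so that only pooling changes the spatial size. The helper names are made up for this sketch.

```python
def patch_dimensionality(kernel_size, channels=3):
    """Number of input values a single first-layer filter sees (n * n * channels)."""
    return kernel_size * kernel_size * channels


def volume(height, width, filters):
    """Feature-map volume: spatial area times number of filters."""
    return height * width * filters


def drop_factor(volume_before, volume_after):
    """How many times smaller the volume gets between two layers."""
    return volume_before / volume_after


# Hint 1: a 3x3 filter on an RGB image sees 3 * 3 * 3 = 27 values,
# so 32 filters in the first layer is above that threshold.
first_layer_inputs = patch_dimensionality(3)

# Hint 2: VGG-style step on an assumed 32x32 map with 32 filters.
# 2x2 pooling quarters the area, doubling the filters gives back a factor of 2.
before = volume(32, 32, 32)
after = volume(16, 16, 64)
vgg_drop = drop_factor(before, after)

# Hint 4: build twice as many filters (here 2 * 64, an assumed width),
# then cut 25% of them with a 1x1 convolution.
wide_filters = 2 * 64
filters_after_1x1 = int(wide_filters * 0.75)

print(first_layer_inputs, vgg_drop, filters_after_1x1)  # prints: 27 2.0 96
```

Pooling without adding filters would give a drop factor of 4 in this setup, which is above the factor of 3 that the answer treats as a warning sign.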

As you can see, there is no easy way to choose the number of filters. Besides the hints above, I'd like to share one of my favorite sanity checks on the number of filters. It takes 2 easy steps:

  1. Try to overfit on 500 random images with regularization.
  2. Try to overfit on the whole dataset without any regularization.

Usually, if the number of filters is too low in general, these two tests will show you that. If, on the other hand, your network severely overfits during training even with regularization, this is a clear indicator that it has way too many filters.
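
The two steps can be wrapped in a small, framework-independent harness. This is a sketch, not the answerer's code: `train_and_score` is a hypothetical callback you would write yourself (for example around a Keras model's `fit`), and the `fit_threshold` of 0.95 is an arbitrary placeholder for "the network managed to fit the training data".

```python
import random


def run_sanity_checks(train_and_score, num_examples, subset_size=500,
                      fit_threshold=0.95, seed=0):
    """Run both checks and report whether the network could fit the training data.

    train_and_score(indices, regularize) is a hypothetical user-supplied callback:
    it trains a fresh network on the given example indices and returns the final
    training accuracy as a float in [0, 1].
    """
    rng = random.Random(seed)
    subset = rng.sample(range(num_examples), min(subset_size, num_examples))

    # Step 1: small random subset, regularization switched on.
    subset_acc = train_and_score(subset, True)
    # Step 2: the whole dataset, regularization switched off.
    full_acc = train_and_score(list(range(num_examples)), False)

    return {
        "subset_accuracy": subset_acc,
        "full_accuracy": full_acc,
        # Failing either check hints that the network may have too few filters.
        "capacity_looks_sufficient": (subset_acc >= fit_threshold
                                      and full_acc >= fit_threshold),
    }


# Stand-in callback so the sketch runs on its own; replace it with real training.
def fake_train_and_score(indices, regularize):
    return 0.99 if len(indices) <= 500 else 0.80


report = run_sanity_checks(fake_train_and_score, num_examples=50000)
print(report["capacity_looks_sufficient"])  # prints: False
```

With the stand-in callback the network fits the 500-image subset but not the full dataset, which is the kind of result the answer reads as too few filters.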

Cheers.