Can somebody tell me what include_top=True means when defining a model in Keras?
I read the meaning of this line in Keras Documentation. It says include_top: whether to include the fully-connected layer at the top of the network.
I am still looking for an intuitive explanation for this line of code.
ResNet50(include_top=True)
Thanks!
Most of these models are a series of convolutional layers followed by one or a few dense (or fully connected) layers.
include_top lets you select whether you want the final dense layers or not.
The convolutional layers work as feature extractors. They identify a series of patterns in the image, and each layer can identify more elaborate patterns by seeing patterns of patterns.
The dense layers interpret the patterns that were found in order to classify: this image contains cats, dogs, cars, etc.
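To see the difference concretely, here is a minimal sketch that builds ResNet50 both ways and compares the output shapes. It assumes TensorFlow's bundled Keras is installed, and it passes weights=None (my choice, not part of the question) so that only the architecture is built and nothing is downloaded.

```python
from tensorflow.keras.applications import ResNet50

# weights=None builds the architecture only (assumption made here to avoid
# downloading the pretrained ImageNet weights).
full = ResNet50(include_top=True, weights=None)
headless = ResNet50(include_top=False, weights=None, input_shape=(224, 224, 3))

full_shape = tuple(full.output.shape)
headless_shape = tuple(headless.output.shape)

print(full_shape)      # (None, 1000): one score per ImageNet class
print(headless_shape)  # (None, 7, 7, 2048): a stack of feature maps
```

With the top, the model ends in class scores. Without it, the model ends in the feature maps produced by the last convolutional block.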
About the weights:
The weights in a convolutional layer have a fixed size that does not depend on the image: kernel height x kernel width x input channels x filters. Example: a 3x3 kernel with 10 filters applied to a 3-channel image has 3 x 3 x 3 x 10 = 270 weights (plus 10 biases). A convolutional layer doesn't care about the size of the input image. It just does the convolutions and presents a resulting image whose size depends on the size of the input image. (Search for some illustrated tutorials about convolutions if this is unclear.)
The weights in a dense layer, on the other hand, are totally dependent on the input size. There is one weight per element of the input for each output unit. So this demands that your input always be the same size, or else the learned weights won't match the input.
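The two points above can be checked with a little arithmetic. The sketch below uses two helper functions I made up for illustration (conv2d_params and dense_params are not Keras functions), and it assumes the convolution keeps the spatial size ("same" padding) so the feature map is side x side x 10.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus biases of a 2D convolution (hypothetical helper)."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_features, units):
    """Weights plus biases of a dense layer (hypothetical helper)."""
    return in_features * units + units


# A 3x3 kernel, 10 filters, RGB input: the same count for any image size.
conv_count = conv2d_params(3, 3, 3, 10)
print(conv_count)  # 280

# A dense layer with 1 unit on the flattened 10-channel feature map:
# the count changes as soon as the image size changes.
dense_small = dense_params(32 * 32 * 10, 1)
dense_large = dense_params(64 * 64 * 10, 1)
print(dense_small)  # 10241
print(dense_large)  # 40961
```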
Because of this, removing the final dense layers allows you to define the input size (see the input_shape argument in the documentation). (And the output size will increase/decrease accordingly.)
But you lose the interpretation/classification layers. (You can add your own, depending on your task.)
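As a rough sketch of that workflow: build the network without its top at a non-default input size, then attach your own classifier. The 320x320 input, the 5 classes and the pooling-plus-Dense head are assumptions I picked for illustration, and weights=None is used only to avoid a download.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

NUM_CLASSES = 5  # hypothetical number of classes for your own task

# Without the top, a custom input size is allowed.
base = ResNet50(include_top=False, weights=None, input_shape=(320, 320, 3))
base_shape = tuple(base.output.shape)
print(base_shape)  # (None, 10, 10, 2048): larger input, larger feature maps

# Your own interpretation/classification layers on top of the features.
x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = models.Model(inputs=base.input, outputs=outputs)

model_shape = tuple(model.output.shape)
print(model_shape)  # (None, 5)
```

For actual transfer learning you would normally load pretrained weights (for example weights="imagenet") and probably freeze the base while training the new head.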
Global poolings:
After the last convolutional layers, your outputs are still like images. They have shape (images, X, Y, channels), where X and Y are the spatial dimensions of a 2D image.
When your model has GlobalMaxPooling2D or GlobalAveragePooling2D, it will eliminate the spatial dimensions. With Max it will take only the highest-value pixel for each channel. With Average it will take the mean value of each channel. The result will be just (images, channels), without spatial dimensions anymore.
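What the two global poolings compute can be imitated with plain NumPy. This is only a sketch of the arithmetic on a made-up array, not the Keras layers themselves.

```python
import numpy as np

# Fake feature maps: 2 images, 4x4 spatial size, 3 channels.
x = np.arange(2 * 4 * 4 * 3, dtype=float).reshape(2, 4, 4, 3)

# Reduce over the spatial axes (X and Y), keeping images and channels.
global_max = x.max(axis=(1, 2))
global_avg = x.mean(axis=(1, 2))

print(x.shape)           # (2, 4, 4, 3)
print(global_max.shape)  # (2, 3)
print(global_avg.shape)  # (2, 3)
print(global_max[0, 0], global_avg[0, 0])  # 45.0 22.5
```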
Flatten
With Flatten, the spatial dimensions will not be lost, but they will be turned into features. The shape goes from (images, X, Y, channels) to (images, X*Y*channels).
This will require fixed input shapes, because X and Y must be defined, and if you add Dense layers after the Flatten, the Dense layer will need a fixed number of features.
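A small NumPy sketch of why flattening ties you to one input size while global pooling does not. The flatten helper and the two array sizes are made up for illustration.

```python
import numpy as np


def flatten(x):
    """Keep the images axis and merge X, Y and channels into one axis."""
    return x.reshape(x.shape[0], -1)


small = np.zeros((2, 7, 7, 3))    # feature maps from a smaller input image
large = np.zeros((2, 10, 10, 3))  # feature maps from a larger input image

flat_small = flatten(small).shape
flat_large = flatten(large).shape
print(flat_small)  # (2, 147)
print(flat_large)  # (2, 300): a Dense layer sized for 147 features would not fit

# Global average pooling yields the same number of features for both sizes.
pooled_small = small.mean(axis=(1, 2)).shape
pooled_large = large.mean(axis=(1, 2)).shape
print(pooled_small, pooled_large)  # (2, 3) (2, 3)
```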