Defining model in keras (include_top = True)

AKSHAYAA VAIDYANATHAN · Sep 4, 2017

Can somebody tell me what include_top=True means when defining a model in Keras?

I read the description of this argument in the Keras documentation. It says: "include_top: whether to include the fully-connected layer at the top of the network."

I am still looking for an intuitive explanation for this line of code.

ResNet50(include_top=True)

Thanks!

Answer

Daniel Möller · Sep 4, 2017

Most of these models are a series of convolutional layers followed by one or a few dense (or fully connected) layers.

include_top lets you choose whether or not to keep the final dense layers.

  • the convolutional layers work as feature extractors. They identify a series of patterns in the image, and each layer can identify more elaborate patterns by seeing patterns of patterns.

  • the dense layers are capable of interpreting the found patterns in order to classify: this image contains cats, dogs, cars, etc.

About the weights:

  • the weights in a convolutional layer have a fixed size: kernel height x kernel width x input channels x filters (plus one bias per filter). Example: a 3x3 kernel with 10 filters. A convolutional layer doesn't care about the size of the input image. It just performs the convolutions and produces a resulting image whose size depends on the size of the input image. (Search for some illustrated tutorials about convolutions if this is unclear)

  • the weights in a dense layer, on the other hand, depend entirely on the input size. There is one weight per input element for each output unit. So the input must always be the same size, or else the learned weights won't fit.
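The difference between the two kinds of weights can be checked with simple arithmetic. This is only a sketch: the helper names are made up for illustration, biases are ignored, and it assumes a convolution with 3 input channels and "same" padding, so the feature map keeps the height and width of the image.

```python
def conv2d_weight_count(kernel_h, kernel_w, in_channels, filters):
    """Weights of a 2D convolution (biases ignored). No image size involved."""
    return kernel_h * kernel_w * in_channels * filters


def dense_weight_count(input_elements, units):
    """Weights of a dense layer: one per input element, per output unit."""
    return input_elements * units


def flattened_size(height, width, channels):
    """Number of elements a feature map has once it is flattened."""
    return height * width * channels


# Hypothetical setup: RGB input, 3x3 kernel, 10 filters, then a dense layer
# with 100 units applied to the flattened feature map.
for side in (32, 64):
    conv_weights = conv2d_weight_count(3, 3, in_channels=3, filters=10)
    dense_weights = dense_weight_count(flattened_size(side, side, 10), units=100)
    print(f"{side}x{side} image: conv weights = {conv_weights}, "
          f"dense weights = {dense_weights}")
```

The convolution count stays the same for both image sizes, while the dense count changes as soon as the image does. That is why a trained dense layer ties the model to one input size.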

Because of this, removing the final dense layers lets you choose the input size (see the input_shape argument in the documentation). The output size will then grow or shrink accordingly.

But you lose the interpretation/classification layers. (You can add your own, depending on your task)
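As a rough sketch of what "adding your own" could look like, the snippet below builds ResNet50 without its top and attaches a small classifier. The 160x160 input size, the 5 output classes and the choice of a global average pooling head are assumptions made for the example, and weights=None is used only to avoid downloading the pretrained weights (in practice you would probably pass weights="imagenet").

```python
from tensorflow import keras

# Convolutional part only; without the top, the input size is ours to choose.
base = keras.applications.ResNet50(
    include_top=False,
    weights=None,               # assumption: skip the ImageNet download
    input_shape=(160, 160, 3),  # assumption: not the default 224x224
)
print(base.output_shape)  # still image-like: (batch, X, Y, channels)

# Hypothetical replacement head for a 5-class task.
x = keras.layers.GlobalAveragePooling2D()(base.output)
outputs = keras.layers.Dense(5, activation="softmax")(x)

model = keras.Model(inputs=base.input, outputs=outputs)
print(model.output_shape)  # one probability per class
```

With pretrained weights, a common choice is to set base.trainable = False at first, so that only the new dense layer is trained.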


Extra info on Poolings and Flatten

Global poolings:

After the last convolutional layers, your outputs are still like images. They have shape (images, X, Y, channels), where X and Y are spatial dimensions of a 2D image.

When your model has GlobalMaxPooling2D or GlobalAveragePooling2D, it will eliminate the spatial dimensions. With Max it will take only the highest value pixel for each channel. With Average it will take the mean value of each channel. The result will be just (images, channels), without spatial dimensions anymore.

  • Advantage: since the spatial dimension is discarded, you can have variable size images
  • Disadvantage: you lose a lot of data if the spatial dimensions are still large. (This might be ok depending on the model and data)
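The shape change can be imitated with plain NumPy. This is not how Keras implements these layers, only a sketch of the same reduction, and the array sizes are arbitrary.

```python
import numpy as np


def global_max_pool(x):
    """(images, X, Y, channels) -> (images, channels), keeping the max per channel."""
    return x.max(axis=(1, 2))


def global_average_pool(x):
    """(images, X, Y, channels) -> (images, channels), keeping the mean per channel."""
    return x.mean(axis=(1, 2))


# Two batches with different spatial sizes but the same number of channels.
small = np.random.rand(2, 4, 4, 8)
large = np.random.rand(2, 10, 7, 8)

for batch in (small, large):
    print(batch.shape, "->", global_max_pool(batch).shape,
          global_average_pool(batch).shape)
```

Both batches end up with the shape (2, 8), so whatever comes after the pooling never sees the spatial size.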

Flatten

With Flatten, the spatial dimensions are not lost, but they are turned into features. The shape goes from (images, X, Y, channels) to (images, X*Y*channels).

This will require fixed input shapes, because X and Y must be defined, and if you add Dense layers after the flatten, the Dense layer will need a fixed number of features.
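A NumPy sketch of the same idea shows why the shapes must be fixed. The flatten helper and the zero-filled dense_weights matrix are stand-ins invented for the example, not Keras objects.

```python
import numpy as np


def flatten(x):
    """(images, X, Y, channels) -> (images, X*Y*channels)."""
    return x.reshape(x.shape[0], -1)


small = np.zeros((2, 4, 4, 8))
large = np.zeros((2, 10, 7, 8))

flat_small = flatten(small)  # 4 * 4 * 8 = 128 features per image
flat_large = flatten(large)  # 10 * 7 * 8 = 560 features per image
print(flat_small.shape, flat_large.shape)

# Stand-in for a dense layer with 3 units, sized for 128 input features.
dense_weights = np.zeros((128, 3))
print((flat_small @ dense_weights).shape)  # works: (2, 3)

try:
    flat_large @ dense_weights  # 560 features do not match 128 weight rows
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print("larger image rejected:", mismatch_detected)
```

The weight matrix only fits the feature count it was created for, which is the reason Flatten followed by Dense needs X and Y to be known in advance.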