What's the difference between those two? It would also help to explain them in the more general context of convolutional networks.
Also, as a side note, what are channels? In other words, please break down the three terms for me: channels vs. filters vs. kernel.
Each convolution layer consists of several convolution channels (a.k.a. depth or filters). In practice, this is a number such as 64, 128, 256, 512, etc., and it is equal to the number of channels in the output of the convolutional layer. `kernel_size`, on the other hand, is the spatial size of these convolution filters. In practice, it takes values such as `3x3`, `1x1`, or `5x5`, which can be abbreviated to `1`, `3`, or `5`, since the filters are almost always square in practice.
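As a rough sketch (assuming a TensorFlow/Keras setup; the specific layer sizes here are only for illustration), the two arguments show up side by side in `Conv2D`:

```python
import tensorflow as tf

# A toy stack of convolution layers: `filters` sets the number of output
# channels, `kernel_size` sets the spatial size of each filter.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),          # 3 input channels (RGB)
    tf.keras.layers.Conv2D(filters=64, kernel_size=3),   # 64 filters, each 3x3 -> 64 output channels
    tf.keras.layers.Conv2D(filters=128, kernel_size=5),  # 128 filters, each 5x5 -> 128 output channels
    tf.keras.layers.Conv2D(filters=256, kernel_size=1),  # 256 filters, each 1x1 -> 256 output channels
])
model.summary()
```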
Edit
The following quote should make it clearer.
> Suppose `X` is an input with size `W x H x D x N` (where `N` is the size of the batch) to a convolutional layer containing filter `F` (with size `FW x FH x FD x K`) in a network.
>
> The number of feature channels `D` is the third dimension of the input `X` here (for example, this is typically 3 at the first input to the network if the input consists of colour images). The number of filters `K` is the fourth dimension of `F`.
>
> The two concepts are closely linked because if the number of filters in a layer is `K`, it produces an output with `K` feature channels. So the input to the next layer will have `K` feature channels.
The `FW x FH` above is the filter size you are looking for.
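A small sketch of these shapes (note: this uses TensorFlow's channels-last layout `N x H x W x D` rather than the `W x H x D x N` layout in the quote, but the relationship between `D`, `K`, and the output channels is the same):

```python
import tensorflow as tf

N, H, W, D = 8, 32, 32, 3          # batch of 8 colour images, D = 3 feature channels
K, FH, FW = 16, 5, 5               # K = 16 filters, each of spatial size FW x FH

x = tf.random.normal((N, H, W, D))                                          # input X
conv = tf.keras.layers.Conv2D(filters=K, kernel_size=(FH, FW), padding="same")
y = conv(x)

print(y.shape)                     # (8, 32, 32, 16): the output has K = 16 feature channels
print(conv.kernel.shape)           # (5, 5, 3, 16): FH x FW x FD x K, where FD equals the input's D
```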
Added
You should be familiar with filters. You can consider each filter to be responsible for extracting some type of feature from a raw image. The CNN tries to learn such filters, i.e. the filters parametrized in a CNN are learned during its training. You apply each filter in a Conv2D to every input channel and combine the results to get each output channel. So the number of filters and the number of output channels are the same.
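As a rough illustration of that combining step, here is a naive NumPy version of a single Conv2D (no padding, stride, or bias; just enough to show why the number of filters equals the number of output channels):

```python
import numpy as np

def conv2d_naive(x, filters):
    # x: input of shape (H, W, D); filters: (K, FH, FW, D)
    H, W, D = x.shape
    K, FH, FW, FD = filters.shape
    assert FD == D, "each filter spans all D input channels"
    out = np.zeros((H - FH + 1, W - FW + 1, K))
    for k in range(K):                        # one output channel per filter
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = x[i:i + FH, j:j + FW, :]           # covers every input channel
                out[i, j, k] = np.sum(patch * filters[k])  # combine all channels into one value
    return out

x = np.random.rand(8, 8, 3)          # 3 input channels
f = np.random.rand(5, 5, 5, 3)       # 5 filters of size 5x5, each spanning the 3 input channels
print(conv2d_naive(x, f).shape)      # (4, 4, 5): 5 output channels, one per filter
```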