I am trying to understand the strides argument in tf.nn.avg_pool, tf.nn.max_pool, and tf.nn.conv2d.
The documentation repeatedly says:
strides: A list of ints that has length >= 4. The stride of the sliding window for each dimension of the input tensor.
My questions are:
1. What does each of the 4+ ints represent?
2. Why must they be [1, x, y, 1] for convnets?
3. In this example we see tf.reshape(_X, shape=[-1, 28, 28, 1]). Why -1? Sadly, the examples in the docs for reshape that use -1 don't translate too well to this scenario.
The pooling and convolutional ops slide a "window" across the input tensor. Using tf.nn.conv2d as an example: if the input tensor has 4 dimensions, [batch, height, width, channels], then the convolution operates on a 2D window over the height and width dimensions.

strides determines how much the window shifts in each of those dimensions. The typical use sets the first (the batch) and last (the depth) stride to 1.
Let's use a very concrete example: running a 2D convolution over a 32x32 greyscale input image. I say greyscale because then the input image has depth = 1, which helps keep it simple. Let that image look like this:
00 01 02 03 04 ...
10 11 12 13 14 ...
20 21 22 23 24 ...
30 31 32 33 34 ...
...
Let's run a 2x2 convolution window over a single example (batch size = 1). We'll give the convolution an output channel depth of 8.
The input to the convolution has shape=[1, 32, 32, 1].

If you specify strides=[1, 1, 1, 1] with padding=SAME, then the output of the filter will have shape [1, 32, 32, 8].
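A quick way to check that shape, assuming TensorFlow 2.x is available (the filter values here are random and purely illustrative):

```python
import tensorflow as tf

# Single 32x32 greyscale image: [batch, height, width, channels]
image = tf.random.normal([1, 32, 32, 1])

# 2x2 filter, 1 input channel, 8 output channels:
# [filter_height, filter_width, in_channels, out_channels]
filters = tf.random.normal([2, 2, 1, 8])

# Stride 1 in every dimension with SAME padding preserves height/width.
out = tf.nn.conv2d(image, filters, strides=[1, 1, 1, 1], padding='SAME')
print(out.shape)  # (1, 32, 32, 8)
```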
The filter will first create an output for:
F(00 01
10 11)
And then for:
F(01 02
11 12)
and so on. Then it will move to the second row, calculating:
F(10, 11
20, 21)
then
F(11, 12
21, 22)
If you specify a stride of [1, 2, 2, 1], it won't do overlapping windows. It will compute:
F(00, 01
10, 11)
and then
F(02, 03
12, 13)
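With the same setup as before, doubling the spatial strides halves the output height and width, since SAME padding gives an output size of ceil(32 / 2) = 16 (again, a sketch assuming TensorFlow 2.x with random values):

```python
import tensorflow as tf

image = tf.random.normal([1, 32, 32, 1])
filters = tf.random.normal([2, 2, 1, 8])

# Stride 2 in height and width: the 2x2 windows no longer overlap,
# and each spatial output dimension is ceil(32 / 2) = 16.
out = tf.nn.conv2d(image, filters, strides=[1, 2, 2, 1], padding='SAME')
print(out.shape)  # (1, 16, 16, 8)
```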
The stride operates similarly for the pooling operators.
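For example, tf.nn.max_pool takes its ksize and strides arguments in the same [batch, height, width, channels] layout; a 2x2 window with stride 2 likewise halves the spatial dimensions (a minimal sketch assuming TensorFlow 2.x):

```python
import tensorflow as tf

image = tf.random.normal([1, 32, 32, 1])

# ksize and strides both follow the [batch, height, width, channels] layout.
pooled = tf.nn.max_pool(image, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
print(pooled.shape)  # (1, 16, 16, 1)
```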
Question 2: Why strides [1, x, y, 1] for convnets
The first 1 is the batch: You don't usually want to skip over examples in your batch, or you shouldn't have included them in the first place. :)
The last 1 is the depth of the convolution: You don't usually want to skip inputs, for the same reason.
The conv2d operator is more general, so you could create convolutions that slide the window along other dimensions, but that's not a typical use in convnets. The typical use is to use them spatially.
Why reshape to -1
-1 is a placeholder that says "adjust as necessary to match the size needed for the full tensor." It's a way of making the code independent of the input batch size, so that you can change your pipeline and not have to adjust the batch size everywhere in the code.
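A small illustration, assuming a batch of flattened MNIST-style 784-pixel images (the batch size of 50 is arbitrary, chosen only to show that -1 adapts to it):

```python
import tensorflow as tf

# A batch of 50 flattened 28x28 images; 50 is arbitrary.
flat = tf.zeros([50, 784])

# -1 tells reshape to infer that dimension: 50*784 / (28*28*1) = 50.
images = tf.reshape(flat, [-1, 28, 28, 1])
print(images.shape)  # (50, 28, 28, 1)
```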