I noticed in a number of places that people use something like this, usually in fully convolutional networks, autoencoders, and similar architectures:
model.add(UpSampling2D(size=(2,2)))
model.add(Conv2DTranspose(kernel_size=k, padding='same', strides=(1,1)))
I am wondering what the difference is between that and simply:
model.add(Conv2DTranspose(kernel_size=k, padding='same', strides=(2,2)))
Links to any papers that explain this difference are welcome.
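For concreteness, here is a minimal runnable sketch of the two variants (the filter count f, the kernel size value, and the input shape are just placeholders); both double the spatial resolution:

from tensorflow.keras import layers, models

k, f = 3, 16  # placeholder kernel size and filter count

# Variant 1: fixed nearest-neighbour upsampling, then a stride-1 (transposed) convolution
m1 = models.Sequential([
    layers.Input(shape=(8, 8, f)),
    layers.UpSampling2D(size=(2, 2)),
    layers.Conv2DTranspose(f, kernel_size=k, padding='same', strides=(1, 1)),
])

# Variant 2: a single learnable transposed convolution with stride 2
m2 = models.Sequential([
    layers.Input(shape=(8, 8, f)),
    layers.Conv2DTranspose(f, kernel_size=k, padding='same', strides=(2, 2)),
])

print(m1.output_shape)  # (None, 16, 16, 16)
print(m2.output_shape)  # (None, 16, 16, 16)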
Here and here you can find really nice explanations of how transposed convolutions work. To sum up both of these approaches:
In your first approach, you are first upsampling your feature map:
[[1, 2], [3, 4]] -> [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
and then you apply a classical convolution (Conv2DTranspose with stride=1 and padding='same' is equivalent to Conv2D).
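As a quick check (assuming TensorFlow 2.x), UpSampling2D with its default nearest-neighbour interpolation reproduces exactly this repetition:

import numpy as np
import tensorflow as tf

x = np.array([[1., 2.], [3., 4.]], dtype=np.float32).reshape(1, 2, 2, 1)  # NHWC
up = tf.keras.layers.UpSampling2D(size=(2, 2))  # interpolation='nearest' by default
print(up(x)[0, :, :, 0].numpy())
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]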
In your second approach you are first un(max)pooling your feature map:
[[1, 2], [3, 4]] -> [[1, 0, 2, 0], [0, 0, 0, 0], [3, 0, 4, 0], [0, 0, 0, 0]]
and then apply a classical convolution with the given kernel_size, filters, etc.
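As a small illustration, the zero-filled intermediate map can be built explicitly in numpy (the stride s below is the only assumption); this is the map a stride-2 transposed convolution effectively slides its kernel over:

import numpy as np

x = np.array([[1., 2.], [3., 4.]])
s = 2  # stride of the transposed convolution
z = np.zeros((x.shape[0] * s, x.shape[1] * s))
z[::s, ::s] = x  # insert (s - 1) zeros between neighbouring input values
print(z)
# [[1. 0. 2. 0.]
#  [0. 0. 0. 0.]
#  [3. 0. 4. 0.]
#  [0. 0. 0. 0.]]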
A fun fact: although these approaches are different, they share something in common. A transposed convolution is meant to approximate the gradient of a convolution, so the first approach approximates the gradient of sum pooling, whereas the second approximates the gradient of max pooling. This is why the first approach tends to produce slightly smoother results.
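To make the gradient analogy concrete, here is a small sketch (assuming TensorFlow 2.x) of the two gradient patterns: average pooling (a scaled sum pooling) spreads the incoming gradient uniformly over each window, much like nearest-neighbour upsampling, while max pooling routes it to a single position per window, much like the zero-filled map above:

import tensorflow as tf

x = tf.reshape(tf.range(16, dtype=tf.float32), (1, 4, 4, 1))  # NHWC

with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.nn.avg_pool2d(x, ksize=2, strides=2, padding='VALID')
print(tape.gradient(y, x)[0, :, :, 0])  # 0.25 everywhere: gradient spread over each 2x2 window

with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.nn.max_pool2d(x, ksize=2, strides=2, padding='VALID')
print(tape.gradient(y, x)[0, :, :, 0])  # a single 1 per 2x2 window, zeros elsewhere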
Other reasons why you might see the first approach are:
- Conv2DTranspose (and its equivalents) is relatively new in keras, so for a long time the only way to perform learnable upsampling was UpSampling2D,
- the author of keras, Francois Chollet, used this approach in one of his tutorials,
- in the past, transposed convolutions in keras suffered from some API inconsistencies.