What is the purpose of the ROI layer in a Fast R-CNN?

Shamane Siriwardhana · Apr 15, 2017

In this tutorial about object detection, Fast R-CNN is mentioned, and so is its ROI (region of interest) layer.

What is happening, mathematically, when region proposals are resized to fit the final convolutional layer's activation maps (in each cell)?

Answer

kmario23 · Apr 16, 2017

Region-of-Interest (RoI) Pooling:

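It is a type of pooling layer that performs max pooling on inputs of non-uniform size (here, regions of the convnet feature map) and produces a small feature map of fixed size (say 7x7). Region proposals, given in image coordinates, are first projected onto the feature map by scaling them by the network's overall stride. Each projected RoI of size h x w is then divided into an H x W grid of sub-windows of roughly h/H x w/W cells, and the maximum value in each sub-window becomes the corresponding output cell; this is done independently for every channel. The fixed output size H x W is a network hyper-parameter and is predefined.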
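The grid-and-max step described above can be sketched in plain Python. This is a minimal illustration, not the Fast R-CNN source code; it assumes a single-channel feature map stored as nested lists, an RoI already projected onto the feature map as inclusive cell coordinates (x0, y0, x1, y1), and one common bin convention (floor for a bin's start, ceiling for its end). The names `roi_max_pool` and `ceil_div` are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one RoI of a single-channel feature map to an out_h x out_w grid.

    feature_map: list of rows (list of lists of numbers).
    roi: (x0, y0, x1, y1) in feature-map cells, corners inclusive (assumption).
    """
    x0, y0, x1, y1 = roi
    map_h, map_w = len(feature_map), len(feature_map[0])
    roi_h = max(y1 - y0 + 1, 1)  # RoI height in cells, at least 1
    roi_w = max(x1 - x0 + 1, 1)  # RoI width in cells, at least 1
    pooled = []
    for i in range(out_h):
        # Bin i covers rows [floor(i*h/H), ceil((i+1)*h/H)) relative to y0,
        # clipped to the feature map; integer math avoids float rounding
        r_start = min(max(y0 + (i * roi_h) // out_h, 0), map_h)
        r_end = min(max(y0 + ceil_div((i + 1) * roi_h, out_h), 0), map_h)
        row = []
        for j in range(out_w):
            c_start = min(max(x0 + (j * roi_w) // out_w, 0), map_w)
            c_end = min(max(x0 + ceil_div((j + 1) * roi_w, out_w), 0), map_w)
            window = [feature_map[r][c]
                      for r in range(r_start, r_end)
                      for c in range(c_start, c_end)]
            # An empty bin (only possible if the RoI lies off the map) yields 0
            row.append(max(window) if window else 0)
        pooled.append(row)
    return pooled


# Toy 5x5 map whose value at (row r, col c) is r*5 + c
fm = [[r * 5 + c for c in range(5)] for r in range(5)]
# A 3-row x 5-column RoI pooled to 2x2
print(roi_max_pool(fm, (0, 0, 4, 2), 2, 2))  # [[7, 9], [12, 14]]
```

When the RoI size is not a multiple of the output size, neighbouring bins can share a cell (row 1 above feeds both output rows), and an RoI smaller than the output grid simply repeats cells. Other implementations choose bin boundaries slightly differently.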

The main purposes of this pooling are to speed up training and testing and to allow the whole system to be trained end-to-end (jointly).
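End-to-end training works because RoI max pooling, like ordinary max pooling, passes gradients straight through: each output cell sends its gradient back to the single input activation that won its max, and an activation that wins in several (possibly overlapping) RoIs or sub-windows accumulates all of those gradients. Writing x_i for an activation of the shared feature map, y_rj for the j-th output of the r-th RoI, and R(r, j) for the set of input cells in that output's sub-window (the Fast R-CNN paper states the backward pass in essentially this form):

```latex
\begin{aligned}
y_{rj} &= x_{i^*(r,j)}, \qquad i^*(r,j) = \operatorname*{arg\,max}_{i' \in \mathcal{R}(r,j)} x_{i'} \\
\frac{\partial L}{\partial x_i} &= \sum_{r} \sum_{j} \bigl[\, i = i^*(r,j) \,\bigr] \, \frac{\partial L}{\partial y_{rj}}
\end{aligned}
```

Here [·] is 1 when the condition holds and 0 otherwise. Because these gradients reach the convolutional layers, the backbone can be fine-tuned together with the classification and box-regression heads.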

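Because of this pooling layer, training and test time are faster than in the original R-CNN architecture, hence the name Fast R-CNN: the convolutional layers run once per image, and every proposal's features are pooled from that shared feature map instead of running the CNN separately on each warped proposal.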
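The sharing argument can be sketched as follows: one feature map is computed per image, and each proposal only costs a projection plus a cheap pooling step. This is an illustrative sketch, not the reference implementation. It assumes a backbone with total stride 16 (`SPATIAL_SCALE = 1/16`, as with VGG16's conv5 features), a feature map stored as nested lists of shape channels x H x W, proposals as pixel boxes (x0, y0, x1, y1) rounded to the nearest cell, and made-up feature values; all function names are hypothetical.

```python
SPATIAL_SCALE = 1.0 / 16  # assumption: backbone with total stride 16 (e.g. VGG16 conv5)


def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def project_roi(box, spatial_scale=SPATIAL_SCALE):
    """Map an image-space box (x0, y0, x1, y1) in pixels to feature-map cells."""
    return tuple(int(round(v * spatial_scale)) for v in box)


def pool_channel(channel, roi, out_h, out_w):
    """Max-pool one projected RoI of one channel to an out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = max(y1 - y0 + 1, 1), max(x1 - x0 + 1, 1)
    map_h, map_w = len(channel), len(channel[0])
    pooled = []
    for i in range(out_h):
        # Sub-window rows: floor start, ceil end, clipped to the map
        r0 = min(max(y0 + (i * roi_h) // out_h, 0), map_h)
        r1 = min(max(y0 + ceil_div((i + 1) * roi_h, out_h), 0), map_h)
        row = []
        for j in range(out_w):
            c0 = min(max(x0 + (j * roi_w) // out_w, 0), map_w)
            c1 = min(max(x0 + ceil_div((j + 1) * roi_w, out_w), 0), map_w)
            window = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(max(window) if window else 0.0)
        pooled.append(row)
    return pooled


def roi_pool_all(feature_maps, boxes, out_h=7, out_w=7):
    """Pool every proposal from ONE shared feature map (channels x H x W)."""
    results = []
    for box in boxes:
        roi = project_roi(box)
        # Each channel is pooled independently over the same sub-windows
        results.append([pool_channel(ch, roi, out_h, out_w) for ch in feature_maps])
    return results


# Hypothetical 2-channel, 38 x 50 feature map, computed once for the whole image
fmap = [[[float((r * 50 + c + k) % 7) for c in range(50)] for r in range(38)]
        for k in range(2)]
# Three proposals of very different sizes, in image pixels
boxes = [(0, 0, 159, 95), (320, 160, 799, 599), (48, 32, 63, 47)]
pooled = roi_pool_all(fmap, boxes)
print([(len(p), len(p[0]), len(p[0][0])) for p in pooled])  # [(2, 7, 7), (2, 7, 7), (2, 7, 7)]
```

All three proposals, from a 2x2-cell patch to most of the map, come out as 2 x 7 x 7, which is what lets one set of fully connected layers handle every proposal while the expensive convolutions are shared.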

Simple example (from "Region of interest pooling explained" by deepsense.io):

(Figure: Visualization of RoI Pooling)