Definition

The Regions with CNN features (R-CNN) model performs object detection in two stages: region proposal and object recognition.

Region Proposal

Region proposal is performed by an off-the-shelf, category-independent method (selective search in the original R-CNN). Around 2000 regions are proposed per image.

Object Recognition

For each proposed image patch, a CNN extracts a fixed-length feature vector, and a classifier (one linear SVM per object class) maps the features to labels.
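
The mapping from features to labels can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the names svm_score and classify_region, the toy weights, and the rule "take the highest positive score, otherwise call the region background" are all hypothetical simplifications, not the original implementation. In practice the feature vector comes from the CNN and is much longer than three values.

```python
from typing import Dict, List, Tuple

# Hypothetical container: class name -> (weight vector, bias) of a linear SVM.
SvmBank = Dict[str, Tuple[List[float], float]]


def svm_score(features: List[float], weights: List[float], bias: float) -> float:
    """Linear SVM decision value: dot(weights, features) + bias."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def classify_region(features: List[float], svms: SvmBank) -> Tuple[str, float]:
    """Score one region's features with every per-class SVM.

    Simplifying assumption: the region takes the class with the highest
    positive score; if no score is positive it is labeled "background".
    """
    best_label, best_score = "background", 0.0
    for label, (weights, bias) in svms.items():
        score = svm_score(features, weights, bias)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy weights and a toy 3-dimensional "CNN feature" (made up for illustration).
svms: SvmBank = {
    "cat": ([1.0, -1.0, 0.0], -0.5),
    "dog": ([-1.0, 1.0, 0.5], -0.5),
}
print(classify_region([2.0, 0.5, 1.0], svms))  # ('cat', 1.0)
```

Keeping one SVM per class means each class is scored independently, so a region can be rejected by every classifier and treated as background.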

Bounding-Box Regression

After classification, a bounding-box regressor is applied. The regressor takes the CNN features of the proposed region and predicts a refined bounding box. The refinement adjusts the original region proposal so that it fits the actual object boundaries more tightly. A separate bounding-box regressor is trained for each object class.
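
The refinement step can be made concrete with a small sketch. It assumes the parameterization commonly associated with R-CNN, which this note does not spell out: boxes are written as (center x, center y, width, height), and the regressor outputs four offsets (dx, dy, dw, dh), where the center shifts are scaled by the proposal size and the width and height are scaled in log space. The function names are hypothetical, and the offsets below are hand-picked rather than produced by a trained regressor.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]     # (center_x, center_y, width, height)
Deltas = Tuple[float, float, float, float]  # (dx, dy, dw, dh)


def encode_box_targets(proposal: Box, ground_truth: Box) -> Deltas:
    """Regression targets that would map the proposal onto the ground-truth box."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def apply_box_deltas(proposal: Box, deltas: Deltas) -> Box:
    """Refine a proposal with predicted offsets (inverse of encode_box_targets)."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (px + pw * dx, py + ph * dy, pw * math.exp(dw), ph * math.exp(dh))


# Hand-picked offsets standing in for the output of a class-specific regressor.
proposal: Box = (50.0, 50.0, 20.0, 40.0)
refined = apply_box_deltas(proposal, (0.5, -0.25, 0.0, 0.0))
print(refined)  # (60.0, 40.0, 20.0, 40.0)
```

Because a separate regressor is trained per class, the offsets passed to apply_box_deltas would come from the regressor matching the label the classifier assigned to the region.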