Definition

Single shot multi-box detector (SSD) model is a proposal-free object detection model that uses multiple feature maps from different layers to detect various-sized objects.

Architecture

The number of detections:

SSD uses a set of default boxes with different scales and aspect ratios at each location in the feature maps. CNN (Convolutional predictors) are applied to feature maps to produce class scores and box offsets for each default box.

Convolutional predictors

The convolutional predictors in SSD are designed to predict class scores and the bounding box offsets. The classification filter has the shape , where is the number of input channels, is the number of classes, and is the number of default boxes per location.