Definition

Mask R-CNN extends Faster R-CNN by adding a mask prediction branch for instance segmentation.
Architecture
RoI Align

Mask R-CNN replaces the RoI pooling used in Faster R-CNN with RoI Align. RoI Align does not quantize RoI coordinates to the feature-map grid, so the extracted features stay accurately aligned with the input image. It computes the value of each sampling point by bilinear interpolation from the nearby grid points on the feature map.
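The interpolation step can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names `bilinear_sample` and `roi_align_bin` are hypothetical, the feature map is a single-channel list of rows, grid points are assumed to sit at integer coordinates, and averaging 2 x 2 sampling points per bin is an assumed choice.

```python
import math


def bilinear_sample(feature, y, x):
    """Interpolate a 2D feature map (list of rows) at a continuous point (y, x)."""
    height, width = len(feature), len(feature[0])
    # Clamp the sampling point so it stays inside the grid (assumed border rule).
    y = min(max(y, 0.0), height - 1)
    x = min(max(x, 0.0), width - 1)
    # The four nearby grid points surrounding (y, x).
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    # Fractional offsets act as interpolation weights.
    wy, wx = y - y0, x - x0
    top = (1 - wx) * feature[y0][x0] + wx * feature[y0][x1]
    bottom = (1 - wx) * feature[y1][x0] + wx * feature[y1][x1]
    return (1 - wy) * top + wy * bottom


def roi_align_bin(feature, y_start, x_start, bin_h, bin_w, samples=2):
    """Average samples x samples regularly spaced sampling points in one RoI bin."""
    total = 0.0
    for i in range(samples):
        for j in range(samples):
            # Sampling points are placed at the centers of the sub-cells of the bin.
            y = y_start + (i + 0.5) * bin_h / samples
            x = x_start + (j + 0.5) * bin_w / samples
            total += bilinear_sample(feature, y, x)
    return total / (samples * samples)


feature_map = [
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
]
# A point halfway between four grid points gets their average.
center = bilinear_sample(feature_map, 0.5, 0.5)
# One bin whose corner (0.2, 0.2) does not fall on the grid; no rounding is applied.
bin_value = roi_align_bin(feature_map, 0.2, 0.2, 1.6, 1.6)
```

Because the bin boundaries are kept as floating-point values all the way through, a small shift of the RoI produces a small change in the output instead of a jump to a different grid cell.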
Mask Head

The mask prediction head generates a binary mask for each RoI from the aligned features.
The mask head is composed of convolutional layers followed by deconvolutional (transposed convolution) layers, which upsample the features to the mask resolution.
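To make the conv/deconv structure concrete, the sketch below traces the spatial size of the features through one possible mask head. All numbers are assumptions for illustration and are not stated in this note: a 14 x 14 RoI feature with 256 channels, four 3 x 3 convolutions with padding 1, one 2 x 2 deconvolution with stride 2, and a final 1 x 1 convolution that outputs one mask channel per class (80 classes assumed). The function names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a standard convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def mask_head_shapes(roi_size=14, channels=256, num_convs=4, num_classes=80):
    """Return (layer name, channels, spatial size) for each stage of the head."""
    shapes = [("roi_feature", channels, roi_size)]
    size = roi_size
    # 3x3 convolutions with padding 1 keep the spatial size unchanged.
    for i in range(num_convs):
        size = conv_out(size, kernel=3, padding=1)
        shapes.append((f"conv{i + 1}", channels, size))
    # A 2x2 deconvolution with stride 2 doubles the spatial size.
    size = deconv_out(size, kernel=2, stride=2)
    shapes.append(("deconv", channels, size))
    # A 1x1 convolution maps the features to per-class mask logits.
    size = conv_out(size, kernel=1)
    shapes.append(("mask_logits", num_classes, size))
    return shapes


shapes = mask_head_shapes()
for name, ch, size in shapes:
    print(f"{name}: {ch} x {size} x {size}")
```

Under these assumptions the head turns a 14 x 14 aligned feature into 28 x 28 mask logits; applying a per-pixel sigmoid and a threshold to the logits would then give the binary mask.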