Definition

Segmenter is a Transformer-based semantic segmentation model.

Architecture

The encoder of Segmenter has a ViT architecture.

The decoder takes the output of the encoder and the class tokens as inputs. The embedded token matrix () and the class mask matrix () are multiplied and Softmax Function is applied on the class dimension to obtain pixel-wise class scores, where is the number of patches, is the channel dimension, is the number of classes.