Definition

U-Net is a convolutional neural network (CNN) with an encoder-decoder structure, used for image segmentation.

Architecture

Encoder-Decoder

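The encoder extracts features with standard convolutional layers and progressively downsamples the feature maps. Each downsampling step halves the spatial resolution and doubles the number of channels. The decoder mirrors this: each step upsamples its input feature map with a transposed convolution (often called deconvolution) that doubles the spatial resolution and halves the number of channels.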
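
As a rough illustration of how the channel counts change through the encoder and decoder, the Python sketch below traces feature-map shapes stage by stage. The function name `trace_unet_shapes` is hypothetical, and the 572-pixel input, 64 base channels, depth of 4, two unpadded 3x3 convolutions per stage, 2x2 pooling, and 2x2 stride-2 upsampling are all assumptions chosen for the example, not details stated in this section.

```python
def trace_unet_shapes(input_size=572, base_channels=64, depth=4):
    """Return (stage, channels, spatial_size) tuples for a U-Net-style network.

    Assumptions (not from the text): each stage applies two unpadded 3x3
    convolutions (each trims 2 pixels), downsampling is 2x2 pooling, and
    upsampling is a 2x2 stride-2 transposed convolution.
    """
    shapes = []
    channels, size = base_channels, input_size

    # Encoder: convolve, record the stage, then downsample.
    for level in range(depth):
        size -= 4  # two unpadded 3x3 convolutions
        shapes.append((f"enc{level}", channels, size))
        if size % 2:
            raise ValueError(f"odd spatial size {size} cannot be halved cleanly")
        size //= 2      # downsampling halves the resolution
        channels *= 2   # and the channel count doubles

    # Bottleneck between encoder and decoder.
    size -= 4
    shapes.append(("bottleneck", channels, size))

    # Decoder: upsample (halving channels), then convolve.
    for level in reversed(range(depth)):
        size *= 2       # transposed convolution doubles the resolution
        channels //= 2  # and halves the channel count
        size -= 4       # two unpadded 3x3 convolutions
        shapes.append((f"dec{level}", channels, size))

    return shapes


shapes = trace_unet_shapes()
for stage, channels, size in shapes:
    print(f"{stage:>10}: {channels:4d} channels, {size}x{size}")
# With the defaults, the last entry is ('dec0', 64, 388).
```

Under these assumptions the channel count rises from 64 to 1024 at the bottleneck and falls back to 64, while the output (388x388) ends up smaller than the input because the unpadded convolutions trim the borders. With padded convolutions the spatial sizes would instead match at each level.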

Skip Connections

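The feature map from each encoder level is cropped and concatenated, along the channel dimension, with the upsampled feature map at the corresponding decoder level. Cropping is needed when the encoder map is spatially larger than the decoder map, as happens with unpadded convolutions. This lets the network combine low-level, high-resolution features with high-level, more abstract ones, preserving fine spatial detail that downsampling would otherwise lose.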
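
The crop-and-concatenate step can be sketched in plain Python on nested lists. This is only an illustration under stated assumptions: the helper names `center_crop` and `skip_concat` are hypothetical, feature maps are stored as `[channels][height][width]`, and the crop is taken around the center. A real implementation would use a tensor library rather than lists.

```python
def center_crop(feature_map, target_h, target_w):
    """Crop a [channels][height][width] nested list around its center."""
    h, w = len(feature_map[0]), len(feature_map[0][0])
    if target_h > h or target_w > w:
        raise ValueError("target size is larger than the feature map")
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return [
        [row[left:left + target_w] for row in channel[top:top + target_h]]
        for channel in feature_map
    ]


def skip_concat(encoder_map, decoder_map):
    """Crop the encoder map to the decoder map's size, then stack channels."""
    target_h, target_w = len(decoder_map[0]), len(decoder_map[0][0])
    cropped = center_crop(encoder_map, target_h, target_w)
    # List concatenation on the outer axis is channel-wise concatenation here.
    return cropped + decoder_map


# Toy data: a 2-channel 6x6 encoder map and a 2-channel 4x4 decoder map.
encoder_map = [
    [[c * 100 + r * 10 + col for col in range(6)] for r in range(6)]
    for c in range(2)
]
decoder_map = [[[0] * 4 for _ in range(4)] for _ in range(2)]

merged = skip_concat(encoder_map, decoder_map)
print(len(merged), len(merged[0]), len(merged[0][0]))  # prints: 4 4 4
```

The merged map has the decoder's spatial size and the sum of both channel counts, which is why the convolutions that follow a skip connection take more input channels than the upsampling step produced.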