Definition

Show, Attend, and Tell (SAT) is an attention-based image captioning model.

Architecture

The model uses an LSTM as its caption decoder. A CNN extracts a feature map from the input image, and the feature map is fed into the attention LSTM. At each decoding step the query (Q) is the previous hidden state, and the keys (K) and values (V) are both the vectors of the feature map, one per spatial location.
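
As a rough illustration of the attention step described above, the sketch below scores each feature-map vector against the previous hidden state, normalizes the scores with a softmax, and returns the weighted sum of the feature vectors as a context vector. It assumes additive (tanh) scoring and soft attention. The function names, the parameters `W_k`, `W_q`, and `v`, the dimensions, and the random values are all hypothetical and chosen only for the example; this is not a reference implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def soft_attention(features, h_prev, W_k, W_q, v):
    """One additive-attention step (illustrative sketch).

    features: L feature-map vectors of size D, used as keys and values
    h_prev:   previous LSTM hidden state of size H, used as the query
    W_k, W_q, v: hypothetical projection parameters (A x D, A x H, A)
    Returns (context vector of size D, attention weights of length L).
    """
    q_proj = matvec(W_q, h_prev)
    scores = []
    for a in features:
        k_proj = matvec(W_k, a)
        hidden = [math.tanh(k + q) for k, q in zip(k_proj, q_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    alpha = softmax(scores)  # one weight per image location
    dim = len(features[0])
    context = [
        sum(alpha[i] * features[i][d] for i in range(len(features)))
        for d in range(dim)
    ]
    return context, alpha


def rand_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes: 4 image locations, 3-dim features, 5-dim hidden state,
# 2-dim attention space. Real models use far larger values.
rng = random.Random(0)
L, D, H, A = 4, 3, 5, 2
features = rand_matrix(L, D, rng)
h_prev = [rng.uniform(-1.0, 1.0) for _ in range(H)]
W_k = rand_matrix(A, D, rng)
W_q = rand_matrix(A, H, rng)
v = [rng.uniform(-1.0, 1.0) for _ in range(A)]

context, alpha = soft_attention(features, h_prev, W_k, W_q, v)
```

Here `alpha` holds one weight per image location and sums to 1, and `context` is the attended summary of the image that the LSTM would consume at the current step.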
