Definition

The Visual Attention model applies attention spatially: the hidden state of the LSTM ( $x_{t - 1}$ : 1, 1, channels) is used as the query, and the CNN feature map ( $X_{t}$ : height, width, channels) is used as both the key and the value. Each spatial position of the feature map therefore receives an attention weight, and the values are combined according to those weights. The result of the attention ( $x_{t}$ : 1, 1, channels) is used as the hidden state for the next step.
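The shapes above can be made concrete with a small NumPy sketch. It is illustrative only: the note does not say how query and key are compared, so scaled dot-product scoring with a softmax over the height × width positions is an assumption here, and the function name `visual_attention` is hypothetical.

```python
import numpy as np

def visual_attention(query, feature_map):
    """Spatial attention sketch (assumed scaled dot-product scoring).

    query:       previous hidden state x_{t-1}, shape (1, 1, channels)
    feature_map: CNN feature map X_t, shape (height, width, channels)
    Returns x_t with shape (1, 1, channels) and the weights (height, width).
    """
    h, w, c = feature_map.shape
    # Flatten the grid: each spatial position is one key and one value.
    keys = feature_map.reshape(h * w, c)
    q = query.reshape(c)

    # Assumption: similarity is a scaled dot product between query and key.
    scores = keys @ q / np.sqrt(c)            # shape (h * w,)

    # Numerically stable softmax over all spatial positions.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights = weights / weights.sum()

    # Weighted sum of the values (the same feature map) gives x_t.
    attended = weights @ keys                 # shape (c,)
    return attended.reshape(1, 1, c), weights.reshape(h, w)
```

Calling it with a `(1, 1, 8)` query and a `(4, 5, 8)` feature map returns a `(1, 1, 8)` vector, which can be passed on as the hidden state of the next step, along with a `(4, 5)` map of weights that sums to 1.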
