Definition

A graph attention network (GAT) applies an attention mechanism to the aggregation stage of the GraphSAGE model. Instead of assigning the same weight to every neighbor, GAT weights each neighbor by an attention coefficient computed between the two nodes.

Architecture

Attention

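The attention values are calculated as

$$e_{ij} = \mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[\mathbf{W}\mathbf{h}_i^{(l-1)} \,\Vert\, \mathbf{W}\mathbf{h}_j^{(l-1)}\right]\right), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})}$$

where $\mathbf{h}_i^{(l-1)}$ is the feature of node $i$ from the previous stage, $\Vert$ denotes concatenation, $\mathcal{N}(i)$ is the neighborhood of node $i$, and $\mathbf{W}$ and $\mathbf{a}$ are learnable parameters shared across all nodes.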
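The attention computation can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the helper names (`matvec`, `attention_coefficients`), the toy graph, the parameter values, the LeakyReLU slope of 0.2, and the choice to include each node in its own neighborhood are all assumptions made for the example.

```python
import math

def leaky_relu(x, slope=0.2):
    # Slope 0.2 is an assumed value for this sketch.
    return x if x > 0 else slope * x

def matvec(W, h):
    # Multiply matrix W (list of rows) by vector h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def attention_coefficients(h, neighbors, W, a, i):
    """Return {j: alpha_ij} for every j in neighbors[i]."""
    Wh_i = matvec(W, h[i])
    scores = {}
    for j in neighbors[i]:
        Wh_j = matvec(W, h[j])
        concat = Wh_i + Wh_j  # list concatenation: [W h_i || W h_j]
        scores[j] = leaky_relu(sum(ak * ck for ak, ck in zip(a, concat)))
    # Softmax over the neighborhood; subtracting the max avoids overflow.
    m = max(scores.values())
    exps = {j: math.exp(s - m) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}

# Hypothetical toy graph: 3 nodes, 2 features each, self-loops included.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1, 2], 1: [0, 1], 2: [0, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]   # identity, chosen only for readability
a = [0.5, -0.5, 1.0, 0.25]     # arbitrary attention vector

alpha_0 = attention_coefficients(h, neighbors, W, a, 0)
```

With these values the raw scores for node 0 are 1.5, 0.75, and 1.75 for neighbors 0, 1, and 2, so neighbor 2 receives the largest coefficient and the coefficients sum to 1.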

Aggregation

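The feature of node $i$ at the $l$-th stage is calculated as an attention-weighted sum over its neighbors:

$$\mathbf{h}_i^{(l)} = \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, \mathbf{W}\mathbf{h}_j^{(l-1)}\right)$$

where $\alpha_{ij}$ is the attention coefficient between nodes $i$ and $j$, $\mathbf{W}$ is the shared weight matrix, and $\sigma$ is a non-linear Activation Function.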
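A small sketch of this aggregation step, assuming the attention coefficients for one node have already been computed. The function name `aggregate`, the toy values, and the use of ELU as the activation are illustrative assumptions; any non-linearity could be substituted.

```python
import math

def elu(x):
    # ELU is an assumed choice of non-linearity for this sketch.
    return x if x > 0 else math.expm1(x)

def matvec(W, h):
    # Multiply matrix W (list of rows) by vector h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def aggregate(h, alpha_i, W, activation=elu):
    """Attention-weighted sum of transformed neighbor features for one node.

    alpha_i maps neighbor index j to its coefficient alpha_ij.
    """
    z = [0.0] * len(W)
    for j, a_ij in alpha_i.items():
        Wh_j = matvec(W, h[j])
        for d in range(len(z)):
            z[d] += a_ij * Wh_j[d]
    return [activation(v) for v in z]

# Hypothetical inputs: 3 nodes with 2 features, fixed coefficients for node 0.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
alpha_0 = {0: 0.5, 1: 0.25, 2: 0.25}

h0_new = aggregate(h, alpha_0, W)  # [0.75, 0.5]
```

If every coefficient in `alpha_0` were equal, the same function would reduce to a plain mean over the neighborhood, which is the uniform weighting that attention replaces.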

Multi-Head Attention

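To stabilize the learning process, GAT employs multi-head attention. $K$ independent attention heads are executed on the same input, and their output features are either concatenated,

$$\mathbf{h}_i^{(l)} = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{k}\, \mathbf{W}^{k}\mathbf{h}_j^{(l-1)}\right)$$

or averaged,

$$\mathbf{h}_i^{(l)} = \sigma\left(\frac{1}{K}\sum_{k=1}^{K} \sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{k}\, \mathbf{W}^{k}\mathbf{h}_j^{(l-1)}\right)$$

where $\alpha_{ij}^{k}$ and $\mathbf{W}^{k}$ are the attention coefficients and weight matrix of the $k$-th head, and $K$ is the number of attention heads.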
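The two ways of combining heads can be sketched as below. The function name `combine_heads`, the `mode` argument, the ELU activation, and the sample values are assumptions for illustration. The sketch takes the pre-activation output of each head for a single node and, in the averaging case, applies the activation after the mean is taken.

```python
import math

def elu(x):
    # ELU is an assumed choice of non-linearity for this sketch.
    return x if x > 0 else math.expm1(x)

def combine_heads(head_outputs, mode="concat", activation=elu):
    """Combine K per-head, pre-activation feature vectors of one node.

    mode="concat":  activate each head, then concatenate (length K * dim).
    mode="average": average the heads, then activate (length dim).
    """
    if mode == "concat":
        return [activation(v) for z in head_outputs for v in z]
    if mode == "average":
        K = len(head_outputs)
        dim = len(head_outputs[0])
        mean = [sum(z[d] for z in head_outputs) / K for d in range(dim)]
        return [activation(v) for v in mean]
    raise ValueError("mode must be 'concat' or 'average'")

# Hypothetical pre-activation outputs of K = 2 heads for one node.
heads = [[0.75, 0.5], [0.25, -1.0]]

concat_out = combine_heads(heads, "concat")   # 4 values
avg_out = combine_heads(heads, "average")     # 2 values
```

Concatenation multiplies the feature dimension by the number of heads, so it suits hidden layers, while averaging keeps the dimension fixed, which is convenient when the output size must match a target such as the number of classes.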