Definition
A graph attention network (GAT) applies the Attention mechanism to the aggregation stage of the GraphSAGE model. Instead of assigning the same weight to every neighbor, GAT weights each neighbor by a learned attention coefficient between the two nodes.
Architecture

Attention
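The attention coefficient $\alpha_{ij}$ between node $i$ and its neighbor $j$ is calculated as

$$e_{ij} = \mathrm{LeakyReLU}\left(a^{\top}\left[W h_i^{(l-1)} \,\|\, W h_j^{(l-1)}\right]\right), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{m \in \mathcal{N}(i)} \exp(e_{im})}$$

where $h_i^{(l-1)}$ is the feature of node $i$ from the previous stage, $\mathcal{N}(i)$ is the neighborhood of node $i$, $\|$ denotes concatenation, and $W$ and $a$ are learnable parameters shared across all nodes.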
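The attention computation can be sketched in plain Python as below. This is a minimal illustration, not a reference implementation: it assumes the scoring function from the original GAT paper (a LeakyReLU with slope 0.2 applied to $a^{\top}[W h_i \| W h_j]$), and the helper names (`matvec`, `attention_coefficients`), the toy graph, and the parameter values are all made up for the example. The neighborhood of node 0 is assumed to include node 0 itself.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z

def attention_coefficients(h, neighbors, W, a, i):
    """Return {j: alpha_ij} for node i, normalized over its neighborhood."""
    Wh_i = matvec(W, h[i])
    # Unnormalized scores e_ij = LeakyReLU(a^T [W h_i || W h_j])
    scores = {}
    for j in neighbors[i]:
        concat = Wh_i + matvec(W, h[j])  # list concatenation
        scores[j] = leaky_relu(sum(ak * ck for ak, ck in zip(a, concat)))
    # Softmax over the neighborhood (max subtracted for numerical stability)
    top = max(scores.values())
    exps = {j: math.exp(e - top) for j, e in scores.items()}
    total = sum(exps.values())
    return {j: v / total for j, v in exps.items()}

# Hypothetical toy graph: 3 nodes with 2-dimensional features
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1, 2], 1: [0, 1], 2: [0, 2]}
W = [[1.0, 0.5], [-0.5, 1.0]]   # shared weight matrix (illustrative values)
a = [0.3, -0.2, 0.1, 0.4]       # shared attention vector (illustrative values)

alpha = attention_coefficients(h, neighbors, W, a, 0)
print(alpha)  # one coefficient per neighbor of node 0, summing to 1
```

Because the scores are normalized with a softmax, the coefficients of each node are positive and sum to one over its neighborhood, whatever values `W` and `a` take.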
Aggregation
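The feature of node $i$ at the $l$-th stage is calculated as an attention-weighted sum over its neighbors

$$h_i^{(l)} = \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} W h_j^{(l-1)}\right)$$

where $\alpha_{ij}$ is the attention coefficient between nodes $i$ and $j$, $W$ is the shared weight matrix, and $\sigma$ is a non-linear Activation Function.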
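A small sketch of this aggregation step follows. It assumes the attention coefficients have already been computed, so they are hard-coded here as hypothetical values. ReLU stands in for the unspecified non-linearity, and the function names are illustrative only.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(z):
    return max(0.0, z)

def aggregate(h, alpha_i, W, activation=relu):
    """Compute activation(sum_j alpha_ij * W h_j) for one node."""
    total = [0.0] * len(W)
    for j, a_ij in alpha_i.items():
        Wh_j = matvec(W, h[j])
        total = [t + a_ij * v for t, v in zip(total, Wh_j)]
    return [activation(t) for t in total]

# Hypothetical inputs: 3 nodes, 2-dimensional features
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.5], [-0.5, 1.0]]
alpha_0 = {0: 0.5, 1: 0.25, 2: 0.25}  # assumed coefficients for node 0

h0_new = aggregate(h, alpha_0, W)
print(h0_new)  # [1.0, 0.125]
```

Replacing `alpha_0` with equal weights (1/3 each) would reduce this to a plain mean aggregator, which is the uniform weighting that GAT replaces.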
Multi-Head Attention
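To stabilize the learning process, GAT employs multi-head attention. $K$ independent attention heads are run on the same input, and their output features are either concatenated

$$h_i^{(l)} = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{k} W^{k} h_j^{(l-1)}\right)$$

or averaged

$$h_i^{(l)} = \sigma\left(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{k} W^{k} h_j^{(l-1)}\right)$$

where $\Vert$ denotes concatenation, $\alpha_{ij}^{k}$ and $W^{k}$ are the attention coefficients and weight matrix of the $k$-th head, and $K$ is the number of attention heads.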
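The two ways of combining heads can be sketched as follows. This is an illustration under assumptions: the per-head coefficients and weight matrices are hypothetical hard-coded values, and ReLU stands in for the non-linearity. Following the original GAT paper, the activation is applied per head before concatenation and after the mean when averaging. The names `head_preactivation` and `multi_head` are made up for the example.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(z):
    return max(0.0, z)

def head_preactivation(h, alpha_i, W):
    """Attention-weighted sum for one head, before the non-linearity."""
    total = [0.0] * len(W)
    for j, a_ij in alpha_i.items():
        total = [t + a_ij * v for t, v in zip(total, matvec(W, h[j]))]
    return total

def multi_head(h, alphas_i, Ws, mode="concat", activation=relu):
    """Combine K heads for one node by concatenation or averaging."""
    pre = [head_preactivation(h, al, W) for al, W in zip(alphas_i, Ws)]
    if mode == "concat":
        # Activate each head, then join the outputs: dimension K * F'
        return [activation(z) for head in pre for z in head]
    if mode == "average":
        # Average the heads element-wise, then activate: dimension F'
        return [activation(sum(col) / len(pre)) for col in zip(*pre)]
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical inputs: 3 nodes, 2 heads
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Ws = [[[1.0, 0.5], [-0.5, 1.0]],
      [[0.0, 1.0], [1.0, 0.0]]]
alphas_0 = [{0: 0.5, 1: 0.25, 2: 0.25},
            {0: 0.25, 1: 0.25, 2: 0.5}]

concat_out = multi_head(h, alphas_0, Ws, mode="concat")
avg_out = multi_head(h, alphas_0, Ws, mode="average")
print(concat_out)  # [1.0, 0.125, 0.75, 0.75]
print(avg_out)     # [0.875, 0.4375]
```

Concatenation multiplies the output dimension by the number of heads, while averaging keeps it fixed, which is why averaging is the usual choice for a final prediction layer.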