Definition
Transformer-based GNNs address some of the limitations of traditional GNNs (such as the inability to count cycles) while leveraging the Transformer architecture for processing graph-structured data.
Architecture

Node Embedding
Each node is embedded into a vector using its features.
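As a minimal sketch, assuming raw features are stacked row-wise in a matrix X and projected by a learned linear map (weights here are randomly initialized; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, feat_dim, embed_dim = 5, 8, 16

X = rng.normal(size=(num_nodes, feat_dim))   # one feature row per node
W = rng.normal(size=(feat_dim, embed_dim))   # projection weights, trained in practice
b = np.zeros(embed_dim)

H = X @ W + b   # node embeddings, shape (num_nodes, embed_dim)
```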
Positional Encoding
Unlike in the standard Transformer, where positional encodings represent sequential order, in the graph setting these encodings are used to capture the structural information of the graph. They are constructed from the adjacency matrix of the graph.
There exist various approaches to constructing positional encoding vectors.
Relative Distance
This method directly uses the idea of Position-Aware GNN: random subsets of nodes (anchor-sets) are sampled, and the distance of a given target node to each anchor-set is used as its positional encoding.
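A minimal sketch of this scheme, assuming an unweighted graph given as an adjacency list and using the minimum hop distance from each node to each randomly sampled anchor-set (names are illustrative; Position-Aware GNN itself learns richer aggregations over the anchors):

```python
import numpy as np
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (unweighted graph)."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def anchor_set_encoding(adj, num_sets=4, seed=0):
    """One encoding dimension per anchor-set: min distance to any anchor in it."""
    rng = np.random.default_rng(seed)
    nodes = list(adj)
    pe = np.full((len(nodes), num_sets), np.inf)
    for k in range(num_sets):
        anchors = rng.choice(nodes, size=max(1, len(nodes) // 4), replace=False)
        for a in anchors:
            for v, d in bfs_distances(adj, a).items():
                pe[v, k] = min(pe[v, k], d)
    return pe

# Example: path graph 0-1-2-3-4
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(anchor_set_encoding(adj))
```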
Laplacian Eigenvectors
Compute the eigenvector matrix of the graph Laplacian $L = D - A$ (with degree matrix $D$ and adjacency matrix $A$), and use each row of the eigenvector matrix as the positional encoding of the corresponding node.
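A minimal sketch, assuming the unnormalized Laplacian $L = D - A$ and keeping the $k$ eigenvectors after the trivial constant one (a common choice; some variants use the normalized Laplacian instead):

```python
import numpy as np

def laplacian_pe(A, k=2):
    """Each row of the eigenvector matrix of L = D - A is one node's encoding."""
    D = np.diag(A.sum(axis=1))
    eigvals, eigvecs = np.linalg.eigh(D - A)  # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]                # drop the constant eigenvector

# Example: adjacency matrix of a 4-cycle
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
pe = laplacian_pe(A)   # shape (4, 2): one positional encoding per node
```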

The signs of the eigenvectors are arbitrary: if $v$ is an eigenvector, so is $-v$, and a sign flip may change the model's prediction. SignNet uses a neural network to obtain a sign-invariant positional encoding that prevents this problem:

$$\mathrm{PE} = \rho\left(\left[\phi(v_i) + \phi(-v_i)\right]_{i=1}^{k}\right)$$

where $v_i$ is the $i$-th eigenvector of the Laplacian matrix, and $\phi$ and $\rho$ are neural networks (MLP, GNN, etc.).
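A minimal sketch of this sign-invariance trick, with tiny random-weight MLPs standing in for $\phi$ and $\rho$ (illustrative only, not the actual SignNet implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dims):
    """A tiny MLP represented as a list of weight matrices."""
    return [rng.normal(size=(a, b)) for a, b in zip(dims[:-1], dims[1:])]

def forward(ws, x):
    for W in ws[:-1]:
        x = np.maximum(x @ W, 0.0)   # ReLU hidden layers
    return x @ ws[-1]

n, k, hidden, out = 6, 3, 16, 8
V = rng.normal(size=(n, k))          # columns: k Laplacian eigenvectors

phi = mlp([1, hidden, hidden])       # applied per node, per eigenvector
rho = mlp([k * hidden, hidden, out]) # mixes the k sign-invariant features

# phi(v_i) + phi(-v_i) is unchanged when any eigenvector flips sign
feats = [forward(phi, V[:, [i]]) + forward(phi, -V[:, [i]]) for i in range(k)]
pe = forward(rho, np.concatenate(feats, axis=1))  # (n, out) encodings

# Flipping the sign of eigenvector 1 leaves the encoding unchanged
V_flipped = V * np.array([1, -1, 1])
feats2 = [forward(phi, V_flipped[:, [i]]) + forward(phi, -V_flipped[:, [i]])
          for i in range(k)]
assert np.allclose(pe, forward(rho, np.concatenate(feats2, axis=1)))
```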
Self Attention
Edge features are used to adjust the attention weights between nodes.
If there is an edge between nodes $i$ and $j$ with features $e_{ij}$, it is linearly transformed:

$$c_{ij} = e_{ij} w^{T}$$

If there is no edge, find the shortest edge path $(e_1, \dots, e_N)$ between $i$ and $j$, and define

$$c_{ij} = \frac{1}{N} \sum_{n=1}^{N} e_n w_n^{T}$$

Then, $c_{ij}$ is added to the corresponding attention weight:

$$A_{ij} \leftarrow A_{ij} + c_{ij}$$

where $w$ and $w_1, \dots, w_N$ are learnable parameters.
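A minimal sketch of single-head self-attention with this edge bias, assuming an unweighted, connected graph for the shortest paths and random untrained parameters (all names are illustrative):

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
n, d, d_e = 4, 8, 5

H = rng.normal(size=(n, d))             # node embeddings (features + pos. enc.)
E = {}                                  # undirected path graph 0-1-2-3
for i, j in [(0, 1), (1, 2), (2, 3)]:
    E[(i, j)] = E[(j, i)] = rng.normal(size=d_e)   # edge features
adj = {u: [v for v in range(n) if (u, v) in E] for u in range(n)}

def shortest_edge_path(i, j):
    """BFS: the list of edges on a shortest path from i to j."""
    parent, queue = {i: None}, deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path, v = [], j
    while parent[v] is not None:
        path.append((parent[v], v))
        v = parent[v]
    return path[::-1]

Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
w = [rng.normal(size=d_e) for _ in range(n)]  # learnable per-hop vectors w_n

A = (H @ Wq) @ (H @ Wk).T / np.sqrt(d)        # raw attention scores
for i in range(n):
    for j in range(n):
        if i != j:
            path = [(i, j)] if (i, j) in E else shortest_edge_path(i, j)
            A[i, j] += np.mean([E[e] @ w[m] for m, e in enumerate(path)])

A = np.exp(A - A.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)             # row-wise softmax: attention weights
```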