Definition

Cluster-GCN is an algorithm for training GCN on large-scale graphs. The main idea of Cluster-GCN is to address the computational redundancy occurring in Neighbor Sampling, when nodes in a mini-batch share many neighbors.

Instead of sampling individual nodes or neighbors, Cluster-GCN samples small subgraphs from the large graph. It then performs efficient layer-wise node embedding updates over these subgraphs, where the subgraphs should retain edge connectivity structure of the original graph as much as possible.

If only single group is used per minibatch, the induced subgraph removes between-group links and the graph community detection algorithm puts similar nodes together in the same group. So the sampled nodes are not diverse enough to represent the entire graph structure. For this reason, the cluster-GCN aggregate multiple node groups per mini-batch.

Algorithm

  1. The given graph is partitioned into groups of nodes (subgraphs) using community detection algorithms such as Louvain or METIS algorithm.
  2. For each mini-batch, randomly sample multiple node groups and construct node-induced subgraph of the aggregated node group.
  3. Update the embeddings of the nodes of the subgraph.