# Definition

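Graph SAmple and aggreGatE (GraphSAGE) is a node embedding method that builds each node's embedding by sampling and aggregating features from its local neighborhood. Because it learns aggregation functions rather than a separate embedding per node, it generalizes to nodes and graphs unseen during training.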

# Architecture

Unlike GCN, GraphSAGE does not require the adjacency matrix of the entire graph. Instead, it uses a fixed number of sampled neighbors for each node.
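Fixed-size neighbor sampling can be illustrated with a minimal sketch. The helper name `sample_neighbors`, the adjacency-list representation, and the toy graph are hypothetical. Falling back to sampling with replacement when a node has too few neighbors is an assumed convention, not something stated in this note.

```python
import random

def sample_neighbors(adj, node, num_samples, rng):
    """Return a fixed-size list of sampled neighbors of `node`.

    Samples without replacement when the node has enough neighbors,
    and with replacement otherwise (an assumed convention).
    """
    neighbors = adj[node]
    if not neighbors:
        return []  # isolated node: nothing to sample
    if len(neighbors) >= num_samples:
        return rng.sample(neighbors, num_samples)
    return [rng.choice(neighbors) for _ in range(num_samples)]

# Toy undirected graph stored as an adjacency list (hypothetical data).
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
rng = random.Random(0)
sampled = {v: sample_neighbors(adj, v, 2, rng) for v in adj}
```

Only the adjacency lists of the nodes being processed are touched, so the cost per node depends on the sample size rather than on the size of the graph.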

# Aggregators

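Mean aggregator:

$$\operatorname{AGG}_{k}^{\text{mean}} = \frac{1}{|\mathcal{N}_{k}(v_{i})|}\sum_{v_{j}\in \mathcal{N}_{k}(v_{i})} h^{(k-1)}_{j}$$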

LSTM aggregator:

$$\operatorname{AGG}_{k}^{\text{LSTM}} = \operatorname{LSTM}(\{h^{(k-1)}_{j}|v_{j}\in \mathcal{N}_{k} (v_{i})\})$$

where the order of the feature sequence is random.

Pooling aggregator:

$$\operatorname{AGG}_{k}^{\text{pool}} = \max(\{\operatorname{MLP}(h^{(k-1)}_{j})|v_{j}\in \mathcal{N}_{k} (v_{i})\})$$

# Algorithm

1. For $k=1,\dots, K$:
    1. For each node $v_{i} \in V$:
        1. Sample a fixed-size set of neighbors $\mathcal{N}_{k}(v_{i})$.
        2. Aggregate the features of the sampled neighbors using an aggregator function (mean, [[Long Short-Term Memory|LSTM]], pooling).
           $$h^{(k)}_{\mathcal{N}(v_{i})} = \operatorname{AGG}_{k}(\{h^{(k-1)}_{j}|v_{j}\in \mathcal{N}_{k}(v_{i})\})$$
        3. Combine the aggregated neighborhood information with the node's own features and apply a non-linear [[Activation Function]].
           $$h_{i}^{(k)} = \sigma(W^{(k)} \operatorname{CONCAT}(h_{i}^{(k-1)}, h_{\mathcal{N}(v_{i})}^{(k)}))$$
           where $\sigma$ is a non-linear [[Activation Function]], and $h_{i}^{(k)}$ is the feature of node $v_{i}$ at the $k$-th stage, with $h_{i}^{(0)}$ being the input feature of $v_{i}$.
    2. Normalize the feature embeddings.
       $$h_{i}^{(k)} = \frac{h_{i}^{(k)}}{||h_{i}^{(k)}||_{2}},\forall v_{i}\in V$$
2. After $K$ iterations, the final vectors are the output node embeddings.
   $$z_{i} = h_{i}^{(K)},\forall v_{i} \in V$$
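The steps above can be sketched in plain Python as a forward pass with the mean aggregator. This is a minimal illustration under stated assumptions, not a reference implementation: all function names are hypothetical, ReLU is chosen for $\sigma$, the weights are random rather than trained, neighbors are sampled with replacement, and every node is assumed to have at least one neighbor.

```python
import math
import random

def mean_agg(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(x):
    """ReLU, used here as the non-linearity sigma (an assumption)."""
    return [max(0.0, v) for v in x]

def l2_normalize(x, eps=1e-12):
    """Divide x by its L2 norm; eps guards against an all-zero vector."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / max(norm, eps) for v in x]

def graphsage_forward(adj, features, weights, num_samples, rng, agg=mean_agg):
    """Run K = len(weights) rounds of sample, aggregate, combine, normalize."""
    h = {v: list(x) for v, x in features.items()}  # h^(0): input features
    for W in weights:  # k = 1, ..., K
        h_next = {}
        for v in adj:
            # Sample with replacement so every neighborhood has a fixed size.
            sampled = [rng.choice(adj[v]) for _ in range(num_samples)]
            h_nbr = agg([h[u] for u in sampled])
            # CONCAT(h_i^(k-1), h_N(v_i)^(k)), then W^(k), sigma, normalization.
            h_next[v] = l2_normalize(relu(matvec(W, h[v] + h_nbr)))
        h = h_next
    return h  # z_i = h_i^(K)

# Toy graph and input features (hypothetical data).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [0.5, 0.5]}
rng = random.Random(0)
in_dim, hidden_dim, K = 2, 3, 2
dims = [in_dim] + [hidden_dim] * K
# W^(k) has dims[k + 1] rows and 2 * dims[k] columns because of the concatenation.
weights = [
    [[rng.uniform(-1, 1) for _ in range(2 * dims[k])] for _ in range(dims[k + 1])]
    for k in range(K)
]
z = graphsage_forward(adj, features, weights, num_samples=2, rng=rng)
```

Each round reads only the previous round's features `h` and writes into `h_next`, so all nodes at stage $k$ are computed from stage $k-1$ values. Swapping `agg` for another function with the same signature changes the aggregator without touching the rest of the loop.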