Definition

Negative sampling (NS) is a simplified version of noise contrastive estimation (NCE). It shares NCE's core idea of turning density estimation into binary classification between true data samples and noise samples.

In the context of word embeddings, particularly the Skip-gram model, NS aims to learn good word representations without estimating a full probability distribution over the vocabulary. The goal is to find parameters $θ$ that best represent the relationship between words and their contexts. Given a word-context pair $(w, c)$ from the true data distribution, NS samples $k$ negative examples (noise) from a noise distribution $p_{n}(c)$. The objective is to maximize the probability assigned to the true pair while minimizing the probability assigned to the noise pairs.
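The sampling step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `sample_negatives`, the four-word vocabulary, and the probabilities in `noise_probs` are all hypothetical, and the note itself does not say which noise distribution to use.

```python
import random

def sample_negatives(noise_probs, k, rng=None):
    """Draw k context indices from a noise distribution p_n(c).

    noise_probs[i] is the (assumed) noise probability of context word i.
    """
    rng = rng or random.Random()
    contexts = list(range(len(noise_probs)))
    # random.choices samples with replacement, weighted by noise_probs.
    return rng.choices(contexts, weights=noise_probs, k=k)

# Hypothetical noise distribution over a 4-word context vocabulary.
noise_probs = [0.4, 0.3, 0.2, 0.1]
negatives = sample_negatives(noise_probs, k=5, rng=random.Random(0))
```

Each returned index plays the role of a noise context $c$ that is paired with the observed word $w$.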

NS models the probability that a pair $(w, c)$ comes from the true data, $p(D = 1 \mid w, c; θ)$, with the Sigmoid Function: $p(D = 1 \mid w, c; θ) = σ(v_{c} \cdot v_{w}) = \frac{1}{1 + \exp(-v_{c} \cdot v_{w})}$
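As a small sketch of this formula, the probability can be computed from two embedding vectors represented as plain Python lists. The helper names `sigmoid`, `dot`, and `prob_true_pair` are illustrative choices, and the example vectors are made up.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def prob_true_pair(v_c, v_w):
    """p(D = 1 | w, c) = sigmoid(v_c . v_w)."""
    return sigmoid(dot(v_c, v_w))

# Hypothetical 3-dimensional embeddings for a context and a word.
v_c = [0.5, -0.2, 0.1]
v_w = [0.3, 0.4, -0.6]
p = prob_true_pair(v_c, v_w)
```

A larger dot product pushes the probability toward 1, and a dot product of zero gives exactly 0.5.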

The NS objective function is derived as: $L_{NS} = \sum_{(w, c) \in D} \ln σ(v_{c} \cdot v_{w}) + \sum_{(w, c) \in D^{'}} k \ln σ(-v_{c} \cdot v_{w})$ where $D^{'}$ is the set of negative samples drawn from the noise distribution.
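The objective can be evaluated directly for small sets of pairs. The sketch below follows the formula as written in this note, including the factor $k$ on the negative term. The function names and the representation of each pair as a `(v_w, v_c)` tuple of lists are assumptions made for illustration.

```python
import math

def log_sigmoid(x):
    """ln sigmoid(x), computed in a numerically stable way."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def ns_objective(pos_pairs, neg_pairs, k):
    """L_NS for true pairs D (pos_pairs) and noise pairs D' (neg_pairs).

    Each pair is a (v_w, v_c) tuple of embedding vectors.
    """
    pos_term = sum(log_sigmoid(dot(v_c, v_w)) for v_w, v_c in pos_pairs)
    neg_term = sum(k * log_sigmoid(-dot(v_c, v_w)) for v_w, v_c in neg_pairs)
    return pos_term + neg_term

# Hypothetical embeddings: one true pair and two noise pairs.
pos_pairs = [([1.0, 0.0], [0.8, 0.1])]
neg_pairs = [([1.0, 0.0], [-0.5, 0.3]), ([1.0, 0.0], [0.2, -0.9])]
value = ns_objective(pos_pairs, neg_pairs, k=1)
```

Because every term is the logarithm of a probability, the value is never positive. Training would adjust the vectors to push it toward zero, which raises the score of true pairs and lowers the score of noise pairs.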

The NS objective function is similar to the Skip-gram objective but replaces the expensive Softmax Function with a simpler binary classification task between true and noise samples.
