Definition
Negative sampling (NS) is a simplified version of NCE. It shares the core idea of transforming the problem of density estimation into binary classification between true data samples and noise samples.
In the context of word embeddings, particularly the Skip-gram model, NS aims to learn good word representations without the need to estimate a full probability distribution. The goal is to find parameters that best represent the relationship between words and their contexts. Given a word-context pair from the true data distribution, NS samples negative examples (noise) from a noise distribution . The objective is to maximize the probability of the true pair while minimizing the probability of the noise pairs.
NS models with the Sigmoid Function:
The NS objective function is derived as: where is the set of negative samples drawn from the noise distribution.
The NS objective function is similar to the Skip-gram objective but replaces the expensive Softmax Function with a simpler binary classification task between true and noise samples.