Definition
The t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data. It models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points with high probability.
Algorithm
- Constructs a probability distribution over pairs of high-dimensional objects in such a way that similar objects are assigned a higher probability while dissimilar points are assigned a lower probability.
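Define the conditional probability $p_{j|i}$ with a Gaussian (normal) kernel centered on $x_i$: $p_{j|i} = \dfrac{\exp(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2)}$ for $j \neq i$, with $p_{i|i} = 0$, where $\sigma_i$ acts as an adaptive bandwidth parameter for each point $x_i$ and is indirectly determined by the given hyperparameter (perplexity).
Then define the joint probability $p_{ij} = \dfrac{p_{j|i} + p_{i|j}}{2N}$, where $N$ is the number of data points.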
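- Defines a similar probability distribution over the points in the low-dimensional map.
Define the similarity $q_{ij}$ between two points $y_i$ and $y_j$ in the low-dimensional map with a Student t-distribution with one degree of freedom: $q_{ij} = \dfrac{(1 + \lVert y_i - y_j \rVert^2)^{-1}}{\sum_{k} \sum_{l \neq k} (1 + \lVert y_k - y_l \rVert^2)^{-1}}$ for $i \neq j$, with $q_{ii} = 0$.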
- Minimizes the Kullback-Leibler divergence of the low-dimensional distribution from the high-dimensional one with respect to the locations of the points in the low-dimensional map.
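The three steps above can be sketched in a few dozen lines of standard-library Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`joint_p`, `joint_q`, `embed`) are hypothetical, each bandwidth is found by a simple geometric bisection on the perplexity, and the map is optimized with plain gradient descent using the standard gradient $4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)(1 + \lVert y_i - y_j \rVert^2)^{-1}$. Refinements that practical implementations commonly add, such as momentum, early exaggeration, or tree-based approximations, are left out.

```python
import math
import random

def sq_dists(points):
    """Pairwise squared Euclidean distances as a nested list."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]

def conditional_row(d_row, i, sigma):
    """p_{j|i} for fixed i: Gaussian kernel with bandwidth sigma, p_{i|i} = 0."""
    # Shift by the nearest-neighbor distance so at least one weight equals 1
    # (the shift cancels in the normalization and avoids dividing by zero).
    d_min = min(d for j, d in enumerate(d_row) if j != i)
    w = [0.0 if j == i else math.exp(-(d - d_min) / (2.0 * sigma ** 2))
         for j, d in enumerate(d_row)]
    total = sum(w)
    return [v / total for v in w]

def perplexity_of(row):
    """2 raised to the Shannon entropy (in bits) of one conditional distribution."""
    return 2.0 ** -sum(p * math.log2(p) for p in row if p > 0.0)

def joint_p(points, perplexity, iters=50):
    """Symmetrized joint probabilities p_ij = (p_{j|i} + p_{i|j}) / (2N)."""
    n = len(points)
    d = sq_dists(points)
    cond = []
    for i in range(n):
        lo, hi = 1e-6, 1e6  # assumed search bracket for sigma_i
        for _ in range(iters):
            sigma = math.sqrt(lo * hi)  # bisect on a log scale
            row = conditional_row(d[i], i, sigma)
            if perplexity_of(row) > perplexity:
                hi = sigma  # distribution too flat: shrink the bandwidth
            else:
                lo = sigma
        cond.append(row)
    return [[(cond[i][j] + cond[j][i]) / (2.0 * n) for j in range(n)] for i in range(n)]

def joint_q(ys):
    """q_ij from a Student-t kernel (one degree of freedom); also returns raw weights."""
    n = len(ys)
    d = sq_dists(ys)
    w = [[0.0 if i == j else 1.0 / (1.0 + d[i][j]) for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w], w

def kl_divergence(p, q):
    """KL(P || Q) = sum over pairs of p_ij * log(p_ij / q_ij)."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0.0)

def embed(points, perplexity=2.0, dim=2, steps=1000, lr=0.1, seed=0):
    """Plain gradient descent on the map; returns (map, initial KL, final KL)."""
    rng = random.Random(seed)
    n = len(points)
    p = joint_p(points, perplexity)
    ys = [[rng.gauss(0.0, 1e-2) for _ in range(dim)] for _ in range(n)]
    kl_start = kl_divergence(p, joint_q(ys)[0])
    for _ in range(steps):
        q, w = joint_q(ys)
        grads = [[4.0 * sum((p[i][j] - q[i][j]) * w[i][j] * (ys[i][k] - ys[j][k])
                            for j in range(n))
                  for k in range(dim)] for i in range(n)]
        ys = [[y - lr * g for y, g in zip(yi, gi)] for yi, gi in zip(ys, grads)]
    return ys, kl_start, kl_divergence(p, joint_q(ys)[0])

if __name__ == "__main__":
    data = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
            (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0)]
    ys, kl_start, kl_end = embed(data)
    print("KL before:", kl_start, "KL after:", kl_end)
```

The default step size and iteration count are guesses suited to a handful of points. Larger data sets would need different settings and, realistically, a vectorized or library implementation.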