Definition

The t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data. It models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points with high probability.

Algorithm

  1. Constructs a probability distribution over pairs of high-dimensional objects such that similar objects are assigned a higher probability and dissimilar objects a lower probability.

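For data points $x_1, \dots, x_N$, define the conditional probability $p_{j|i}$ using a normal (Gaussian) distribution centered at $x_i$:

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}, \qquad p_{i|i} = 0,$$

where $\sigma_i$ acts as an adaptive bandwidth parameter for each point $x_i$ and is indirectly determined by the given hyperparameter (perplexity).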
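
The link between perplexity and the per-point bandwidth can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `conditional_probabilities`, its parameters, and the bisection over the precision `beta = 1 / (2 * sigma_i**2)` are assumptions made for this example. For each point, the search adjusts the bandwidth until the perplexity of the resulting distribution (the exponential of its Shannon entropy) matches the requested value.

```python
import numpy as np

def conditional_probabilities(X, perplexity=30.0, tol=1e-5, max_iter=100):
    """Return P with P[i, j] = p_{j|i}; each row is tuned to the target perplexity."""
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of rows of X.
    sq_norms = np.sum(X ** 2, axis=1)
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    target_entropy = np.log(perplexity)  # perplexity = exp(entropy in nats)
    P = np.zeros((n, n))
    for i in range(n):
        d = np.delete(D[i], i)             # distances to all other points
        beta, lo, hi = 1.0, 0.0, np.inf    # beta = 1 / (2 * sigma_i**2)
        for _ in range(max_iter):
            # Shifting by d.min() leaves the normalized result unchanged
            # but avoids underflow in the exponential.
            w = np.exp(-(d - d.min()) * beta)
            p = w / w.sum()
            entropy = -np.sum(p * np.log(p + 1e-12))
            if abs(entropy - target_entropy) < tol:
                break
            if entropy > target_entropy:   # too flat: narrow the Gaussian
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:                          # too peaked: widen the Gaussian
                hi = beta
                beta = (lo + beta) / 2.0
        P[i, np.arange(n) != i] = p        # diagonal stays 0, i.e. p_{i|i} = 0
    return P

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))           # toy data: 50 points in 5 dimensions
P_cond = conditional_probabilities(X, perplexity=10.0)
```

Each row of `P_cond` sums to 1, and points in dense regions end up with a smaller bandwidth than points in sparse regions, which is what makes the bandwidth adaptive.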

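Then define the symmetric joint probability

$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N},$$

where $p_{j|i}$ is the conditional probability defined above and $N$ is the number of data points.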
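
The symmetrization is a one-line operation on the matrix of conditional probabilities. The sketch below uses a small hand-written matrix purely for illustration; the function name `joint_probabilities` is an assumption of this example.

```python
import numpy as np

def joint_probabilities(P_cond):
    """Symmetrize conditionals: p_ij = (p_{j|i} + p_{i|j}) / (2N)."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)

# Illustrative conditional probabilities for N = 3 points:
# row i holds p_{j|i}, each row sums to 1, and the diagonal is 0.
P_cond_example = np.array([
    [0.0, 0.7, 0.3],
    [0.5, 0.0, 0.5],
    [0.1, 0.9, 0.0],
])
P_joint = joint_probabilities(P_cond_example)
```

Because every row of the conditional matrix sums to 1, the joint matrix sums to 1 overall and each point contributes a total probability of at least $1/(2N)$, so even an outlier has some influence on the embedding.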

  1. Defines a similar probability distribution over the points in the low-dimensional map.

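Define the similarity $q_{ij}$ between two points $y_i$ and $y_j$ in the low-dimensional map using a Student t-distribution with one degree of freedom:

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k} \sum_{l \neq k} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}, \qquad q_{ii} = 0.$$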
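
The low-dimensional similarities can be computed in a few lines. This is a minimal sketch assuming the heavy-tailed Student t kernel with one degree of freedom, normalized over all pairs of distinct points; the function name `low_dim_affinities` and the toy coordinates are illustrative.

```python
import numpy as np

def low_dim_affinities(Y):
    """Return Q with Q[i, j] = q_ij for map points Y of shape (N, d)."""
    sq_norms = np.sum(Y ** 2, axis=1)
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T, 0.0)
    W = 1.0 / (1.0 + D)          # unnormalized Student t kernel
    np.fill_diagonal(W, 0.0)     # q_ii = 0
    return W / W.sum()           # normalize over all pairs k != l

# Three points on a line: the first two are close, the third is far away.
Y_example = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
Q_example = low_dim_affinities(Y_example)
```

In this toy map the pair (0, 1) receives a larger similarity than the pair (0, 2), and all entries of `Q_example` sum to 1.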

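  1. Minimizes the Kullback-Leibler divergence of the high-dimensional distribution $P = (p_{ij})$ from the low-dimensional distribution $Q = (q_{ij})$, $\mathrm{KL}(P \parallel Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$, with respect to the locations of the points in the low-dimensional map.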
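
The minimization step can be sketched with plain gradient descent. The code below is a simplified illustration, not a production optimizer: it assumes a symmetric joint matrix `P` that sums to 1, uses the Student t similarities in the map, and relies on the standard closed-form gradient $\partial \mathrm{KL} / \partial y_i = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)(1 + \lVert y_i - y_j \rVert^2)^{-1}$. The function names, the learning rate, and the initialization scale are assumptions of this example; practical implementations typically add refinements such as momentum.

```python
import numpy as np

def kl_and_gradient(P, Y, eps=1e-12):
    """Return KL(P || Q) and its gradient with respect to the map points Y."""
    sq_norms = np.sum(Y ** 2, axis=1)
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T, 0.0)
    W = 1.0 / (1.0 + D)                  # (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()
    mask = P > 0                         # terms with p_ij = 0 contribute nothing
    kl = np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps)))
    # grad_i = 4 * sum_j A_ij * (y_i - y_j), with A_ij = (p_ij - q_ij) * W_ij
    A = (P - Q) * W
    grad = 4.0 * (np.diag(A.sum(axis=1)) - A) @ Y
    return kl, grad

def minimize_kl(P, n_components=2, n_iter=500, learning_rate=10.0, seed=0):
    """Plain gradient descent on the map coordinates (no momentum)."""
    rng = np.random.default_rng(seed)
    Y = 1e-4 * rng.standard_normal((P.shape[0], n_components))  # small random start
    for _ in range(n_iter):
        _, grad = kl_and_gradient(P, Y)
        Y = Y - learning_rate * grad
    return Y
```

Only the map coordinates `Y` are updated; `P` is computed once from the high-dimensional data and stays fixed throughout the optimization.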