Definition

The t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data. It models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points with high probability.

Algorithm

Constructs a probability distribution over pairs of high-dimensional objects such that similar objects are assigned a higher probability and dissimilar objects a lower probability.

Define the conditional probability using a normal (Gaussian) distribution centered at $x_{i}$: $p_{j \mid i} = \frac{\exp(-\lVert x_{i} - x_{j} \rVert^{2} / 2\sigma_{i}^{2})}{\sum_{k \neq i} \exp(-\lVert x_{i} - x_{k} \rVert^{2} / 2\sigma_{i}^{2})}, \quad \forall i \neq j$, where $\sigma_{i}$ acts as an adaptive bandwidth parameter for each point $i$ and is determined indirectly by the given hyperparameter (perplexity).
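
A minimal NumPy sketch of this step, under some assumptions: the note does not say how $\sigma_{i}$ is derived from the perplexity, so the code uses the usual definition (perplexity $= 2^{H}$, with $H$ the Shannon entropy in bits of the conditional distribution of point $i$) and a simple bisection search. The function names, search bounds, and step count are illustrative choices, not part of the note.

```python
import numpy as np

def conditional_probabilities(X, sigmas):
    """Return P with P[i, j] = p_{j|i} for per-point bandwidths `sigmas`."""
    X = np.asarray(X, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Squared Euclidean distances ||x_i - x_j||^2, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    logits = -sq_dist / (2.0 * sigmas[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)            # exclude k = i from the sum
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability only
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def perplexity(row):
    """Perplexity 2^H of one conditional distribution (H in bits)."""
    nz = row[row > 0]
    return 2.0 ** (-np.sum(nz * np.log2(nz)))

def fit_sigmas(X, target_perplexity, n_steps=50, lo=1e-4, hi=1e4):
    """Bisection on each sigma_i (assumed search strategy and bounds).

    Perplexity grows with sigma_i, so each interval is halved on a log scale.
    """
    n = len(X)
    lo = np.full(n, lo)
    hi = np.full(n, hi)
    for _ in range(n_steps):
        mid = np.sqrt(lo * hi)  # geometric midpoint
        P = conditional_probabilities(X, mid)
        perp = np.array([perplexity(row) for row in P])
        too_high = perp > target_perplexity
        hi = np.where(too_high, mid, hi)
        lo = np.where(too_high, lo, mid)
    return np.sqrt(lo * hi)
```

The target perplexity has to lie between 1 and $N - 1$ for the search to have a solution; each row of the resulting matrix sums to 1 and the diagonal is zero.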

And define the joint probability $p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2N}$, where $N$ is the number of data points.
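
The symmetrization can be sketched in a few lines of NumPy. The function name `joint_probabilities` is a hypothetical helper, and the input is assumed to be an $N \times N$ matrix whose entry `[i, j]` holds $p_{j \mid i}$, with zero diagonal and rows summing to 1.

```python
import numpy as np

def joint_probabilities(P_cond):
    """Symmetrize conditionals: p_ij = (p_{j|i} + p_{i|j}) / (2N)."""
    P_cond = np.asarray(P_cond, dtype=float)
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)

# Toy conditional matrix (made-up numbers, each row sums to 1).
P_cond = np.array([
    [0.0, 0.5, 0.5],
    [0.2, 0.0, 0.8],
    [0.9, 0.1, 0.0],
])
P = joint_probabilities(P_cond)
```

Because every row of the conditional matrix sums to 1, dividing by $2N$ makes the entries of the joint matrix sum to 1 overall.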

Defines a similar probability distribution over the points in the low-dimensional map.

Define the similarity between two points $y_{i}$ and $y_{j}$ in the low-dimensional map: $q_{ij} = \frac{(1 + \lVert y_{i} - y_{j} \rVert^{2})^{-1}}{\sum_{k \neq l} (1 + \lVert y_{k} - y_{l} \rVert^{2})^{-1}}, \quad \forall i \neq j$
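
As a rough illustration, the low-dimensional similarities can be computed as below. The function name `low_dim_affinities` and the toy coordinates are assumptions made for the example; the normalizer runs over all ordered pairs of distinct points.

```python
import numpy as np

def low_dim_affinities(Y):
    """Return Q with Q[i, j] = q_ij for map points Y of shape (n, d)."""
    Y = np.asarray(Y, dtype=float)
    diff = Y[:, None, :] - Y[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    kernel = 1.0 / (1.0 + sq_dist)   # (1 + ||y_i - y_j||^2)^(-1)
    np.fill_diagonal(kernel, 0.0)    # pairs with i = j are excluded
    return kernel / kernel.sum()     # normalize over all pairs k != l

# Three made-up map points: 0 and 1 are closest, 1 and 2 are farthest apart.
Y = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
Q = low_dim_affinities(Y)
```

Closer pairs receive larger $q_{ij}$, and the heavy tail of the kernel keeps distant pairs from collapsing to zero as quickly as a Gaussian would.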

Minimizes the Kullback-Leibler Divergence between the two distributions with respect to the locations of the points in the low-dimensional map.

$\underset{y}{\arg\min}\; KL(P \parallel Q) = \underset{y}{\arg\min} \sum_{i \neq j} p_{ij} \ln \left( \frac{p_{ij}}{q_{ij}} \right)$
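
A hedged sketch of the minimization, assuming plain gradient descent. The note does not name an optimizer, and practical implementations typically add momentum and other refinements. The gradient used is the standard closed form $\frac{\partial KL}{\partial y_{i}} = 4 \sum_{j} (p_{ij} - q_{ij})(1 + \lVert y_{i} - y_{j} \rVert^{2})^{-1}(y_{i} - y_{j})$, which holds when $P$ is symmetric and sums to 1. All function names, the learning rate, and the iteration count are illustrative assumptions.

```python
import numpy as np

def low_dim_affinities(Y):
    """Q[i, j] = q_ij from the heavy-tailed kernel, normalized over k != l."""
    diff = Y[:, None, :] - Y[None, :, :]
    kernel = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(kernel, 0.0)
    return kernel / kernel.sum()

def kl_divergence(P, Q):
    """KL(P || Q) = sum_{i != j} p_ij * ln(p_ij / q_ij)."""
    mask = P > 0  # zero entries of P (including the diagonal) contribute 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def kl_gradient(P, Y):
    """Gradient of KL(P || Q) with respect to the map points Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    inv = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(inv, 0.0)
    Q = inv / inv.sum()
    coeff = (P - Q) * inv
    return 4.0 * np.sum(coeff[:, :, None] * diff, axis=1)

def minimize_kl(P, Y0, learning_rate=0.01, n_iter=200):
    """Plain gradient descent (assumed optimizer); returns the updated map."""
    Y = np.array(Y0, dtype=float)  # copy, so Y0 is left untouched
    for _ in range(n_iter):
        Y -= learning_rate * kl_gradient(P, Y)
    return Y
```

Here `P` is expected to be a symmetric matrix of joint probabilities with zero diagonal, and `Y0` an initial layout of shape $(N, 2)$ or $(N, 3)$, for example small random values.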
