Manifold Learning

Topological Space

Definition

Topological Space

A topological space is a set together with an additional structure called a Topology.

Link to original

Open Set

Definition

Consider a Topological Space $(X, \mathcal{T})$. The elements of the Topology $\mathcal{T}$ are called the open sets of $X$.

Definition in Metric Topology

Consider a Metric Space

Facts

Every point of an open set $U$ is an interior point of $U$.

Consider a Topological Space $X$ and a subset $U \subseteq X$. Then, $U$ is open in $X$ if and only if for an arbitrary point $x$ in $U$, there exists an open subset $V \subseteq U$ containing $x$.

$U$ is an open set if and only if it is equal to its Interior.

Link to original

Closed Set

Definition

Consider a Topological Space $X$. A set $A \subseteq X$ is closed if its complement $X \setminus A$ is open.

The complement of an Open Set is called a closed set.

Facts

  • $\varnothing$ and $X$ are closed

Consider a Subspace Topology of a Topological Space and a subset . Then,

Consider a Topological Space and a subset . Then, where is the set of all limit points of

Consider a Topological Space and a subset . If the subset is equal to its Closure, it is closed.

Link to original

Neighborhood

Definition

Consider a Topological Space $X$. A neighborhood of a point $x \in X$ is a subset of $X$ including an open set containing $x$.

Link to original

Embedding

Definition

An embedding is a function between two topological spaces that is a homeomorphism onto its image.

An embedding preserves the topological structure of its domain.

Link to original

Homeomorphism

Definition

A function $f: X \to Y$ between two topological spaces is a homeomorphism if it satisfies the following conditions:

  • $f$ is Bijective
  • $f$ and its inverse function $f^{-1}$ are continuous
    • Equivalently: $f$ is a continuous open bijection

Homeomorphic

Consider two topological spaces $X$ and $Y$. If there exists a homeomorphism between $X$ and $Y$, then $X$ is homeomorphic to $Y$.

Link to original

Compactness

Definition

Consider a Topological Space $X$. The space is compact if every open covering of $X$ (a collection of open subsets of $X$ whose union is $X$) has a finite subcovering.

Examples

Consider a Standard Topology on real numbers . Then, the subspace Topologies

  • is not compact

  • is compact

  • is not compact

  • is not compact

  • is compact (closed and bounded)

Facts

Consider a Subspace Topology of a Topological Space . Then, is compact if and only if every covering of by open sets in has a finite subcollection covering .

A closed subspace of a compact space is compact.

A compact subspace of a Hausdorff Space is closed.

The image of a compact space under a continuous map is compact.

Consider a Bijective continuous function $f: X \to Y$ between two topological spaces. If $X$ is a compact space and $Y$ is a Hausdorff Space, then $f$ is a Homeomorphism.

Tychonoff's Theorem

Definition

The Product Topology of any collection of compact topological spaces is compact.

Link to original

Consider a Topological Space $X$. $X$ is compact if and only if for every collection of closed sets in $X$ that has the Finite Intersection Property, the intersection of the entire collection is non-empty.

Every closed interval in the Real Numbers is compact.

Heine-Borel Theorem

Definition

Consider a Standard Topology on a Euclidean Space $\mathbb{R}^{n}$ and a subset $A \subseteq \mathbb{R}^{n}$. Then, $A$ is closed and bounded if and only if $A$ is compact.

Link to original

Consider a Topological Space . If is compact then is Limit Point Compact

Consider a metrizable space . Then, is compact if and only if is Limit Point Compact if and only if is Sequentially Compact

Every Compact metrizable space has a countable basis.

Link to original

A Compact Hausdorff Space is a normal space.

Link to original

Consider a Hausdorff Space and disjoint Compact subsets . Then, there exists disjoint open sets satisfying and .

Every compact space is Countably Compact.

Every Compact space has Bolzano–Weierstrass property.

Link to original

Every Compact space is locally compact.

Link to original

A Topological Space is Compact if has a basis such that every open covering of by elements of has a finite subcover.

Alexander Subbasis Theorem

Definition

A Topological Space is Compact if and only if has a Subbasis such that every open covering of by elements of has a finite subcover.

Link to original

Link to original


Product Topology

Definition

A product topology is the Cartesian product of a family of topological spaces

Let be an index set, and be a Topological Space. The product topology is defined as where is the -th coordinate (component) of the function .

Facts

Consider bases for the topologies on . Then, is a basis for the topology on .

The countably infinite product space of real numbers with the Product Topology is metrizable.

Consider a function between a Topological Space and a product space. Then, the function is continuous if and only if each Composite Function of the function and the Projection Map is continuous.

The Product Space of any collection of Hausdorff spaces is a Hausdorff space.

Link to original

The Product Space of any collection of connected spaces is a connected space.

Link to original

The Product Space of two separable spaces is a separable space.

Link to original

The Product Space of two first-countable spaces is a first-countable space.

Link to original

The Product Space of two second-countable spaces is a second-countable space.

Link to original

Tychonoff's Theorem

Definition

The Product Topology of any collection of compact topological spaces is compact.

Link to original

Link to original


Connected Space

Definition

A Topological Space $X$ is connected if it cannot be represented as the union of two disjoint, non-empty open subsets (equivalently, closed subsets): $X \neq U \sqcup V$, where $\sqcup$ is a disjoint union.

Or, equivalently, if $\varnothing$ and $X$ are the only subsets that are both open and closed.

Examples

Topologist’s sine curve

Consider a Subspace Topology , where , of a Standard Topology on real plane. The Topological Space is connected, but not path connected.

Consider a set .

Consider a Subspace Topology of a Standard Topology on real number. The Topological Space is not connected by a separation

Consider a Subspace Topology , where , of a Standard Topology on real plane. The Topological Space is not connected by a separation .

Facts

Every path connected space is a Connected Space.

Link to original

Consider a Subspace Topology of a Topological Space . Then, is a separation of if and only if . where is a disjoint union, is the set of all limit points of .

Consider a Subspace Topology of a Topological Space and a separation of . If is a connected subspace, then .

Consider a collection of connected subspaces of a Topological Space . If the subspaces have a point in common , then the union of the collection is connected.

Consider a collection of connected subspaces of a Topological Space , and a connected subspace . If , then is connected.

Consider a connected Subspace Topology of a Topological Space . For some Subspace Topology , if , then is connected.

Consider two topological spaces . If is connected and there exists a Continuous Function , then is a connected space.

Consider a continuous function between two topological spaces. If is connected, then is a connected subspace in .

Consider a Topological Space and a subspace . If is connected, then its Closure is also connected.

The Product Space of any collection of connected spaces is a connected space.

Link to original

Topological Manifold

Definition

A Locally Euclidean, second-countable Hausdorff topological space is a topological manifold.

A one-dimensional manifold is called a curve, and a two-dimensional manifold is called a surface.

Facts

The Product Space of an $m$-dimensional manifold and an $n$-dimensional manifold is an $(m+n)$-dimensional manifold.

Consider an $m$-dimensional manifold $M$ without boundary and an $n$-dimensional manifold $N$ with boundary. Then, $\partial(M \times N) = M \times \partial N$.

Link to original

Second-Countable Space

Definition

A second-countable space is a Topological Space whose Topology has a countable basis.

Facts

Every second-countable space is a Separable Space.

$n$-dimensional Euclidean Space $\mathbb{R}^{n}$ is a second-countable space.

Hilbert Space is a second-countable space.

The Product Space of two second-countable spaces is a second-countable space.

Every Second-Countable Space is a Lindelof space.

Link to original

Link to original

Hausdorff Space

Definition

A Topological Space $X$ is a Hausdorff space if for any two distinct points $x, y \in X$ there exist disjoint neighbourhoods $U$ and $V$ of $x$ and $y$ respectively.

Facts

Every finite point set in a Hausdorff space is closed (T1 Axiom).

Consider a Topological Space satisfying axiom. Then, where

Consider a Hausdorff Space $X$. Then, a Sequence of points of $X$ converges to at most one point of $X$.

Every totally ordered set is a Hausdorff space in the Order Topology.

A Subspace Topology of a Hausdorff space is a Hausdorff space (Hereditary Property).

The Product Space of any collection of Hausdorff spaces is a Hausdorff space.

A Compact Hausdorff Space is a normal space.

Link to original

Link to original

Whitney Embedding Theorem

Definition

Any $n$-dimensional smooth manifold can be embedded in $\mathbb{R}^{2n}$.

Link to original

Riemannian Manifold

Definition

A Riemannian manifold is a smooth Topological Manifold together with a Riemannian metric. Many geometric notions such as distance, angles, length, volume, and curvature are defined on a Riemannian manifold.

Let $\mathcal{M}$ be a Smooth Manifold. For each point $p \in \mathcal{M}$, there is an associated vector space called the Tangent Space of $\mathcal{M}$ at $p$. Each Tangent Space is equipped with an Inner Product $g_{p}$ defined on the basis of the tangent space, induced by the standard inner product of the ambient space (Euclidean inner product).

The collection of inner products $g = \{g_{p}\}$ is a Riemannian metric on $\mathcal{M}$, and the pair $(\mathcal{M}, g)$ of the manifold and the Riemannian metric defines a Riemannian manifold.

Link to original

Atlas

Definition

Atlas

An atlas on a Topological Manifold is a collection of charts which covers .

Link to original

Chart

Definition

Let be a Topological Manifold. A chart for the manifold is a Homeomorphism , where is an open subset of the manifold, and is an open subset of the Euclidean Space.

Link to original

Differentiable Manifold

Definition

Let $M$ be a Topological Manifold. It is $k$-times differentiable if the transition maps between its charts are $k$-times continuously differentiable, i.e. of Differentiability Class $C^{k}$.

Link to original

Smooth Manifold

Definition

A smooth manifold is an infinitely Differentiable Manifold ($C^{\infty}$).

Facts

The Product Space of an $m$-dimensional smooth manifold and an $n$-dimensional smooth manifold is an $(m+n)$-dimensional smooth manifold.

Consider an $m$-dimensional smooth manifold $M$ without boundary and an $n$-dimensional smooth manifold $N$ with boundary. Then, $\partial(M \times N) = M \times \partial N$.

Link to original

Diffeomorphism

Definition

Given two differentiable manifolds $M$ and $N$, a differentiable map $f: M \to N$ is a diffeomorphism if it is Bijective and its inverse $f^{-1}$ is differentiable as well.

If these functions are $k$ times continuously differentiable, $f$ is called a $C^{k}$-diffeomorphism.

If there is a diffeomorphism between two manifolds $M$ and $N$, we say the two manifolds are diffeomorphic and write $M \simeq N$.

Link to original

Tangent Space

Definition

Suppose that $M$ is a Differentiable Manifold and that $p \in M$. Pick a coordinate Chart $\varphi: U \to \mathbb{R}^{n}$, where $U$ is an open subset of $M$ containing $p$. The tangent space $T_{p}M$ is the set of all tangent vectors at $p$ on the manifold $M$.

The basis of the tangent space is given by the partial derivatives $\left\{ \frac{\partial}{\partial x_{1}}, \dots, \frac{\partial}{\partial x_{n}} \right\}$.

Link to original

Integration of a Function over a Manifold

Definition

Consider a scalar function $f$ and a Riemannian Manifold $(\mathcal{M}, g)$ with a local Chart $\varphi$. The integral of $f$ over the manifold can be calculated in a local coordinate space: $$\int_{\mathcal{M}} f \, dV = \int_{\varphi(\mathcal{M})} (f \circ \varphi^{-1})(x) \sqrt{|g|} \, dx$$ where $|g|$ is the absolute value of the Determinant of the inner product $g$ at $\varphi^{-1}(x)$.

Link to original

Isometry

Definition

Let $(\mathcal{M}, g)$ and $(\mathcal{N}, h)$ be two Riemannian manifolds, and $f: \mathcal{M} \to \mathcal{N}$ a Diffeomorphism. Then $f$ is called an isometry if the metric is preserved under the pullback of $f$, i.e. $g = f^{*}h$.

Link to original

Geodesic

Definition

A geodesic is a curve representing the shortest path between two points in a Riemannian Manifold.

Suppose a Riemannian Manifold $(\mathcal{M}, g)$. A differentiable curve in $\mathcal{M}$ is defined as a smooth mapping $\gamma$ from an open interval into $\mathcal{M}$. The length of a smooth curve is given by its Arc Length. The distance between two points $p$ and $q$ on $\mathcal{M}$ is defined as the Infimum of the length taken over all differentiable curves $\gamma$ such that $\gamma(0) = p$ and $\gamma(1) = q$, where the infimum runs over the set of all differentiable curves in $\mathcal{M}$ that join the points $p$ and $q$.

Link to original

Laplace-Beltrami Operator

Definition

The Laplace–Beltrami operator is the Divergence of the Gradient: $$\Delta f = \operatorname{div}(\nabla f) = \frac{1}{\sqrt{|g|}} \sum\limits_{i,j=1}^{n} \frac{\partial}{\partial x_{i}} \left( \sqrt{|g|} \, g^{ij} \frac{\partial f}{\partial x_{j}} \right)$$ where $g^{ij}$ is the inverse of the metric, and $|g|$ is the absolute value of the Determinant of the inner product at $\varphi^{-1}(x)$.

Calculations

Calculation of Gradient

Consider an -dimensional Riemannian Manifold with a local Chart , and a function .

By the definition of the Gradient, for any tangent vector $v$, $$g(\nabla f, v) = D_{v}f$$ where $g$ is the inner product at $p$, and $D_{v}f$ is the directional derivative of $f$ in the direction of $v$.

By the Chain Rule, the directional derivative of $f$ at a point in the direction of $v$ is given by $$D_{v}f = \sum\limits_{i=1}^{n} v_{i} \frac{\partial \tilde{f}}{\partial x_{i}}$$ where $\tilde{f} = f \circ \varphi^{-1}$.

By combining these two equations, $\sum_{i,j} g_{ij} (\nabla f)_{i} v_{j} = \sum_{i} v_{i} \frac{\partial \tilde{f}}{\partial x_{i}}$ holds for all $v$. Let $g^{ij}$ be the inverse matrix of $g_{ij}$; then the gradient vector is $$(\nabla f)_{i} = \sum\limits_{j=1}^{n} g^{ij} \frac{\partial \tilde{f}}{\partial x_{j}}$$

Calculation of Divergence

Consider an -dimensional Riemannian Manifold with a local Chart , and a function .

By the Divergence Theorem, for a function $f$ with compact support and a vector field $V$, $$\int_{\mathcal{M}} f \operatorname{div}V \, dV = - \int_{\mathcal{M}} g(\nabla f, V) \, dV$$ The integration can be performed over a coordinate Chart of the manifold.^[Integration of a Function over a Manifold] By Integration by Parts,

$$\begin{aligned} \int_{\mathbb{R}^{n}} \tilde{f} \cdot \operatorname{div}V \sqrt{|g|}\, dx &= - \int_{\mathbb{R}^{n}} g(\nabla f, V) \sqrt{|g|}\, dx\\ &= - \int_{\mathbb{R}^{n}} \sum\limits_{i=1}^{n} V_{i} \frac{\partial\tilde{f}}{\partial x_{i}} \sqrt{|g|}\, dx\\ &= \int_{\mathbb{R}^{n}} \tilde{f} \sum\limits_{i=1}^{n} \frac{\partial}{\partial x_{i}} (V_{i} \sqrt{|g|})\, dx \end{aligned}$$

where $\tilde{f} := f \circ \varphi^{-1}$, $U = \varphi(\mathcal{M})$, $|g|$ is the absolute value of the Determinant of the inner product $g$ at $\varphi^{-1}(x)$, and $V_{i}$ is the $i$-th element of the vector $V$. Therefore, we have $$\operatorname{div}V = \frac{1}{\sqrt{|g|}} \sum\limits_{i=1}^{n} \frac{\partial}{\partial x_{i}} (V_{i} \sqrt{|g|})$$

Link to original

Linear Manifold Learning

Principal Component Analysis

Definition

PCA is a linear dimensionality reduction technique. The correlated variables are linearly transformed onto a new coordinate system such that the new coordinate directions capture the largest variance in the data.

Population Version

Given a random vector $X$ with covariance matrix $\Sigma$, we find a unit vector $w$ such that $\operatorname{Var}(w^{\top}X) = w^{\top}\Sigma w$ is maximized. Equivalently, by the Method of Lagrange Multipliers with the constraint $w^{\top}w = 1$, $$\mathcal{L}(w, \lambda) = w^{\top}\Sigma w - \lambda(w^{\top}w - 1)$$ By differentiation, the solution is given by the eigenvalue problem $$\Sigma w = \lambda w$$ Thus the $w$ maximizing the variance of $w^{\top}X$ is the eigenvector corresponding to the largest Eigenvalue.

Sample Version

Given a data matrix $X \in \mathbb{R}^{n \times p}$, by Singular Value Decomposition, the matrix can be factorized as $X = U \Sigma V^{\top}$. By algebra, $XV = U\Sigma$, where we call $Xv_{j}$ the $j$-th principal component.
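The sample version above can be sketched in a few lines of NumPy; the data matrix here is an arbitrary illustration.

```python
import numpy as np

# PCA of a demeaned data matrix via SVD: a minimal sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])  # anisotropic data
X = X - X.mean(axis=0)                                     # demean

U, s, Vt = np.linalg.svd(X, full_matrices=False)
pcs = X @ Vt.T            # j-th column is the j-th principal component X v_j

# Component variances s_j^2 / (n - 1) come out in decreasing order.
var = pcs.var(axis=0, ddof=1)
```

Note that $Xv_{j} = u_{j}\sigma_{j}$, so the principal components can be read off either factorization.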

Facts

Since and

Link to original

Robust Principal Component Analysis

Definition

PCA doesn’t work well when the input data are corrupted by noise. The goal of Robust PCA is to decompose a data matrix into a low-rank matrix and a sparse matrix.

The optimization problem of Robust PCA is set up as $$\min_{L, S} \operatorname{rank}(L) + \lambda \|S\|_{0} \quad \text{subject to } M = L + S$$ where $L$ is a low-rank matrix, $S$ is a sparse matrix, and $\|S\|_{0}$ is the number of non-zero elements in the matrix.

However, this optimization problem is intractable because the objective is neither continuous nor convex. So, we solve the relaxed problem $$\min_{L, S} \|L\|_{*} + \lambda \|S\|_{1} \quad \text{subject to } M = L + S$$ where $\|L\|_{*}$ is the Nuclear Norm of the matrix, and $\|S\|_{1}$ is the matrix L1 norm.
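The relaxed problem can be solved with an alternating-directions scheme (singular value thresholding for the low-rank part, soft thresholding for the sparse part). A minimal sketch; the penalty parameter `mu` and the iteration count are arbitrary choices, not tuned values.

```python
import numpy as np

def shrink(x, tau):
    """Soft thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca(M, lam, mu=1.0, iters=500):
    """ADMM sketch for min ||L||_* + lam * ||S||_1  s.t.  M = L + S."""
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                        # dual variable
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * shrink(s, 1.0 / mu)) @ Vt      # singular value thresholding
        S = shrink(M - L + Y / mu, lam / mu)    # sparse part
        Y += mu * (M - L - S)                   # dual ascent on M = L + S
    return L, S

# Low-rank plus sparse test matrix
rng = np.random.default_rng(1)
low = np.outer(rng.normal(size=20), rng.normal(size=20))   # rank one
sparse = np.zeros((20, 20))
sparse[rng.integers(0, 20, 10), rng.integers(0, 20, 10)] = 5.0
M = low + sparse
L, S = rpca(M, lam=1.0 / np.sqrt(20))
```

The choice $\lambda = 1/\sqrt{\max(n, m)}$ is the standard default from the principal component pursuit literature.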

Applications

  • Video Surveillance
  • Face Recognition
  • Latent Semantic Indexing
  • Collaborative Filtering (Matrix Completion)
Link to original

Nonlinear Manifold Learning

Kernel Principal Component Analysis

Definition

Consider a demeaned matrix $X$. By SVD, $X = U \Sigma V^{\top}$, where $U$ and $V$ are orthonormal matrices and $\Sigma$ is a Diagonal Matrix. By PCA, the principal components are $XV = U\Sigma$. For a linear kernel, $K = XX^{\top} = U\Sigma^{2}U^{\top}$, so the principal components $U\Sigma$ can be recovered from the eigendecomposition of the kernel matrix alone. For a general kernel $K_{ij} = k(x_{i}, x_{j})$, the kernel principal components are given by the solution of the same eigenvalue problem on the (centered) kernel matrix.
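A minimal NumPy sketch of kernel PCA with an RBF kernel (the bandwidth `gamma` is an arbitrary choice): the kernel matrix is double-centered, and the kernel principal components are the scaled top eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))

# RBF kernel matrix
gamma = 0.5
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * sq)

# Double-center the kernel matrix (demeaning in feature space)
n = len(X)
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J

# Kernel principal components from the top eigenvectors of Kc
w, V = np.linalg.eigh(Kc)
w, V = w[::-1], V[:, ::-1]                      # sort descending
pcs = V[:, :2] * np.sqrt(np.clip(w[:2], 0.0, None))
```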

Link to original

Self-Organizing Map

Definition

A self-organizing map (SOM) is a clustering method that produces a low-dimensional representation of a higher-dimensional data set while preserving the Topological Manifold structure of the data.

The algorithm fits a grid to high-dimensional data and assigns the data to the fitted nodes (prototypes) of the grid.
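A toy sketch of the classic online SOM update on a one-dimensional grid of prototypes; the grid size, learning-rate schedule, and neighborhood width are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(200, 2))            # data in the unit square

n_nodes = 10
protos = rng.uniform(0.0, 1.0, size=(n_nodes, 2))   # grid prototypes
grid = np.arange(n_nodes)

for epoch in range(5):
    lr = 0.5 * (1.0 - epoch / 5.0)              # decaying learning rate
    width = 3.0 * (1.0 - epoch / 5.0) + 0.5     # shrinking neighborhood
    for x in X:
        best = np.argmin(((protos - x) ** 2).sum(axis=1))  # best-matching unit
        h = np.exp(-((grid - best) ** 2) / (2.0 * width ** 2))
        protos += lr * h[:, None] * (x - protos)  # pull neighbors toward x
```

Each update moves the best-matching prototype and its grid neighbors toward the data point, which is what preserves the grid's topological ordering.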

Link to original

Isomap

Definition

Isomap is a non-linear dimensionality reduction method that aims to preserve the intrinsic geometry of high-dimensional data in a lower-dimensional space. The main idea of Isomap is to approximate the geodesic distances between data points on a manifold, rather than just using straight-line Euclidean distances.

Algorithm

  1. Determine the neighbors of each point
    • All points within some fixed radius
    • k-nearest neighbors
  2. Construct a neighborhood distance graph
  3. Compute the shortest paths between all pairs of nodes with Dijkstra’s algorithm
  4. Apply Multidimensional Scaling to the geodesic distance matrix to obtain the low-dimensional embedding
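The steps above can be sketched with NumPy; here Floyd–Warshall stands in for Dijkstra, and a classical-MDS step produces the embedding.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.sort(rng.uniform(0.0, np.pi, 40))
X = np.c_[np.cos(t), np.sin(t)]            # points along a circular arc

# 1.-2. k-nearest-neighbor distance graph
k = 5
D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(axis=-1))
G = np.full_like(D, np.inf)
np.fill_diagonal(G, 0.0)
for i in range(len(X)):
    nb = np.argsort(D[i])[1:k + 1]
    G[i, nb] = D[i, nb]
    G[nb, i] = D[nb, i]                    # keep the graph symmetric

# 3. All-pairs shortest paths (Floyd-Warshall)
for m in range(len(X)):
    G = np.minimum(G, G[:, [m]] + G[[m], :])

# 4. Classical MDS on the geodesic distance matrix
n = len(X)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (G ** 2) @ J
w, V = np.linalg.eigh(B)
emb = V[:, -1] * np.sqrt(max(w[-1], 0.0))  # one-dimensional embedding
```

On this arc, the graph distance between the endpoints approximates the arc length, which exceeds the straight-line chord — exactly the geometry Isomap is built to capture.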
Link to original

Locally Linear Embedding

Definition

Locally linear embedding (LLE) is a non-linear dimensionality reduction method that aims to preserve the intrinsic geometry of high-dimensional data in a lower-dimensional space.

Algorithm

  1. Determine the k-nearest neighbors of each point

  2. Compute a set of weights $W_{ij}$ for each point that best approximate the point as a linear combination of its neighbors: $$\min_{W} \sum_{i} \Big\| x_{i} - \sum_{j \in N(i)} W_{ij} x_{j} \Big\|^{2} \quad \text{subject to } \sum_{j} W_{ij} = 1$$ where $x_{i}$ is a data point vector, $W$ is the weight matrix, and $N(i)$ is the set of the neighbors of $x_{i}$.

  3. Find the low-dimensional embedding of points by solving an Eigenvalue problem, where each point is still described with the same linear combination of its neighbors: $$\min_{Y} \sum_{i} \Big\| y_{i} - \sum_{j} W_{ij} y_{j} \Big\|^{2}$$ where $y_{i}$ is a vector of lower dimension than $x_{i}$, and $W$ is the matrix obtained in step 2.

    In matrix notation, the problem is written as $$\min_{Y} \operatorname{tr}(Y^{\top} M Y), \quad M = (I - W)^{\top}(I - W)$$

    By the Method of Lagrange Multipliers, the solution of the optimization problem is given by the eigenvalue problem $MY = Y\Lambda$. Therefore, $Y$ consists of the eigenvectors corresponding to the smallest eigenvalues.
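A compact NumPy sketch of the steps; the neighborhood size and the regularization constant are arbitrary choices (the local Gram matrix is regularized so the weight system is always solvable).

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.sort(rng.uniform(0.0, 3.0, 30))
X = np.c_[np.cos(t), np.sin(t)]            # a one-dimensional curve in 2-D
n, k, reg = len(X), 4, 1e-3

# Step 2: reconstruction weights from the k nearest neighbors
D = ((X[:, None] - X[None]) ** 2).sum(axis=-1)
W = np.zeros((n, n))
for i in range(n):
    nb = np.argsort(D[i])[1:k + 1]
    Z = X[nb] - X[i]                       # centered neighbors
    C = Z @ Z.T                            # local Gram matrix
    C = C + reg * np.trace(C) * np.eye(k)  # regularize
    w = np.linalg.solve(C, np.ones(k))
    W[i, nb] = w / w.sum()                 # weights sum to one

# Step 3: embedding from the bottom eigenvectors of M = (I - W)^T (I - W)
M = (np.eye(n) - W).T @ (np.eye(n) - W)
vals, vecs = np.linalg.eigh(M)
emb = vecs[:, 1]                           # skip the constant eigenvector
```

Because each row of $W$ sums to one, the constant vector is an eigenvector of $M$ with eigenvalue zero, which is why the bottom eigenvector is discarded.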

Link to original

Multidimensional Scaling

Definition

Multidimensional Scaling (MDS) is a non-linear dimensionality reduction method. It aims to represent high-dimensional data in a lower-dimensional space while preserving the pairwise distances.

Algorithm

MDS

  1. Find a low-dimensional representation $\{y_{i}\}$ that minimizes the difference in the pairwise distances of points: $$\min_{y_{1}, \dots, y_{n}} \sum_{i \neq j} \big( d(x_{i}, x_{j}) - \| y_{i} - y_{j} \| \big)^{2}$$ where $x_{i}$ is a data point vector, and $d$ is a distance function.

Local MDS

  1. Build a set $\mathcal{N}$ of nearby pairs $(i, j)$, where $j$ is among the k-nearest neighbors of $i$
  2. Find a low-dimensional representation $\{y_{i}\}$ that minimizes the difference in the local pairwise distances of points, where pairs outside $\mathcal{N}$ are assigned a very large distance $D$ and a small weight $w$ (often $w \propto 1/D$).
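Classical (Torgerson) MDS is one standard way to realize the MDS objective for Euclidean distances: double-center the squared distance matrix and eigendecompose. A sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(25, 5))
D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(axis=-1))

# Double-center the squared distances: B = -1/2 J D^2 J
n = len(D)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J

# Embedding from the top eigenpairs of B
w, V = np.linalg.eigh(B)
w, V = w[::-1], V[:, ::-1]
Y = V[:, :5] * np.sqrt(np.clip(w[:5], 0.0, None))

# With full dimensionality the pairwise distances are reproduced exactly
D2 = np.sqrt(((Y[:, None] - Y[None]) ** 2).sum(axis=-1))
```

Truncating `Y` to fewer columns gives the lower-dimensional embedding that best preserves the distances in this spectral sense.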
Link to original

Adjacency Matrix

Definition

Consider a Graph with nodes .

Unweighted Adjacency Matrix

An unweighted adjacency matrix is an $n \times n$ matrix such that its element $A_{ij}$ is one when there is an edge from node $i$ to node $j$ and zero when there is no edge.

Weighted Adjacency Matrix

A weighted adjacency matrix is an $n \times n$ matrix such that its element $A_{ij}$ is the weight of the edge between nodes $i$ and $j$, and is zero when there is no edge.

Examples

Given graph

Adjacency matrix

Facts

If a graph is undirected, then the adjacency matrix is a Symmetric Matrix.
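A small NumPy illustration of an unweighted adjacency matrix for an undirected graph (the edge list is an arbitrary example):

```python
import numpy as np

# Undirected graph on 4 nodes with edges 0-1, 1-2, 2-0, 2-3
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0     # undirected: fill both directions
```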

Link to original

Degree Matrix

Definition

The degree matrix of an undirected Graph is a Diagonal Matrix representing the sum of the weight of edges connected to each node. It is used together with the Adjacency Matrix to construct the Laplacian Matrix of a graph.

Given a graph $G$ and the corresponding Adjacency Matrix $A$, the degree matrix $D$ is defined as $$D_{ij} = \begin{cases} \sum_{k} A_{ik} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

Examples

Given graph

Degree matrix

Link to original

Laplacian Matrix

Definition

The Laplacian matrix is a matrix representation of a Graph: $$L = D - A$$ where $D$ is the Degree Matrix, and $A$ is the Adjacency Matrix of the graph.

Random Walk Normalized Laplacian Matrix

$$L^{\text{rw}} = D^{-1}L = I - D^{-1}A$$

Symmetrically Normalized Laplacian Matrix

$$L^{\text{sym}} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$$

Examples

Laplacian Matrix for Simple Graph

Adjacency Matrix and Degree Matrix

Laplacian matrix

Laplacian Matrix for Graph with Weighted Edges

Adjacency Matrix and Degree Matrix

Laplacian matrix

Facts

The number of zero-eigenvalues of the Laplacian matrix equals the number of connected clusters in the graph.
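This fact is easy to check numerically: build $L = D - A$ for a graph with two connected clusters and count the (numerically) zero eigenvalues.

```python
import numpy as np

# Two connected clusters: a triangle {0,1,2} and an edge {3,4}
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 0), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

D = np.diag(A.sum(axis=1))           # degree matrix
L = D - A                            # graph Laplacian

vals = np.linalg.eigvalsh(L)
n_components = int(np.sum(np.abs(vals) < 1e-9))
```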

Analogousness to the Laplace-Beltrami Operator

The Laplacian matrix on a graph can be regarded as a discrete approximation to the Laplace-Beltrami Operator on a manifold.

The quadratic form of the Laplacian Matrix can be seen as a discretization of the squared gradient on the manifold:

$$f^{\top} L f = \frac{1}{2} \sum_{i,j} A_{ij} (f_{i} - f_{j})^{2}$$

where $A_{ij}$ is the similarity between two points $i$ and $j$ defined by the Adjacency Matrix.

Link to original

Laplacian Eigenmap

Definition

The Laplacian eigenmap is an embedding preserving local information optimally.

Algorithm

Consider a set of data points $\{x_{1}, \dots, x_{n}\}$. Make an Adjacency Matrix with the Gaussian Radial Basis Function Kernel $$A_{ij} = \exp\left( -\frac{\|x_{i} - x_{j}\|^{2}}{2\sigma^{2}} \right)$$ where $\sigma$ is a scale parameter.

Construct a Laplacian Matrix using the Adjacency Matrix. The following options are available.

Construct the matrix whose columns are the eigenvectors corresponding to the smallest eigenvalues of the Laplacian Matrix. This matrix is the result of the Laplacian eigenmap, where the eigenvector for the smallest eigenvalue is intentionally omitted because it corresponds to the trivial solution (constant vector).
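The algorithm can be sketched as follows with the unnormalized Laplacian; the bandwidth `sigma` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.sort(rng.uniform(0.0, 3.0, 30))
X = np.c_[t, np.sin(t)]

# Gaussian RBF adjacency matrix
sigma = 0.5
sq = ((X[:, None] - X[None]) ** 2).sum(axis=-1)
A = np.exp(-sq / (2.0 * sigma ** 2))
np.fill_diagonal(A, 0.0)

# Unnormalized Laplacian and its bottom eigenvectors
L = np.diag(A.sum(axis=1)) - A
vals, vecs = np.linalg.eigh(L)
emb = vecs[:, 1:3]                 # skip the trivial constant eigenvector
```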

Link to original

Spectral Clustering

Definition

Spectral clustering techniques make use of the spectrum of the similarity matrix of the data to perform dimensionality reduction before clustering.

Algorithms

Apply the Laplacian Eigenmap to the given data.

Form an $n \times k$ submatrix $Z$ using the first $k$ columns of the embedded matrix, and normalize each row to unit norm.

Apply a clustering algorithm (e.g. K-Means Clustering) to $\{z_{i}\}$, where $z_{i}$ is the $i$-th row of $Z$.

Mathematical Background

Consider a cluster assignment $f_{i}$ for each data point. Then, clustering observations corresponds to estimating the $f_{i}$'s. The entries $A_{ij}$ of the Adjacency Matrix represent the similarity between points $i$ and $j$.

Under the assumption that close data points (large $A_{ij}$) have similar labels ($f_{i} \approx f_{j}$), the optimization problem is set up to minimize the difference between assignments for similar points: $$\min_{f} \sum_{i,j} A_{ij} (f_{i} - f_{j})^{2} \quad \text{subject to } f^{\top}f = 1$$ where the constraint is imposed to avoid the trivial solution $f = 0$.

In matrix notation, the problem is $$\min_{f} f^{\top} L f \quad \text{subject to } f^{\top} f = 1$$

When $A_{ij}$ is large, the penalty on $(f_{i} - f_{j})^{2}$ is large, and when $A_{ij}$ is small, the penalty is small. So, $A_{ij}$ works as the weight for each pair in the optimization.

The Lagrangian function is defined as $$\mathcal{L}(f, \lambda) = f^{\top} L f - \lambda (f^{\top} f - 1)$$ and the solution of the problem is given by an Eigenvalue problem $$L f = \lambda f$$ The solution is an $n \times k$ matrix of eigenvectors sorted in ascending order by their corresponding eigenvalues.
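A sketch of the full pipeline on two well-separated blobs, using a plain Lloyd's k-means on the row-normalized eigenvector matrix (sizes, bandwidths, and initialization are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0.0, 0.1, size=(15, 2)),
               rng.normal(3.0, 0.1, size=(15, 2))])   # two separated blobs

# Similarity graph and Laplacian
sq = ((X[:, None] - X[None]) ** 2).sum(axis=-1)
A = np.exp(-sq / (2.0 * 0.5 ** 2))
np.fill_diagonal(A, 0.0)
L = np.diag(A.sum(axis=1)) - A

# First k = 2 eigenvectors, rows normalized to unit norm
vals, vecs = np.linalg.eigh(L)
Z = vecs[:, :2]
Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)

# Plain Lloyd's k-means on the rows of Z
C = Z[[0, -1]].copy()                 # initialize with one row from each blob
for _ in range(20):
    labels = np.argmin(((Z[:, None] - C[None]) ** 2).sum(axis=-1), axis=1)
    C = np.array([Z[labels == c].mean(axis=0) for c in (0, 1)])
```

Because the blobs are nearly disconnected in the similarity graph, the bottom eigenvectors are almost constant on each blob, so k-means on the rows of `Z` separates the clusters cleanly.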

Link to original

t-Distributed Stochastic Neighbor Embedding

Definition

The t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data. It models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points with high probability.

Algorithm

  1. Construct a probability distribution over pairs of high-dimensional objects in such a way that similar objects are assigned a higher probability while dissimilar points are assigned a lower probability.

    Define the conditional probability by a Normal Distribution $$p_{j|i} = \frac{\exp(-\|x_{i} - x_{j}\|^{2} / 2\sigma_{i}^{2})}{\sum_{k \neq i} \exp(-\|x_{i} - x_{k}\|^{2} / 2\sigma_{i}^{2})}$$ where $\sigma_{i}$ acts as an adaptive bandwidth parameter for each point $x_{i}$, and is indirectly determined by the given hyperparameter (perplexity).

    And define the joint probability $$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$$ where $n$ is the number of data points.

  2. Define a similar probability distribution over the points in the low-dimensional map, using a Student's t-distribution for the similarities $q_{ij}$ between two points in the low-dimensional map.

  3. Minimize the Kullback-Leibler Divergence between the two distributions with respect to the locations of the points in the low-dimensional map.
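Step 1 can be sketched directly; here a fixed bandwidth stands in for the per-point perplexity search, so only the shape of the computation should be read from this.

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.normal(size=(20, 4))
sq = ((X[:, None] - X[None]) ** 2).sum(axis=-1)

# Conditional probabilities p_{j|i} with a fixed bandwidth
# (real t-SNE tunes sigma_i per point to match a target perplexity)
sigma = 1.0
P_cond = np.exp(-sq / (2.0 * sigma ** 2))
np.fill_diagonal(P_cond, 0.0)
P_cond = P_cond / P_cond.sum(axis=1, keepdims=True)

# Symmetrized joint probabilities p_ij
n = len(X)
P = (P_cond + P_cond.T) / (2.0 * n)
```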

Link to original

UMAP

Definition

Uniform manifold approximation and projection (UMAP) is a nonlinear dimensionality reduction algorithm similar to t-SNE. However, unlike t-SNE, UMAP uses spectral embedding to initialize the low-dimensional graph, and uses a graph that connects each point only to its nearest neighbors.

Algorithm

  1. For each point in high-dimensional space, find its nearest neighbors.

  2. Compute the distances between the nearest neighbors for each point.

  3. Calculate the local connectivity parameter for each point, determined by the number of nearest neighbors and the minimum distance among the nearest neighbors.

  4. Calculate the edge weights to the nearest neighbors and symmetrize the graph.

  5. Initialize the low-dimensional representation using spectral embedding.

  6. Optimize the low-dimensional representation by minimizing both the attractive and the repulsive forces using SGD.

The cost function is defined as the fuzzy cross entropy $$C = \sum_{(i,j) \in E} \left[ w_{h}(i,j) \log \frac{w_{h}(i,j)}{w_{l}(i,j)} + (1 - w_{h}(i,j)) \log \frac{1 - w_{h}(i,j)}{1 - w_{l}(i,j)} \right]$$ where $E$ is the set of edges, and $w_{h}$ and $w_{l}$ are the weights of the edge in the high-dimensional and low-dimensional graph respectively.
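The edge-weight and symmetrization steps can be sketched as follows; the bandwidth here is a crude stand-in for UMAP's binary search, so only the overall shape of the computation should be read from this.

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(20, 3))
k = 5
D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(axis=-1))

n = len(X)
W = np.zeros((n, n))
for i in range(n):
    nb = np.argsort(D[i])[1:k + 1]
    rho = D[i, nb].min()                      # distance to nearest neighbor
    sigma = D[i, nb].mean() - rho + 1e-12     # crude stand-in for the binary search
    W[i, nb] = np.exp(-np.maximum(D[i, nb] - rho, 0.0) / sigma)

# Fuzzy-union symmetrization of the directed weights
W_sym = W + W.T - W * W.T
```

Subtracting `rho` guarantees every point is fully connected (weight one) to its nearest neighbor, which is UMAP's local-connectivity assumption.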

Link to original