Manifold Learning

Topological Space

Definition

Topological Space

A topological space is a set together with an additional structure called a Topology.

Link to original

Open Set

Definition

Consider a Topological Space $(X, \mathcal{T})$. The elements of the Topology $\mathcal{T}$ are called the open sets of $X$.

Definition in Metric Topology

Consider a Metric Space

Facts

Every point of an open set $U$ is an interior point of $U$.

Consider a Topological Space $X$ and a subset $U \subseteq X$. Then, $U$ is open in $X$ if and only if for an arbitrary point $x$ in $U$, there exists an open subset $V \subseteq U$ containing $x$.

$U$ is an open set if and only if it is equal to its Interior.

Link to original

Closed Set

Definition

Consider a Topological Space $X$. A set $A \subseteq X$ is closed if its complement $X \setminus A$ is open.

The complement of an Open Set is called a closed set.

Facts

  • $\varnothing$ and $X$ are closed

Consider a Subspace Topology of a Topological Space and a subset . Then,

Consider a Topological Space and a subset . Then, where is the set of all limit points of

Consider a Topological Space and a subset . If the subset is equal to its Closure, it is closed.

Link to original

Neighborhood

Definition

Consider a Topological Space $X$. A neighborhood of a point $x \in X$ is a subset of $X$ including an open set containing $x$.

Link to original

Embedding

Definition

An embedding is a function between two topological spaces that is a homeomorphism onto its image.

An embedding preserves the topological structure of its domain.

Link to original

Homeomorphism

Definition

A function $f: X \to Y$ between two topological spaces is a homeomorphism if it satisfies the following conditions:

  • $f$ is Bijective
  • $f$ and its inverse function $f^{-1}$ are continuous
    • Equivalently: $f$ is a continuous open bijection

Homeomorphic

Consider two topological spaces $X$ and $Y$. If there exists a homeomorphism between $X$ and $Y$, then $X$ is homeomorphic to $Y$.

Link to original

Compactness

Definition

Consider a Topological Space $X$. The space is compact if every open covering of $X$ (a collection of open subsets of $X$ whose union is $X$) has a finite subcovering.

Examples

Consider a Standard Topology on real numbers . Then, the subspace Topologies

  • is not compact

  • is compact

  • is not compact

  • is not compact

  • is compact (closed and bounded)

Facts

Consider a Subspace Topology of a Topological Space . Then, is compact if and only if every covering of by open sets in has a finite subcollection covering .

A closed subspace of a compact space is compact.

A compact subspace of a Hausdorff Space is closed.

The image of a compact space under a continuous map is compact.

Consider a Bijective continuous function $f: X \to Y$ between two topological spaces. If $X$ is a compact space and $Y$ is a Hausdorff Space, then $f$ is a Homeomorphism.

Tychonoff's Theorem

Definition

The Product Topology of any collection of compact topological spaces is compact.

Link to original

Consider a Topological Space $X$. $X$ is compact if and only if for every collection of closed sets in $X$ that has the Finite Intersection Property, the intersection of the entire collection is non-empty.

Every closed interval in the Real Numbers is compact.

Heine-Borel Theorem

Definition

Consider a Standard Topology on a Euclidean Space $\mathbb{R}^{n}$ and a subset $A \subseteq \mathbb{R}^{n}$. Then, $A$ is closed and bounded if and only if $A$ is compact.

Link to original

Consider a Topological Space . If is compact then is Limit Point Compact

Consider a metrizable space . Then, is compact if and only if is Limit Point Compact if and only if is Sequentially Compact

Every Compact metrizable space has a countable basis.

Link to original

A Compact Hausdorff Space is a normal space.

Link to original

Consider a Hausdorff Space and disjoint Compact subsets . Then, there exists disjoint open sets satisfying and .

Every compact space is Countably Compact.

Every Compact space has Bolzano–Weierstrass property.

Link to original

Every Compact space is locally compact.

Link to original

A Topological Space is Compact if has a basis such that every open covering of by elements of has a finite subcover.

Alexander Subbasis Theorem

Definition

A Topological Space is Compact if and only if has a Subbasis such that every open covering of by elements of has a finite subcover.

Link to original

Link to original


Product Topology

Definition

A product topology is the Cartesian product of a family of topological spaces

Let be an index set, and be a Topological Space. The product topology is defined as where is the -th coordinate (component) of the function .

Facts

Consider bases for the topologies on . Then, is a basis for the topology on .

The countably infinite product space of real numbers with the Product Topology is metrizable.

Consider a function between a Topological Space and a product space. Then, the function is continuous if and only if each Composite Function of the function and the Projection Map is continuous.

The Product Space of any collection of Hausdorff spaces is a Hausdorff space.

Link to original

The Product Space of any collection of connected spaces is a connected space.

Link to original

The Product Space of two separable spaces is a separable space.

Link to original

The Product Space of two first-countable spaces is a first-countable space.

Link to original

The Product Space of two second-countable spaces is a second-countable space.

Link to original

Tychonoff's Theorem

Definition

The Product Topology of any collection of compact topological spaces is compact.

Link to original

Link to original


Connected Space

Definition

A Topological Space $X$ is connected if it cannot be represented as the union of two disjoint, non-empty open subsets (equivalently, closed subsets): $X \neq U \sqcup V$, where $\sqcup$ is a disjoint union.

Or, equivalently, if $\varnothing$ and $X$ are the only subsets that are both open and closed.

Examples

Topologist’s sine curve

Consider a Subspace Topology , where , of a Standard Topology on real plane. The Topological Space is connected, but not path connected.

Consider a set .

Consider a Subspace Topology of a Standard Topology on real number. The Topological Space is not connected by a separation

Consider a Subspace Topology , where , of a Standard Topology on real plane. The Topological Space is not connected by a separation .

Facts

Every path connected space is a Connected Space.

Link to original

Consider a Subspace Topology of a Topological Space . Then, is a separation of if and only if . where is a disjoint union, is the set of all limit points of .

Consider a Subspace Topology of a Topological Space and a separation of . If is a connected subspace, then .

Consider a collection of connected subspaces of a Topological Space . If the subspaces have a point in common , then the union of the collection is connected.

Consider a collection of connected subspaces of a Topological Space , and a connected subspace . If , then is connected.

Consider a connected Subspace Topology of a Topological Space . For some Subspace Topology , if , then is connected.

Consider two topological spaces . If is connected and there exists a Continuous Function , then is a connected space.

Consider a continuous function between two topological spaces. If is connected, then is a connected subspace in .

Consider a Topological Space and a subspace . If is connected, then its Closure is also connected.

The Product Space of any collection of connected spaces is a connected space.

Link to original

Topological Manifold

Definition

A Locally Euclidean, second-countable Hausdorff topological space is a topological manifold.

A one-dimensional manifold is called a curve, and a two-dimensional manifold is called a surface.

Facts

The Product Space of an $m$-dimensional manifold and an $n$-dimensional manifold is an $(m+n)$-dimensional manifold.

Consider an $m$-dimensional manifold $M$ without boundary and an $n$-dimensional manifold $N$ with boundary. Then, $\partial(M \times N) = M \times \partial N$.

Link to original

Second-Countable Space

Definition

A second-countable space is a Topological Space whose Topology has a countable basis.

Facts

Every second-countable space is a Separable Space.

$n$-dimensional Euclidean Space $\mathbb{R}^{n}$ is a second-countable space.

Hilbert Space is a second-countable space.

The Product Space of two second-countable spaces is a second-countable space.

Every Second-Countable Space is a Lindelof space.

Link to original

Link to original

Hausdorff Space

Definition

A Topological Space $X$ is a Hausdorff space if for any two distinct points $x, y \in X$ there exist disjoint neighbourhoods $U$ and $V$ of $x$ and $y$ respectively.

Facts

Every finite point set in a Hausdorff space is closed (T1 Axiom).

Consider a Topological Space satisfying axiom. Then, where

Consider a Hausdorff Space $X$. Then, a Sequence of points of $X$ converges to at most one point of $X$.

Every totally ordered set is a Hausdorff space in the Order Topology.

A Subspace Topology of a Hausdorff space is a Hausdorff space (Hereditary Property).

The Product Space of any collection of Hausdorff spaces is a Hausdorff space.

A Compact Hausdorff Space is a normal space.

Link to original

Link to original

Whitney Embedding Theorem

Definition

Any $n$-dimensional smooth manifold can be embedded in $\mathbb{R}^{2n}$.

Link to original

Riemannian Manifold

Definition

A Riemannian manifold is a smooth Topological Manifold together with a Riemannian metric. Many geometric notions such as distance, angles, length, volume, and curvature are defined on a Riemannian manifold.

Let $\mathcal{M}$ be a Smooth Manifold. For each point $p \in \mathcal{M}$, there is an associated vector space called the Tangent Space of $\mathcal{M}$ at $p$. Each Tangent Space is equipped with an Inner Product $g_{p}$ defined on the basis of the tangent space, induced by the standard inner product of the ambient space (Euclidean inner product).

The collection of inner products $g = \{g_{p}\}$ is a Riemannian metric on $\mathcal{M}$, and the pair $(\mathcal{M}, g)$ of the manifold and the Riemannian metric defines a Riemannian manifold.

Link to original

Atlas

Definition

Atlas

An atlas on a Topological Manifold is a collection of charts which covers .

Link to original

Chart

Definition

Let be a Topological Manifold. A chart for the manifold is a Homeomorphism , where is an open subset of the manifold, and is an open subset of the Euclidean Space.

Link to original

Differentiable Manifold

Definition

Let $M$ be a Topological Manifold. It is $k$-times differentiable if the transition maps between its charts are $k$-times continuously differentiable, i.e. of Differentiability Class $C^{k}$.

Link to original

Smooth Manifold

Definition

A smooth manifold is an infinitely Differentiable Manifold ($C^{\infty}$).

Facts

The Product Space of an $m$-dimensional smooth manifold and an $n$-dimensional smooth manifold is an $(m+n)$-dimensional smooth manifold.

Consider an $m$-dimensional smooth manifold $M$ without boundary and an $n$-dimensional smooth manifold $N$ with boundary. Then, $\partial(M \times N) = M \times \partial N$.

Link to original

Diffeomorphism

Definition

Given two differentiable manifolds $M$ and $N$, a differentiable map $f: M \to N$ is a diffeomorphism if it is Bijective and its inverse $f^{-1}$ is differentiable as well.

If these functions are $k$ times continuously differentiable, $f$ is called a $C^{k}$-diffeomorphism.

If there is a diffeomorphism between two manifolds $M$ and $N$, we say the two manifolds are diffeomorphic and write $M \simeq N$.

Link to original

Tangent Space

Definition

Suppose that $M$ is a Differentiable Manifold and that $p \in M$. Pick a coordinate Chart $\varphi: U \to \mathbb{R}^{n}$, where $U$ is an open subset of $M$ containing $p$. The tangent space $T_{p}M$ is the set of all tangent vectors at $p$ on the manifold $M$.

The basis of the tangent space is given by the partial derivatives $\left\{ \frac{\partial}{\partial x_{1}}, \dots, \frac{\partial}{\partial x_{n}} \right\}$.

Link to original

Integration of a Function over a Manifold

Definition

Consider a scalar function $f$ and a Riemannian Manifold $(\mathcal{M}, g)$ with a local Chart $\varphi$. The integral of $f$ over the manifold can be calculated in a local coordinate space: $$\int_{\mathcal{M}} f \, dV = \int_{\varphi(\mathcal{M})} (f \circ \varphi^{-1})(x) \sqrt{|g|} \, dx$$ where $|g|$ is the absolute value of the Determinant of the inner product $g$ at $\varphi^{-1}(x)$.

Link to original

Isometry

Definition

Let $(\mathcal{M}, g)$ and $(\mathcal{N}, h)$ be two Riemannian manifolds, and $f: \mathcal{M} \to \mathcal{N}$ a Diffeomorphism. Then $f$ is called an isometry if the metric is preserved under the pullback of $f$, i.e. $g = f^{*}h$.

Link to original

Geodesic

Definition

A geodesic is a curve representing the shortest path between two points in a Riemannian Manifold.

Suppose a Riemannian Manifold $(\mathcal{M}, g)$. A differentiable curve in $\mathcal{M}$ is defined as a smooth mapping $\gamma$ from an open interval into $\mathcal{M}$. The length of a smooth curve is given by its Arc Length. The distance between two points $p$ and $q$ on $\mathcal{M}$ is defined as the Infimum of the length taken over all differentiable curves $\gamma$ such that $\gamma(0) = p$ and $\gamma(1) = q$, where the infimum runs over the set of all differentiable curves in $\mathcal{M}$ that join the points $p$ and $q$.

Link to original

Laplace-Beltrami Operator

Definition

The Laplace–Beltrami operator is the Divergence of the Gradient: $$\Delta f = \operatorname{div}(\nabla f) = \frac{1}{\sqrt{|g|}} \sum\limits_{i,j=1}^{n} \frac{\partial}{\partial x_{i}} \left( \sqrt{|g|} \, g^{ij} \frac{\partial f}{\partial x_{j}} \right)$$ where $g^{ij}$ is the inverse of the metric, and $|g|$ is the absolute value of the Determinant of the inner product at $\varphi^{-1}(x)$.

Calculations

Calculation of Gradient

Consider an -dimensional Riemannian Manifold with a local Chart , and a function .

By the definition of the Gradient, for any tangent vector $v$, $$g(\nabla f, v) = D_{v}f$$ where $g$ is the inner product at $p$, and $D_{v}f$ is the directional derivative of $f$ in the direction of $v$.

By the Chain Rule, the directional derivative of $f$ at a point in the direction of $v$ is given by $$D_{v}f = \sum\limits_{i=1}^{n} v_{i} \frac{\partial \tilde{f}}{\partial x_{i}}$$ where $\tilde{f} = f \circ \varphi^{-1}$.

By combining these two equations, $\sum_{i,j} g_{ij} (\nabla f)_{i} v_{j} = \sum_{i} v_{i} \frac{\partial \tilde{f}}{\partial x_{i}}$ holds for all $v$. Let $g^{ij}$ be the inverse matrix of $g_{ij}$; then the gradient vector is $$(\nabla f)_{i} = \sum\limits_{j=1}^{n} g^{ij} \frac{\partial \tilde{f}}{\partial x_{j}}$$

Calculation of Divergence

Consider an -dimensional Riemannian Manifold with a local Chart , and a function .

By the Divergence Theorem, for a function $f$ with compact support and a vector field $V$, $$\int_{\mathcal{M}} f \operatorname{div}V \, dV = - \int_{\mathcal{M}} g(\nabla f, V) \, dV$$ The integration can be performed over a coordinate Chart of the manifold.^[Integration of a Function over a Manifold] By Integration by Parts,

$$\begin{aligned} \int_{\mathbb{R}^{n}} \tilde{f} \cdot \operatorname{div}V \sqrt{|g|}\, dx &= - \int_{\mathbb{R}^{n}} g(\nabla f, V) \sqrt{|g|}\, dx\\ &= - \int_{\mathbb{R}^{n}} \sum\limits_{i=1}^{n} V_{i} \frac{\partial\tilde{f}}{\partial x_{i}} \sqrt{|g|}\, dx\\ &= \int_{\mathbb{R}^{n}} \tilde{f} \sum\limits_{i=1}^{n} \frac{\partial}{\partial x_{i}} (V_{i} \sqrt{|g|})\, dx \end{aligned}$$

where $\tilde{f} := f \circ \varphi^{-1}$, $U = \varphi(\mathcal{M})$, $|g|$ is the absolute value of the Determinant of the inner product $g$ at $\varphi^{-1}(x)$, and $V_{i}$ is the $i$-th element of the vector $V$. Therefore, we have $$\operatorname{div}V = \frac{1}{\sqrt{|g|}} \sum\limits_{i=1}^{n} \frac{\partial}{\partial x_{i}} (V_{i} \sqrt{|g|})$$

Link to original

Linear Manifold Learning

Principal Component Analysis

Definition

PCA is a linear dimensionality reduction technique. The correlated variables are linearly transformed onto a new coordinate system such that the new coordinate directions capture the largest variance in the data.

Population Version

Given a random vector $X$ with covariance matrix $\Sigma$, we find a unit vector $w$ such that $\operatorname{Var}(w^{\top}X) = w^{\top}\Sigma w$ is maximized. Equivalently, by the Method of Lagrange Multipliers with the constraint $w^{\top}w = 1$, $$\mathcal{L}(w, \lambda) = w^{\top}\Sigma w - \lambda(w^{\top}w - 1)$$ By differentiation, the solution is given by the eigenvalue problem $$\Sigma w = \lambda w$$ Thus the $w$ maximizing the variance of $w^{\top}X$ is the eigenvector corresponding to the largest Eigenvalue.

Sample Version

Given a data matrix $X \in \mathbb{R}^{n \times p}$, by Singular Value Decomposition, the matrix can be factorized as $X = U \Sigma V^{\top}$. By algebra, $XV = U\Sigma$, where we call $Xv_{j}$ the $j$-th principal component.
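The sample version above can be sketched in a few lines of NumPy; the data matrix here is an arbitrary illustration.

```python
import numpy as np

# PCA of a demeaned data matrix via SVD: a minimal sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])  # anisotropic data
X = X - X.mean(axis=0)                                     # demean

U, s, Vt = np.linalg.svd(X, full_matrices=False)
pcs = X @ Vt.T            # j-th column is the j-th principal component X v_j

# Component variances s_j^2 / (n - 1) come out in decreasing order.
var = pcs.var(axis=0, ddof=1)
```

Note that $Xv_{j} = u_{j}\sigma_{j}$, so the principal components can be read off either factorization.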

Facts

Since and

Link to original

Robust Principal Component Analysis

Definition

PCA doesn’t work well when the input data are corrupted by noise. The goal of Robust PCA is to decompose a data matrix into a low-rank matrix and a sparse matrix.

The optimization problem of Robust PCA is set up as $$\min_{L, S} \operatorname{rank}(L) + \lambda \|S\|_{0} \quad \text{subject to } M = L + S$$ where $L$ is a low-rank matrix, $S$ is a sparse matrix, and $\|S\|_{0}$ is the number of non-zero elements in the matrix.

However, this optimization problem is intractable because the objective is neither continuous nor convex. So, we solve the relaxed problem $$\min_{L, S} \|L\|_{*} + \lambda \|S\|_{1} \quad \text{subject to } M = L + S$$ where $\|L\|_{*}$ is the Nuclear Norm of the matrix, and $\|S\|_{1}$ is the matrix L1 norm.
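The relaxed problem can be solved with an alternating-directions scheme (singular value thresholding for the low-rank part, soft thresholding for the sparse part). A minimal sketch; the penalty parameter `mu` and the iteration count are arbitrary choices, not tuned values.

```python
import numpy as np

def shrink(x, tau):
    """Soft thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca(M, lam, mu=1.0, iters=500):
    """ADMM sketch for min ||L||_* + lam * ||S||_1  s.t.  M = L + S."""
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                        # dual variable
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * shrink(s, 1.0 / mu)) @ Vt      # singular value thresholding
        S = shrink(M - L + Y / mu, lam / mu)    # sparse part
        Y += mu * (M - L - S)                   # dual ascent on M = L + S
    return L, S

# Low-rank plus sparse test matrix
rng = np.random.default_rng(1)
low = np.outer(rng.normal(size=20), rng.normal(size=20))   # rank one
sparse = np.zeros((20, 20))
sparse[rng.integers(0, 20, 10), rng.integers(0, 20, 10)] = 5.0
M = low + sparse
L, S = rpca(M, lam=1.0 / np.sqrt(20))
```

The choice $\lambda = 1/\sqrt{\max(n, m)}$ is the standard default from the principal component pursuit literature.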

Applications

  • Video Surveillance
  • Face Recognition
  • Latent Semantic Indexing
  • Collaborative Filtering (Matrix Completion)
Link to original

Nonlinear Manifold Learning

Kernel Principal Component Analysis

Definition

Consider a demeaned matrix $X$. By SVD, $X = U \Sigma V^{\top}$, where $U$ and $V$ are orthonormal matrices and $\Sigma$ is a Diagonal Matrix. By PCA, the principal components are $XV = U\Sigma$. For a linear kernel, $K = XX^{\top} = U\Sigma^{2}U^{\top}$, so the principal components $U\Sigma$ can be recovered from the eigendecomposition of the kernel matrix alone. For a general kernel $K_{ij} = k(x_{i}, x_{j})$, the kernel principal components are given by the solution of the same eigenvalue problem on the (centered) kernel matrix.
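A minimal NumPy sketch of kernel PCA with an RBF kernel (the bandwidth `gamma` is an arbitrary choice): the kernel matrix is double-centered, and the kernel principal components are the scaled top eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))

# RBF kernel matrix
gamma = 0.5
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * sq)

# Double-center the kernel matrix (demeaning in feature space)
n = len(X)
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J

# Kernel principal components from the top eigenvectors of Kc
w, V = np.linalg.eigh(Kc)
w, V = w[::-1], V[:, ::-1]                      # sort descending
pcs = V[:, :2] * np.sqrt(np.clip(w[:2], 0.0, None))
```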

Link to original

Self-Organizing Map

Definition

A self-organizing map (SOM) is a clustering method that produces a low-dimensional representation of a higher-dimensional data set while preserving the Topological Manifold structure of the data.

The algorithm fits a grid to high-dimensional data and assigns the data to the fitted nodes (prototypes) of the grid.
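A toy sketch of the classic online SOM update on a one-dimensional grid of prototypes; the grid size, learning-rate schedule, and neighborhood width are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(200, 2))            # data in the unit square

n_nodes = 10
protos = rng.uniform(0.0, 1.0, size=(n_nodes, 2))   # grid prototypes
grid = np.arange(n_nodes)

for epoch in range(5):
    lr = 0.5 * (1.0 - epoch / 5.0)              # decaying learning rate
    width = 3.0 * (1.0 - epoch / 5.0) + 0.5     # shrinking neighborhood
    for x in X:
        best = np.argmin(((protos - x) ** 2).sum(axis=1))  # best-matching unit
        h = np.exp(-((grid - best) ** 2) / (2.0 * width ** 2))
        protos += lr * h[:, None] * (x - protos)  # pull neighbors toward x
```

Each update moves the best-matching prototype and its grid neighbors toward the data point, which is what preserves the grid's topological ordering.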

Link to original

Isomap

Definition

Isomap is a non-linear dimensionality reduction method that aims to preserve the intrinsic geometry of high-dimensional data in a lower-dimensional space. The main idea of Isomap is to approximate the geodesic distances between data points on a manifold, rather than just using straight-line Euclidean distances.

Algorithm

  1. Determine the neighbors of each point
    • All points within some fixed radius
    • k-nearest neighbors
  2. Construct a neighborhood distance graph
  3. Compute the shortest paths between all pairs of nodes with Dijkstra’s algorithm
  4. Apply Multidimensional Scaling to the geodesic distance matrix to obtain the low-dimensional embedding
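The steps above can be sketched with NumPy; here Floyd–Warshall stands in for Dijkstra, and a classical-MDS step produces the embedding.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.sort(rng.uniform(0.0, np.pi, 40))
X = np.c_[np.cos(t), np.sin(t)]            # points along a circular arc

# 1.-2. k-nearest-neighbor distance graph
k = 5
D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(axis=-1))
G = np.full_like(D, np.inf)
np.fill_diagonal(G, 0.0)
for i in range(len(X)):
    nb = np.argsort(D[i])[1:k + 1]
    G[i, nb] = D[i, nb]
    G[nb, i] = D[nb, i]                    # keep the graph symmetric

# 3. All-pairs shortest paths (Floyd-Warshall)
for m in range(len(X)):
    G = np.minimum(G, G[:, [m]] + G[[m], :])

# 4. Classical MDS on the geodesic distance matrix
n = len(X)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (G ** 2) @ J
w, V = np.linalg.eigh(B)
emb = V[:, -1] * np.sqrt(max(w[-1], 0.0))  # one-dimensional embedding
```

On this arc, the graph distance between the endpoints approximates the arc length, which exceeds the straight-line chord — exactly the geometry Isomap is built to capture.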
Link to original

Locally Linear Embedding

Definition

Locally linear embedding (LLE) is a non-linear dimensionality reduction method that aims to preserve the intrinsic geometry of high-dimensional data in a lower-dimensional space.

Algorithm

  1. Determine the k-nearest neighbors of each point

  2. Compute a set of weights $W_{ij}$ for each point that best approximate the point as a linear combination of its neighbors: $$\min_{W} \sum_{i} \Big\| x_{i} - \sum_{j \in N(i)} W_{ij} x_{j} \Big\|^{2} \quad \text{subject to } \sum_{j} W_{ij} = 1$$ where $x_{i}$ is a data point vector, $W$ is the weight matrix, and $N(i)$ is the set of the neighbors of $x_{i}$.

  3. Find the low-dimensional embedding of points by solving an Eigenvalue problem, where each point is still described with the same linear combination of its neighbors: $$\min_{Y} \sum_{i} \Big\| y_{i} - \sum_{j} W_{ij} y_{j} \Big\|^{2}$$ where $y_{i}$ is a vector of lower dimension than $x_{i}$, and $W$ is the matrix obtained in step 2.

    In matrix notation, the problem is written as $$\min_{Y} \operatorname{tr}(Y^{\top} M Y), \quad M = (I - W)^{\top}(I - W)$$

    By the Method of Lagrange Multipliers, the solution of the optimization problem is given by the eigenvalue problem $MY = Y\Lambda$. Therefore, $Y$ consists of the eigenvectors corresponding to the smallest eigenvalues.
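A compact NumPy sketch of the steps; the neighborhood size and the regularization constant are arbitrary choices (the local Gram matrix is regularized so the weight system is always solvable).

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.sort(rng.uniform(0.0, 3.0, 30))
X = np.c_[np.cos(t), np.sin(t)]            # a one-dimensional curve in 2-D
n, k, reg = len(X), 4, 1e-3

# Step 2: reconstruction weights from the k nearest neighbors
D = ((X[:, None] - X[None]) ** 2).sum(axis=-1)
W = np.zeros((n, n))
for i in range(n):
    nb = np.argsort(D[i])[1:k + 1]
    Z = X[nb] - X[i]                       # centered neighbors
    C = Z @ Z.T                            # local Gram matrix
    C = C + reg * np.trace(C) * np.eye(k)  # regularize
    w = np.linalg.solve(C, np.ones(k))
    W[i, nb] = w / w.sum()                 # weights sum to one

# Step 3: embedding from the bottom eigenvectors of M = (I - W)^T (I - W)
M = (np.eye(n) - W).T @ (np.eye(n) - W)
vals, vecs = np.linalg.eigh(M)
emb = vecs[:, 1]                           # skip the constant eigenvector
```

Because each row of $W$ sums to one, the constant vector is an eigenvector of $M$ with eigenvalue zero, which is why the bottom eigenvector is discarded.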

Link to original

Multidimensional Scaling

Definition

Multidimensional Scaling (MDS) is a non-linear dimensionality reduction method. It aims to represent high-dimensional data in a lower-dimensional space while preserving the pairwise distances.

Algorithm

MDS

  1. Find a low-dimensional representation $\{y_{i}\}$ that minimizes the difference in the pairwise distances of points: $$\min_{y_{1}, \dots, y_{n}} \sum_{i \neq j} \big( d(x_{i}, x_{j}) - \| y_{i} - y_{j} \| \big)^{2}$$ where $x_{i}$ is a data point vector, and $d$ is a distance function.

Local MDS

  1. Build a set $\mathcal{N}$ of nearby pairs $(i, j)$, where $j$ is among the k-nearest neighbors of $i$
  2. Find a low-dimensional representation $\{y_{i}\}$ that minimizes the difference in the local pairwise distances of points, where pairs outside $\mathcal{N}$ are assigned a very large distance $D$ and a small weight $w$ (often $w \propto 1/D$).
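Classical (Torgerson) MDS is one standard way to realize the MDS objective for Euclidean distances: double-center the squared distance matrix and eigendecompose. A sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(25, 5))
D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(axis=-1))

# Double-center the squared distances: B = -1/2 J D^2 J
n = len(D)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J

# Embedding from the top eigenpairs of B
w, V = np.linalg.eigh(B)
w, V = w[::-1], V[:, ::-1]
Y = V[:, :5] * np.sqrt(np.clip(w[:5], 0.0, None))

# With full dimensionality the pairwise distances are reproduced exactly
D2 = np.sqrt(((Y[:, None] - Y[None]) ** 2).sum(axis=-1))
```

Truncating `Y` to fewer columns gives the lower-dimensional embedding that best preserves the distances in this spectral sense.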
Link to original

Adjacency Matrix

Definition

Consider a Graph with nodes .

Unweighted Adjacency Matrix

An unweighted adjacency matrix is an $n \times n$ matrix such that its element $A_{ij}$ is one when there is an edge from node $i$ to node $j$ and zero when there is no edge.

Weighted Adjacency Matrix

A weighted adjacency matrix is an $n \times n$ matrix such that its element $A_{ij}$ is the weight of the edge between nodes $i$ and $j$, and is zero when there is no edge.

Examples

Given graph

Adjacency matrix

Facts

If a graph is undirected, then the adjacency matrix is a Symmetric Matrix.
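A small NumPy illustration of an unweighted adjacency matrix for an undirected graph (the edge list is an arbitrary example):

```python
import numpy as np

# Undirected graph on 4 nodes with edges 0-1, 1-2, 2-0, 2-3
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0     # undirected: fill both directions
```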

Link to original

Degree Matrix

Definition

The degree matrix of an undirected Graph is a Diagonal Matrix representing the sum of the weight of edges connected to each node. It is used together with the Adjacency Matrix to construct the Laplacian Matrix of a graph.

Given a graph $G$ and the corresponding Adjacency Matrix $A$, the degree matrix $D$ is defined as $$D_{ij} = \begin{cases} \sum_{k} A_{ik} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

Examples

Given graph

Degree matrix

Link to original

Laplacian Matrix

Definition

The Laplacian matrix is a matrix representation of a Graph: $$L = D - A$$ where $D$ is the Degree Matrix, and $A$ is the Adjacency Matrix of the graph.

Random Walk Normalized Laplacian Matrix

$$L^{\text{rw}} = D^{-1}L = I - D^{-1}A$$

Symmetrically Normalized Laplacian Matrix

$$L^{\text{sym}} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$$

Examples

Laplacian Matrix for Simple Graph

Adjacency Matrix and Degree Matrix

Laplacian matrix

Laplacian Matrix for Graph with Weighted Edges

Adjacency Matrix and Degree Matrix

Laplacian matrix

Facts

The number of zero-eigenvalues of the Laplacian matrix equals the number of connected clusters in the graph.
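This fact is easy to check numerically: build $L = D - A$ for a graph with two connected clusters and count the (numerically) zero eigenvalues.

```python
import numpy as np

# Two connected clusters: a triangle {0,1,2} and an edge {3,4}
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 0), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

D = np.diag(A.sum(axis=1))           # degree matrix
L = D - A                            # graph Laplacian

vals = np.linalg.eigvalsh(L)
n_components = int(np.sum(np.abs(vals) < 1e-9))
```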

Analogousness to the Laplace-Beltrami Operator

The Laplacian matrix on a graph can be regarded as a discrete approximation to the Laplace-Beltrami Operator on a manifold.

The quadratic form of the Laplacian Matrix can be seen as a discretization of the squared gradient on the manifold:

$$f^{\top} L f = \frac{1}{2} \sum_{i,j} A_{ij} (f_{i} - f_{j})^{2}$$

where $A_{ij}$ is the similarity between two points $i$ and $j$ defined by the Adjacency Matrix.

Link to original

Laplacian Eigenmap

Definition

The Laplacian eigenmap is an embedding preserving local information optimally.

Algorithm

Consider a set of data points $\{x_{1}, \dots, x_{n}\}$. Make an Adjacency Matrix with the Gaussian Radial Basis Function Kernel $$A_{ij} = \exp\left( -\frac{\|x_{i} - x_{j}\|^{2}}{2\sigma^{2}} \right)$$ where $\sigma$ is a scale parameter.

Construct a Laplacian Matrix using the Adjacency Matrix. The following options are available.

Construct the matrix whose columns are the eigenvectors corresponding to the smallest eigenvalues of the Laplacian Matrix. This matrix is the result of the Laplacian eigenmap, where the eigenvector for the smallest eigenvalue is intentionally omitted because it corresponds to the trivial solution (constant vector).
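The algorithm can be sketched as follows with the unnormalized Laplacian; the bandwidth `sigma` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.sort(rng.uniform(0.0, 3.0, 30))
X = np.c_[t, np.sin(t)]

# Gaussian RBF adjacency matrix
sigma = 0.5
sq = ((X[:, None] - X[None]) ** 2).sum(axis=-1)
A = np.exp(-sq / (2.0 * sigma ** 2))
np.fill_diagonal(A, 0.0)

# Unnormalized Laplacian and its bottom eigenvectors
L = np.diag(A.sum(axis=1)) - A
vals, vecs = np.linalg.eigh(L)
emb = vecs[:, 1:3]                 # skip the trivial constant eigenvector
```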

Link to original

Spectral Clustering

Definition

Spectral clustering techniques make use of the spectrum of the similarity matrix of the data to perform dimensionality reduction before clustering.

Algorithms

Apply the Laplacian Eigenmap to the given data.

Form an $n \times k$ submatrix $Z$ using the first $k$ columns of the embedded matrix, and normalize each row to unit norm.

Apply a clustering algorithm (e.g. K-Means Clustering) to $\{z_{i}\}$, where $z_{i}$ is the $i$-th row of $Z$.

Mathematical Background

Consider a cluster assignment $f_{i}$ for each data point. Then, clustering observations corresponds to estimating the $f_{i}$'s. The entries $A_{ij}$ of the Adjacency Matrix represent the similarity between points $i$ and $j$.

Under the assumption that close data points (large $A_{ij}$) have similar labels ($f_{i} \approx f_{j}$), the optimization problem is set up to minimize the difference between assignments for similar points: $$\min_{f} \sum_{i,j} A_{ij} (f_{i} - f_{j})^{2} \quad \text{subject to } f^{\top}f = 1$$ where the constraint is imposed to avoid the trivial solution $f = 0$.

In matrix notation, the problem is $$\min_{f} f^{\top} L f \quad \text{subject to } f^{\top} f = 1$$

When $A_{ij}$ is large, the penalty on $(f_{i} - f_{j})^{2}$ is large, and when $A_{ij}$ is small, the penalty is small. So, $A_{ij}$ works as the weight for each pair in the optimization.

The Lagrangian function is defined as $$\mathcal{L}(f, \lambda) = f^{\top} L f - \lambda (f^{\top} f - 1)$$ and the solution of the problem is given by an Eigenvalue problem $$L f = \lambda f$$ The solution is an $n \times k$ matrix of eigenvectors sorted in ascending order by their corresponding eigenvalues.
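A sketch of the full pipeline on two well-separated blobs, using a plain Lloyd's k-means on the row-normalized eigenvector matrix (sizes, bandwidths, and initialization are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0.0, 0.1, size=(15, 2)),
               rng.normal(3.0, 0.1, size=(15, 2))])   # two separated blobs

# Similarity graph and Laplacian
sq = ((X[:, None] - X[None]) ** 2).sum(axis=-1)
A = np.exp(-sq / (2.0 * 0.5 ** 2))
np.fill_diagonal(A, 0.0)
L = np.diag(A.sum(axis=1)) - A

# First k = 2 eigenvectors, rows normalized to unit norm
vals, vecs = np.linalg.eigh(L)
Z = vecs[:, :2]
Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)

# Plain Lloyd's k-means on the rows of Z
C = Z[[0, -1]].copy()                 # initialize with one row from each blob
for _ in range(20):
    labels = np.argmin(((Z[:, None] - C[None]) ** 2).sum(axis=-1), axis=1)
    C = np.array([Z[labels == c].mean(axis=0) for c in (0, 1)])
```

Because the blobs are nearly disconnected in the similarity graph, the bottom eigenvectors are almost constant on each blob, so k-means on the rows of `Z` separates the clusters cleanly.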

Link to original

t-Distributed Stochastic Neighbor Embedding

Definition

The t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data. It models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points with high probability.

Algorithm

  1. Construct a probability distribution over pairs of high-dimensional objects in such a way that similar objects are assigned a higher probability while dissimilar points are assigned a lower probability.

    Define the conditional probability by a Normal Distribution $$p_{j|i} = \frac{\exp(-\|x_{i} - x_{j}\|^{2} / 2\sigma_{i}^{2})}{\sum_{k \neq i} \exp(-\|x_{i} - x_{k}\|^{2} / 2\sigma_{i}^{2})}$$ where $\sigma_{i}$ acts as an adaptive bandwidth parameter for each point $x_{i}$, and is indirectly determined by the given hyperparameter (perplexity).

    And define the joint probability $$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$$ where $n$ is the number of data points.

  2. Define a similar probability distribution over the points in the low-dimensional map, using a Student's t-distribution for the similarities $q_{ij}$ between two points in the low-dimensional map.

  3. Minimize the Kullback-Leibler Divergence between the two distributions with respect to the locations of the points in the low-dimensional map.
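Step 1 can be sketched directly; here a fixed bandwidth stands in for the per-point perplexity search, so only the shape of the computation should be read from this.

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.normal(size=(20, 4))
sq = ((X[:, None] - X[None]) ** 2).sum(axis=-1)

# Conditional probabilities p_{j|i} with a fixed bandwidth
# (real t-SNE tunes sigma_i per point to match a target perplexity)
sigma = 1.0
P_cond = np.exp(-sq / (2.0 * sigma ** 2))
np.fill_diagonal(P_cond, 0.0)
P_cond = P_cond / P_cond.sum(axis=1, keepdims=True)

# Symmetrized joint probabilities p_ij
n = len(X)
P = (P_cond + P_cond.T) / (2.0 * n)
```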

Link to original

UMAP

Definition

Uniform manifold approximation and projection (UMAP) is a nonlinear dimensionality reduction algorithm similar to t-SNE. However, unlike t-SNE, UMAP uses spectral embedding to initialize the low-dimensional graph, and uses a graph that connects each point only to its nearest neighbors.

Algorithm

  1. For each point in high-dimensional space, find its nearest neighbors.

  2. Compute the distances between the nearest neighbors for each point.

  3. Calculate the local connectivity parameter for each point, determined by the number of nearest neighbors and the minimum distance among the nearest neighbors.

  4. Calculate the edge weights to the nearest neighbors and symmetrize the graph.

  5. Initialize the low-dimensional representation using spectral embedding.

  6. Optimize the low-dimensional representation by minimizing both the attractive and the repulsive forces using SGD.

The cost function is defined as the fuzzy cross entropy $$C = \sum_{(i,j) \in E} \left[ w_{h}(i,j) \log \frac{w_{h}(i,j)}{w_{l}(i,j)} + (1 - w_{h}(i,j)) \log \frac{1 - w_{h}(i,j)}{1 - w_{l}(i,j)} \right]$$ where $E$ is the set of edges, and $w_{h}$ and $w_{l}$ are the weights of the edge in the high-dimensional and low-dimensional graph respectively.
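The edge-weight and symmetrization steps can be sketched as follows; the bandwidth here is a crude stand-in for UMAP's binary search, so only the overall shape of the computation should be read from this.

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(20, 3))
k = 5
D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(axis=-1))

n = len(X)
W = np.zeros((n, n))
for i in range(n):
    nb = np.argsort(D[i])[1:k + 1]
    rho = D[i, nb].min()                      # distance to nearest neighbor
    sigma = D[i, nb].mean() - rho + 1e-12     # crude stand-in for the binary search
    W[i, nb] = np.exp(-np.maximum(D[i, nb] - rho, 0.0) / sigma)

# Fuzzy-union symmetrization of the directed weights
W_sym = W + W.T - W * W.T
```

Subtracting `rho` guarantees every point is fully connected (weight one) to its nearest neighbor, which is UMAP's local-connectivity assumption.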

Link to original