Definition
Probabilistic PCA (PPCA) is a dimensionality reduction technique that aims to explain the covariance structure of observed data using a smaller number of latent variables (features). It is a probabilistic formulation of principal component analysis (PCA). Instead of finding orthogonal directions that maximize variance, PPCA models the observed data as linear combinations of latent features plus Gaussian noise.
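Consider an $n \times d$ data matrix $X$ whose rows $x_1, \dots, x_n \in \mathbb{R}^d$ are i.i.d. samples from an unknown distribution $p(x)$. PPCA assumes that each sample is generated from a lower-dimensional latent feature vector $z_i \in \mathbb{R}^k$, where $k < d$.
The PPCA model is defined as
$$x_i = W z_i + \mu + \epsilon_i, \qquad z_i \sim \mathcal{N}(0, I_k), \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2 I_d),$$
where:
- $W \in \mathbb{R}^{d \times k}$ is a loading matrix that maps the latent space to the observed space.
- $\mu \in \mathbb{R}^d$ is the $d$-dimensional mean of the observed data.
- $\epsilon_i$ is an isotropic Gaussian noise term with variance $\sigma^2$, independent of $z_i$.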
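To make the generative process concrete, here is a minimal NumPy sketch that draws samples from the model $x_i = W z_i + \mu + \epsilon_i$. The function `sample_ppca` and the parameter values (`W_true`, `mu_true`, `sigma2_true`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def sample_ppca(W, mu, sigma2, n, rng):
    """Draw n samples x_i = W z_i + mu + eps_i from a PPCA model (illustrative helper)."""
    d, k = W.shape
    Z = rng.standard_normal((n, k))                      # latent z_i ~ N(0, I_k)
    eps = np.sqrt(sigma2) * rng.standard_normal((n, d))  # noise eps_i ~ N(0, sigma2 * I_d)
    X = Z @ W.T + mu + eps                               # each row is one observed x_i
    return X, Z

rng = np.random.default_rng(0)
d, k, n = 5, 2, 100_000
W_true = rng.standard_normal((d, k))   # hypothetical loading matrix
mu_true = np.arange(d, dtype=float)    # hypothetical mean vector
sigma2_true = 0.1                      # hypothetical noise variance

X, Z = sample_ppca(W_true, mu_true, sigma2_true, n, rng)
print(X.shape)  # (100000, 5)
```

Each row of `X` is one sample $x_i$; `Z` keeps the latent features that produced it, which in real data are unobserved.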
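Because $z_i$ and $\epsilon_i$ are independent Gaussians, integrating out $z_i$ shows that the observed data are also Gaussian:
$$x_i \sim \mathcal{N}(\mu, C), \qquad C = W W^\top + \sigma^2 I_d.$$
The data covariance is thus modeled as a rank-$k$ term plus isotropic noise.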
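Integrating out $z_i$ gives the marginal $x_i \sim \mathcal{N}(\mu, W W^\top + \sigma^2 I_d)$. The sketch below, assuming NumPy and hypothetical helper names `ppca_marginal_cov` and `ppca_log_likelihood`, builds this covariance, evaluates the Gaussian log-likelihood of a data set, and compares the empirical covariance of simulated samples with the model covariance.

```python
import numpy as np

def ppca_marginal_cov(W, sigma2):
    """Marginal covariance C = W W^T + sigma2 * I_d of x after integrating out z."""
    d = W.shape[0]
    return W @ W.T + sigma2 * np.eye(d)

def ppca_log_likelihood(X, W, mu, sigma2):
    """Sum over rows of log N(x_i | mu, C), with C = W W^T + sigma2 * I_d."""
    n, d = X.shape
    C = ppca_marginal_cov(W, sigma2)
    _, logdet = np.linalg.slogdet(C)  # C is positive definite when sigma2 > 0
    Xc = X - mu
    # Quadratic forms x_c^T C^{-1} x_c, using a linear solve instead of an explicit inverse
    maha = np.einsum("ij,ij->i", Xc, np.linalg.solve(C, Xc.T).T)
    return -0.5 * (n * d * np.log(2 * np.pi) + n * logdet + maha.sum())

rng = np.random.default_rng(1)
d, k, n = 4, 2, 200_000
W = rng.standard_normal((d, k))  # hypothetical parameters
mu = np.zeros(d)
sigma2 = 0.5
X = rng.standard_normal((n, k)) @ W.T + mu + np.sqrt(sigma2) * rng.standard_normal((n, d))

C = ppca_marginal_cov(W, sigma2)
S = np.cov(X, rowvar=False)   # empirical covariance of the simulated samples
print(np.max(np.abs(S - C)))  # small; the gap is sampling error only
print(ppca_log_likelihood(X, W, mu, sigma2))
```

Solving against $C$ rather than inverting it is a common numerical choice; for large $d$ one could also exploit the low-rank-plus-diagonal structure of $C$, which this sketch does not do.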
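The goal of PPCA is to estimate the model parameters $W$, $\mu$, and $\sigma^2$ from the observed data. This is done by maximum likelihood estimation (MLE), and the maximum-likelihood estimates have a closed form. Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and let $S = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^\top$ be the sample covariance, with eigenvalues $\lambda_1 \ge \dots \ge \lambda_d$. Then
$$\mu_{\text{ML}} = \bar{x}, \qquad \sigma^2_{\text{ML}} = \frac{1}{d-k}\sum_{j=k+1}^{d}\lambda_j, \qquad W_{\text{ML}} = U_k\left(\Lambda_k - \sigma^2_{\text{ML}} I_k\right)^{1/2} R,$$
where $U_k \in \mathbb{R}^{d \times k}$ holds the orthonormal eigenvectors of $S$ for the $k$ largest eigenvalues, $\Lambda_k = \operatorname{diag}(\lambda_1, \dots, \lambda_k)$, and $R$ is an arbitrary $k \times k$ orthogonal matrix. The noise variance estimate is the average variance lost in the $d - k$ discarded directions.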
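The closed-form estimates come from an eigendecomposition of the sample covariance. A minimal NumPy sketch follows; the function name `ppca_mle` is hypothetical, and it fixes the arbitrary rotation to $R = I$. Because $W$ is identifiable only up to that rotation, the usage check compares $W W^\top$ rather than $W$ itself.

```python
import numpy as np

def ppca_mle(X, k):
    """Closed-form PPCA maximum-likelihood estimates (mu, W, sigma2), taking R = I."""
    n, d = X.shape
    mu = X.mean(axis=0)                        # mu_ML = sample mean
    Xc = X - mu
    S = Xc.T @ Xc / n                          # ML sample covariance (divides by n)
    evals, evecs = np.linalg.eigh(S)           # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1]            # re-sort in descending order
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[k:].mean()                  # average of the d - k discarded eigenvalues
    # W = U_k (Lambda_k - sigma2 I)^{1/2}: scale column j of U_k by sqrt(lambda_j - sigma2)
    W = evecs[:, :k] * np.sqrt(np.maximum(evals[:k] - sigma2, 0.0))
    return mu, W, sigma2

# Usage on simulated data with hypothetical "true" parameters
rng = np.random.default_rng(2)
d, k, n = 6, 2, 200_000
W_true = rng.standard_normal((d, k))
mu_true = rng.standard_normal(d)
sigma2_true = 0.3
X = rng.standard_normal((n, k)) @ W_true.T + mu_true + np.sqrt(sigma2_true) * rng.standard_normal((n, d))

mu_hat, W_hat, sigma2_hat = ppca_mle(X, k)
print(sigma2_hat)  # compare with sigma2_true
```

Computing the eigendecomposition of $S$ is only one route; an SVD of the centered data matrix yields the same eigenvectors and is often preferred numerically.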
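When the parameters are estimated by MLE, the columns of $W_{\text{ML}}$ span the same principal subspace as the top $k$ principal components of standard PCA. The columns themselves are not orthogonal in general, because the rotation $R$ is arbitrary.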
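The subspace claim can be checked numerically. The NumPy sketch below, on hypothetical simulated data, builds a PPCA maximum-likelihood loading matrix $W = U_k(\Lambda_k - \sigma^2 I_k)^{1/2} R$ with a random orthogonal $R$, obtains the top $k$ PCA directions separately from an SVD of the centered data, and compares the orthogonal projectors onto the two column spaces. The helper `projector` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, n = 8, 3, 5_000
# Hypothetical data: rank-k signal plus small isotropic noise
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) + 0.2 * rng.standard_normal((n, d))
Xc = X - X.mean(axis=0)

# PPCA MLE: W = U_k (Lambda_k - sigma2 I)^{1/2} R with an arbitrary orthogonal R
evals, evecs = np.linalg.eigh(Xc.T @ Xc / n)
evals, evecs = evals[::-1], evecs[:, ::-1]         # descending order
sigma2 = evals[k:].mean()
R, _ = np.linalg.qr(rng.standard_normal((k, k)))    # random k x k orthogonal matrix
W = (evecs[:, :k] * np.sqrt(evals[:k] - sigma2)) @ R

# Standard PCA directions from the SVD of the centered data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
V_k = Vt[:k].T                                      # top-k principal directions (d x k)

def projector(A):
    """Orthogonal projector onto the column space of A."""
    Q, _ = np.linalg.qr(A)
    return Q @ Q.T

P_ppca = projector(W)
P_pca = projector(V_k)
print(np.allclose(P_ppca, P_pca))  # True
```

Comparing projectors rather than the matrices themselves makes the check independent of $R$ and of the scaling of the columns.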