Definition

The natural gradient is an optimization method that takes the geometry of the parameter space into account when updating parameters. Unlike the standard gradient, which points in the direction of steepest ascent in the Euclidean space of parameters, the natural gradient points in the direction of steepest ascent in the space of probability distributions induced by the parameters.
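
In standard notation (the symbols here are illustrative), the natural gradient preconditions the ordinary gradient of an objective $J(\theta)$ by the inverse of the Fisher information matrix $F(\theta)$:

```latex
\tilde{\nabla}_{\theta} J(\theta)
  = F(\theta)^{-1}\, \nabla_{\theta} J(\theta),
\qquad
F(\theta)
  = \mathbb{E}_{x \sim p_{\theta}}\!\left[
      \nabla_{\theta} \log p_{\theta}(x)\,
      \nabla_{\theta} \log p_{\theta}(x)^{\top}
    \right]
```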

The natural gradient uses the Fisher information matrix to define a metric on the parameter space (the Fisher information metric). It can be interpreted as the steepest-ascent direction in the space of probability distributions, as measured by the KL divergence. Instead of taking the step that maximizes the change in the objective function per unit of Euclidean distance in parameter space, the natural gradient takes the step that maximizes the change in the objective while keeping the change in the probability distribution, as measured by KL divergence, small.
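
This trade-off can be stated as a constrained problem; a second-order expansion of the KL divergence shows why the Fisher matrix serves as the local metric:

```latex
\max_{\delta} \; J(\theta + \delta)
\quad \text{s.t.} \quad
\mathrm{KL}\!\left(p_{\theta} \,\|\, p_{\theta+\delta}\right) \le \epsilon,
\qquad
\mathrm{KL}\!\left(p_{\theta} \,\|\, p_{\theta+\delta}\right)
\approx \tfrac{1}{2}\, \delta^{\top} F(\theta)\, \delta
```

As a minimal sketch of the resulting update rule, consider a univariate Gaussian parameterized by $(\mu, \log\sigma)$, a family for which the Fisher matrix has a simple closed form (the function names and step size below are illustrative, not a standard API):

```python
import numpy as np

# Sketch: natural-gradient ascent on the average log-likelihood of
# N(mu, sigma^2), parameterized by theta = (mu, log_sigma). For this
# family the Fisher matrix is diagonal: F = diag(1/sigma^2, 2).

def grad_log_likelihood(theta, x):
    """Euclidean gradient of the average log-likelihood over samples x."""
    mu, log_sigma = theta
    sigma2 = np.exp(2.0 * log_sigma)
    d_mu = np.mean(x - mu) / sigma2
    d_log_sigma = np.mean((x - mu) ** 2) / sigma2 - 1.0
    return np.array([d_mu, d_log_sigma])

def fisher(theta):
    """Closed-form Fisher information matrix for (mu, log_sigma)."""
    _, log_sigma = theta
    sigma2 = np.exp(2.0 * log_sigma)
    return np.diag([1.0 / sigma2, 2.0])

def natural_gradient(theta, x):
    """Precondition the Euclidean gradient with the inverse Fisher matrix."""
    g = grad_log_likelihood(theta, x)
    return np.linalg.solve(fisher(theta), g)

# Ascent step: theta <- theta + lr * F(theta)^{-1} grad
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=1000)
theta = np.array([0.0, 0.0])  # start at N(0, 1)
for _ in range(100):
    theta = theta + 0.1 * natural_gradient(theta, x)
print(theta)  # converges near mu = 2.0, log_sigma = log(0.5)
```

Note how the preconditioning rescales each coordinate by the local curvature of the distribution: the update in $\mu$ becomes a fixed fraction of the residual regardless of $\sigma$, which is exactly the invariance to parameterization that motivates the natural gradient.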