Definition

Optimizers are algorithms that adjust a model's parameters to minimize the loss function. The optimizers below aim to improve the convergence speed and stability of training compared to standard Stochastic Gradient Descent (SGD).
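
As a baseline for comparison with the optimizers below, here is a minimal sketch of a plain SGD update; the toy quadratic loss and the learning rate value are illustrative assumptions, not from the source note.

```python
import numpy as np

def sgd_step(theta, grad, lr=0.1):
    """Plain SGD: theta <- theta - lr * gradient."""
    return theta - lr * grad

# Toy example: minimize L(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
for _ in range(100):
    theta = sgd_step(theta, 2 * theta)
print(theta)  # close to [0, 0]
```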

Examples

Momentum Optimizer

Definition

The Momentum optimizer remembers the update at each iteration and determines the next update as a linear combination of the current gradient and the previous update:

$$v_t = \gamma v_{t-1} + \eta \nabla_\theta L(\theta_t), \qquad \theta_{t+1} = \theta_t - v_t$$

where $\gamma$ is the momentum coefficient and $\eta$ is the learning rate.
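
A minimal NumPy sketch of the update above, reusing the same toy quadratic loss; the learning rate and momentum coefficient values are illustrative assumptions.

```python
import numpy as np

def momentum_step(theta, grad, velocity, lr=0.1, gamma=0.9):
    """v_t = gamma * v_{t-1} + lr * g_t;  theta_{t+1} = theta_t - v_t."""
    velocity = gamma * velocity + lr * grad
    return theta - velocity, velocity

# Toy example: minimize L(theta) = ||theta||^2 (gradient 2 * theta).
theta = np.array([1.0, -2.0])
velocity = np.zeros_like(theta)
for _ in range(100):
    theta, velocity = momentum_step(theta, 2 * theta, velocity)
print(theta)  # close to [0, 0]
```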


AdaGrad Optimizer

Definition

Adaptive gradient descent (AdaGrad) is Gradient Descent with a parameter-wise learning rate:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{G_t + \epsilon}}\, g_t, \qquad G_t = \sum_{\tau=1}^{t} g_\tau^{2}$$

where $g_t = \nabla_\theta L(\theta_t)$, $G_t$ is the sum of squares of past gradients (accumulated per parameter), and $\epsilon$ is a small constant to prevent division by zero.
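
A minimal NumPy sketch of the parameter-wise update; the learning rate value and the toy loss are illustrative assumptions.

```python
import numpy as np

def adagrad_step(theta, grad, accum, lr=0.5, eps=1e-8):
    """Accumulate squared gradients (G_t) and scale the step per parameter."""
    accum = accum + grad ** 2
    theta = theta - lr * grad / np.sqrt(accum + eps)
    return theta, accum

# Toy example: minimize L(theta) = ||theta||^2 (gradient 2 * theta).
theta = np.array([1.0, -2.0])
accum = np.zeros_like(theta)
for _ in range(200):
    theta, accum = adagrad_step(theta, 2 * theta, accum)
print(theta)  # approaches [0, 0]; the effective step size shrinks as accum grows
```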


RMSProp Optimizer

Definition

The RMSProp optimizer resolves the AdaGrad Optimizer's rapidly diminishing learning rates and the relative magnitude differences between parameters by taking an exponential moving average over the squared-gradient history:

$$E[g^2]_t = \rho\, E[g^2]_{t-1} + (1 - \rho)\, g_t^{2}, \qquad \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}}\, g_t$$

where $\rho$ is the decay rate.
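
A minimal NumPy sketch that replaces AdaGrad's running sum with an exponential moving average; the hyperparameter values and the toy loss are illustrative assumptions.

```python
import numpy as np

def rmsprop_step(theta, grad, avg_sq, lr=0.01, rho=0.9, eps=1e-8):
    """E[g^2]_t = rho * E[g^2]_{t-1} + (1 - rho) * g_t^2, then a scaled step."""
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2
    theta = theta - lr * grad / np.sqrt(avg_sq + eps)
    return theta, avg_sq

# Toy example: minimize L(theta) = ||theta||^2 (gradient 2 * theta).
theta = np.array([1.0, -2.0])
avg_sq = np.zeros_like(theta)
for _ in range(500):
    theta, avg_sq = rmsprop_step(theta, 2 * theta, avg_sq)
print(theta)  # settles near [0, 0] (within roughly the learning rate)
```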


Adam Optimizer

Definition

Adaptive moment estimation (Adam) combines the ideas of the Momentum and RMSProp optimizers:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^{2}$$

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^{t}}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^{t}}, \qquad \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t$$

where:

  • $m_t$ is the estimate of the first moment (mean) of the gradients
  • $v_t$ is the estimate of the second moment (uncentered variance) of the gradients
  • $\beta_1$ and $\beta_2$ are decay rates for the moment estimates
  • $\hat{m}_t$ and $\hat{v}_t$ are bias-corrected estimates
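
A minimal NumPy sketch combining the two moment estimates with bias correction; the hyperparameter values and the toy loss are illustrative assumptions.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad       # first moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction for m
    v_hat = v / (1 - beta2 ** t)             # bias correction for v
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy example: minimize L(theta) = ||theta||^2 (gradient 2 * theta).
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 301):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)  # settles near [0, 0]
```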