Definition

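Adaptive moment estimation (Adam) combines the ideas of the momentum and RMSProp optimizers: like momentum, it keeps an exponentially decaying average of past gradients, and like RMSProp, it keeps an exponentially decaying average of past squared gradients to scale each parameter's step.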
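
For reference, the standard Adam update can be written as below. The symbols $g_t$ (gradient at step $t$), $\theta_t$ (parameters), $\alpha$ (learning rate), and $\epsilon$ (a small constant for numerical stability) are notation assumed here rather than taken from this text.

```latex
\begin{align}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t} \\
\hat{v}_t &= \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{align}
```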

Where:

  • $m_t$ is the estimate of the first moment (mean) of the gradients
  • $v_t$ is the estimate of the second moment (uncentered variance) of the gradients
  • $\beta_1$ and $\beta_2$ are decay rates for the moment estimates
  • $\hat{m}_t$ and $\hat{v}_t$ are bias-corrected estimates
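
The quantities listed above can be sketched in code as a single update step. This is a minimal pure-Python illustration, not a reference implementation: the function name `adam_step`, its argument names, and the toy objective are hypothetical, and the default hyperparameters are the commonly used ones (learning rate 0.001, decay rates 0.9 and 0.999, epsilon 1e-8).

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of parameters.

    theta, grad, m, v are equal-length lists of floats; t is the 1-based step count.
    Returns the updated (theta, m, v) as new lists.
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        # Exponentially decaying averages of the gradient and squared gradient.
        m_i = beta1 * m_i + (1 - beta1) * g
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction compensates for m and v starting at zero.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Parameter update, scaled per parameter by the second-moment estimate.
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


# Toy usage (assumed objective): minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
theta, m, v = [0.0], [0.0], [0.0]
for t in range(1, 2001):
    grad = [2.0 * (theta[0] - 3.0)]
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
```

On the first step the bias correction makes the corrected first moment equal to the gradient and the corrected second moment equal to its square, so the parameter moves by roughly the learning rate in the direction opposite to the gradient's sign, regardless of the gradient's magnitude.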