Definition

Value iteration iteratively updates state-value function until convergence. Its time complexity is where and are the numbers of states and actions respectively.

Algorithm

  1. Initialize
  2. Update iteratively from all (full backup) until convergence to .
    • Synchronoius backups: compute for all and update simultaneously.
    • Asynchronoius backups: compute for one and update it immediately.
  3. Compute the optimal policy (one-step lookahead) and return it.