Definition
Value iteration iteratively updates state-value function until convergence. Its time complexity is where and are the numbers of states and actions respectively.
Algorithm
- Initialize
- Update iteratively from all (full backup) until convergence to .
- Synchronoius backups: compute for all and update simultaneously.
- Asynchronoius backups: compute for one and update it immediately.
- Compute the optimal policy (one-step lookahead) and return it.