Definition

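Generalized policy iteration (GPI) interleaves two processes: the value function is repeatedly updated to approximate the true value function of the current policy (for example, by sample backups), and the policy is repeatedly improved with respect to that value function so that it approaches optimality.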
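The interplay between evaluation and improvement can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the definition above: the 4-state chain MDP, the reward of -1 per step, the discount factor, and the function names `step`, `evaluate`, `improve` and `generalized_policy_iteration` are all hypothetical. Because the toy transitions are deterministic, a one-step backup from the model plays the role that a sample backup would play in a learning setting.

```python
# Hypothetical toy MDP (an assumption for illustration): a chain of states
# 0..3 where state 3 is terminal and every move costs a reward of -1.
N_STATES = 4
TERMINAL = 3
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9       # assumed discount factor


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    if action == 1:
        nxt = min(state + 1, TERMINAL)
    else:
        nxt = max(state - 1, 0)
    return nxt, -1.0


def q_value(state, action, values):
    """One-step backup of taking `action` in `state` under `values`."""
    nxt, reward = step(state, action)
    return reward + GAMMA * values[nxt]


def evaluate(policy, values, sweeps=1):
    """Partial policy evaluation: move `values` toward the value of `policy`.

    Returns the largest change made to any state value.
    """
    delta = 0.0
    for _ in range(sweeps):
        for s in range(TERMINAL):  # non-terminal states only
            new_v = q_value(s, policy[s], values)
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
    return delta


def improve(policy, values):
    """Greedy policy improvement; returns True if the policy did not change."""
    stable = True
    for s in range(TERMINAL):
        best = max(ACTIONS, key=lambda a: q_value(s, a, values))
        if best != policy[s]:
            policy[s] = best
            stable = False
    return stable


def generalized_policy_iteration(max_iters=1000, tol=1e-9):
    """Alternate partial evaluation and improvement until both settle."""
    policy = [0] * TERMINAL    # start with "always left"
    values = [0.0] * N_STATES  # the terminal state's value stays 0
    for _ in range(max_iters):
        delta = evaluate(policy, values, sweeps=1)
        stable = improve(policy, values)
        if stable and delta < tol:
            break
    return policy, values


if __name__ == "__main__":
    policy, values = generalized_policy_iteration()
    print(policy, values)
```

Note that evaluation is deliberately incomplete here (a single sweep per iteration): the value function only has to move toward the value of the current policy, not reach it, before the policy is improved again. Setting `sweeps` to a large number would approximate classical policy iteration instead.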