Definition

The REINFORCE algorithm is a policy gradient algorithm that maximizes the expected return. Its objective function is based on the Policy Gradient Theorem: REINFORCE replaces the expectation in the policy gradient with an average over sampled trajectories, and the total reward with the return $G_t$, the cumulative reward from time step $t$ onward.
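The two substitutions can be written out as a short derivation. This is a sketch under assumed notation, none of which is fixed by the text above: $\tau$ is a trajectory of length $T$, $R(\tau)$ its total reward, $r_{k+1}$ the reward received after step $k$, and $N$ the number of sampled trajectories. The return is taken as undiscounted here; a discounted return is a common variant.

```latex
\begin{align}
\nabla_\theta J(\theta)
  &= \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ \sum_{t=0}^{T-1} R(\tau)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right] \\
  &= \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ \sum_{t=0}^{T-1} G_t\, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right],
  \qquad G_t = \sum_{k=t}^{T-1} r_{k+1} \\
  &\approx \frac{1}{N} \sum_{n=1}^{N} \sum_{t=0}^{T-1} G_t^{(n)}\, \nabla_\theta \log \pi_\theta\!\left(a_t^{(n)} \mid s_t^{(n)}\right)
\end{align}
```

The second line holds because rewards received before step $t$ do not depend on the action $a_t$, so their contribution vanishes in expectation. The third line is the sample average that REINFORCE actually computes.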

Algorithm

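  1. Execute $N$ trajectories under the current policy $\pi_\theta$, each starting from an initial state $s_0$.
  2. Approximate the gradient of the objective function, $\nabla_\theta J(\theta)$, by averaging $G_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)$ over the sampled trajectories.
  3. Update the policy parameters to maximize $J(\theta)$: $\theta \leftarrow \theta + \alpha \nabla_\theta J(\theta)$, where $\alpha$ is a learning rate.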
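The three steps can be sketched in Python with NumPy. This is a minimal illustration rather than a reference implementation. The environment (a four-state chain with a goal at the right end), the tabular softmax policy, the discount factor `gamma`, and all function names are assumptions introduced for the example.

```python
import numpy as np

# Hypothetical toy environment: states 0..3 on a chain, goal at state 3.
N_STATES, N_ACTIONS, GOAL, MAX_STEPS = 4, 2, 3, 20

def softmax(x):
    z = np.exp(x - x.max())  # subtract the max for numerical stability
    return z / z.sum()

def run_episode(theta, rng):
    """Step 1: sample one trajectory from state 0 under pi_theta."""
    s, states, actions, rewards = 0, [], [], []
    for _ in range(MAX_STEPS):
        a = rng.choice(N_ACTIONS, p=softmax(theta[s]))
        # Action 1 moves right, action 0 moves left (clipped at state 0).
        s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == GOAL else 0.0
        states.append(s)
        actions.append(a)
        rewards.append(r)
        s = s_next
        if s == GOAL:
            break
    return states, actions, rewards

def returns(rewards, gamma):
    """Return G_t for every step t, accumulated backwards in time."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

def reinforce_gradient(theta, trajectories, gamma):
    """Step 2: average G_t * grad log pi(a_t | s_t) over trajectories."""
    grad = np.zeros_like(theta)
    for states, actions, rewards in trajectories:
        for s, a, g in zip(states, actions, returns(rewards, gamma)):
            # For a tabular softmax policy, the gradient of log pi(a | s)
            # with respect to theta[s] is one_hot(a) - pi(. | s).
            dlog = -softmax(theta[s])
            dlog[a] += 1.0
            grad[s] += g * dlog
    return grad / len(trajectories)

def train(n_iters=200, n_traj=10, alpha=0.5, gamma=0.99, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(n_iters):
        trajs = [run_episode(theta, rng) for _ in range(n_traj)]
        # Step 3: gradient ascent on J(theta) with learning rate alpha.
        theta += alpha * reinforce_gradient(theta, trajs, gamma)
    return theta
```

Calling `train()` returns the learned parameter table; applying `softmax` to a row gives the action probabilities for that state. Setting `gamma=1.0` gives the undiscounted return. Averaging over several trajectories per update reduces the variance of the gradient estimate at the cost of more samples per step.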