Definition

REINFORCE with Baseline algorithm is a variant of the REINFORCE algorithm that helps reduce variance in Policy Gradient methods by adopting Actor-Critic Method. It modifies the objective function of the REINFORCE algorithm by subtracting a baseline from the returns, which helps reduce the variance of the gradient without introducing bias. where is a baseline function not related to (commonly chosen as a State-Value Function ).

Actor (Policy Gradient Update)

Critic (Value Network Update)

Algorithm

  1. Initialize state-value and policy networks randomly.
  2. Set the hyperparameters: step-sizes , and discount factor
  3. Repeat for each episode (Each starts from a state under the policy ):
    1. Repeat for each step of an episode until terminal, :
      1. (minimize )
      2. (maximize with the Policy Gradient )