Definition
REINFORCE with Baseline algorithm is a variant of the REINFORCE algorithm that helps reduce variance in Policy Gradient methods by adopting Actor-Critic Method. It modifies the objective function of the REINFORCE algorithm by subtracting a baseline from the returns, which helps reduce the variance of the gradient without introducing bias. where is a baseline function not related to (commonly chosen as a State-Value Function ).
Actor (Policy Gradient Update)
Critic (Value Network Update)
Algorithm
- Initialize state-value and policy networks randomly.
- Set the hyperparameters: step-sizes , and discount factor
- Repeat for each episode (Each starts from a state under the policy ):
- Repeat for each step of an episode until terminal, :
- (minimize )
- (maximize with the Policy Gradient )
- Repeat for each step of an episode until terminal, :