Definition

An actor-critic method consists of two networks: an actor network, which represents the policy, and a critic network, which estimates a value function.

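The actor network updates the policy parameters $\theta$ by maximizing the expected return $J(\theta)$ with the policy gradient $\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(a \mid s)\, A^{\pi_\theta}(s,a)\right]$, where the critic supplies an estimate of the advantage $A^{\pi_\theta}$ (or of the action value $Q^{\pi_\theta}$).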
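A minimal sketch of one actor step, assuming a tabular softmax policy (one preference per state-action pair, stored in a hypothetical `theta` dictionary) and an advantage estimate that the critic has already computed. For a softmax policy the score function has the closed form $\partial \log \pi(a \mid s) / \partial \theta_{s,b} = \mathbb{1}[b = a] - \pi(b \mid s)$, so the gradient-ascent step needs no autodiff library.

```python
import math


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_update(theta, state, action, advantage, alpha=0.1):
    """One policy-gradient ascent step for a tabular softmax policy.

    theta[state] holds one preference per action (hypothetical tabular
    parameterization). `advantage` is the critic's estimate of how much
    better `action` was than expected; its sign sets the update direction.
    """
    # Evaluate the policy once, before any parameter changes.
    probs = softmax(theta[state])
    for b in range(len(probs)):
        # d/dtheta[s][b] log pi(a|s) = 1[b == a] - pi(b|s) for a softmax policy
        grad_log_pi = (1.0 if b == action else 0.0) - probs[b]
        theta[state][b] += alpha * advantage * grad_log_pi
    return theta


# Usage: a positive advantage makes the taken action more likely.
theta = {0: [0.0, 0.0]}
actor_update(theta, state=0, action=1, advantage=2.0, alpha=0.1)
print(theta[0], softmax(theta[0]))
```

A negative advantage pushes the same preferences the other way, which is why a good critic estimate matters: it decides whether an action is reinforced or discouraged.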

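The critic network updates the value-function parameters $w$ by minimizing the error between its estimate and a target return, for example the squared TD error $L(w) = \mathbb{E}\left[\left(r + \gamma V_w(s') - V_w(s)\right)^2\right]$, with the target $r + \gamma V_w(s')$ treated as a constant.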
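A minimal sketch of one critic step, assuming a tabular state-value critic stored in a hypothetical `values` dictionary and the TD(0) target $r + \gamma V(s')$. It takes a semi-gradient step on $\tfrac{1}{2}\delta^2$ (the target is not differentiated) and returns the TD error $\delta$, which can double as the advantage estimate passed to the actor.

```python
def critic_update(values, state, reward, next_state, done, gamma=0.99, beta=0.1):
    """One semi-gradient TD(0) step on a tabular state-value critic.

    Minimizes 0.5 * delta**2 with delta = r + gamma * V(s') - V(s),
    treating the bootstrapped target as a constant. `beta` is the critic's
    step size (hypothetical name). Returns delta.
    """
    # A terminal next state contributes no future value.
    next_value = 0.0 if done else values[next_state]
    target = reward + gamma * next_value
    delta = target - values[state]
    # d/dV(s) of 0.5 * delta**2 is -delta, so gradient descent adds beta * delta.
    values[state] += beta * delta
    return delta


# Usage: V(0) moves halfway toward the target 1.0 + 0.5 * 1.0 = 1.5.
values = {0: 0.0, 1: 1.0}
delta = critic_update(values, state=0, reward=1.0, next_state=1, done=False,
                      gamma=0.5, beta=0.5)
print(delta, values[0])
```

Using $\delta$ as the advantage is one common choice (it is an unbiased estimate of $A^{\pi}$ when $V$ is exact); other methods use Monte Carlo returns or a learned $Q_w$ instead.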

Examples

REINFORCE with Baseline

Actor-Critic Method with TD(0) Return

Asynchronous Advantage Actor-Critic Method

Deterministic Policy Gradient

Deep Deterministic Policy Gradient