Definition

Deterministic Policy Gradient (DPG) learns a deterministic policy over a continuous action space as the actor, together with an Action-Value Function as the critic that evaluates the actions the actor selects.
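
As an illustration, here is a minimal actor-critic sketch in PyTorch. The architectures, dimensions (`state_dim`, `action_dim`, the hidden size of 64), batch size, and learning rate are illustrative assumptions rather than part of the definition, and the critic is assumed to be trained separately (e.g., by temporal-difference learning, not shown).

```python
# Minimal DPG-style actor-critic sketch; all sizes are hypothetical.
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2  # assumed environment dimensions

# Actor: deterministic policy mu_theta(s) -> a (continuous action).
actor = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(),
    nn.Linear(64, action_dim), nn.Tanh(),  # actions bounded in [-1, 1]
)

# Critic: action-value function Q_w(s, a) -> scalar estimate.
critic = nn.Sequential(
    nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

states = torch.randn(32, state_dim)  # a batch of sampled states
actions = actor(states)              # deterministic actions mu_theta(s)
q_values = critic(torch.cat([states, actions], dim=-1))  # Q(s, mu_theta(s))

# Actor update: ascend the critic's value of the actor's own actions.
# Only the actor is stepped here; the critic's fitting is omitted.
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)
actor_loss = -q_values.mean()
actor_optim.zero_grad()
actor_loss.backward()
actor_optim.step()
```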

DPG requires fewer samples to approximate the gradient than the stochastic Policy Gradient: by the Deterministic Policy Gradient Theorem, the gradient of the objective is an expectation over the state space only, whereas the stochastic Policy Gradient must also integrate over the action space.
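
For reference, the LaTeX below restates the two standard gradient forms (following Silver et al., 2014) to make the contrast explicit: the deterministic gradient is an expectation over states drawn from the discounted state distribution $\rho^\mu$ alone, while the stochastic gradient also samples actions from the policy.

```latex
% Deterministic Policy Gradient Theorem: expectation over states only.
\nabla_\theta J(\mu_\theta)
  = \mathbb{E}_{s \sim \rho^{\mu}}\!\left[
      \nabla_\theta \mu_\theta(s)\,
      \nabla_a Q^{\mu}(s, a)\big|_{a = \mu_\theta(s)}
    \right]

% Stochastic Policy Gradient: expectation over states and actions.
\nabla_\theta J(\pi_\theta)
  = \mathbb{E}_{s \sim \rho^{\pi},\, a \sim \pi_\theta}\!\left[
      \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s, a)
    \right]
```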