Definition

The derivative of the expected total reward is the expectation of the product of total rewards and summed gradients of log of the policy .

Proof