Definition
The derivative of the expected total reward with respect to the policy parameters is the expectation of the product of the total reward and the sum, over time steps, of the gradients of the log of the policy.
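The identity can be checked numerically on a toy problem. The sketch below is an illustration under assumptions that are not in the text above: a hypothetical two-step problem with no state, three actions, a softmax policy with parameters theta used at both steps, and an arbitrary reward table indexed by the two actions. It computes the expectation exactly by enumerating all action sequences and compares it with a finite-difference derivative of the expected total reward.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over the action preferences.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def expected_return(theta, rewards):
    # Exact expected total reward: sum over (a0, a1) of pi(a0) * pi(a1) * R[a0, a1].
    pi = softmax(theta)
    return float(pi @ rewards @ pi)

def score_function_gradient(theta, rewards):
    # Exact expectation of R(a0, a1) * (grad log pi(a0) + grad log pi(a1)).
    pi = softmax(theta)
    n = len(theta)
    # Row a holds the gradient of log pi(a) w.r.t. theta: one_hot(a) - pi.
    grad_log = np.eye(n) - pi
    grad = np.zeros(n)
    for a0 in range(n):
        for a1 in range(n):
            prob = pi[a0] * pi[a1]
            grad += prob * rewards[a0, a1] * (grad_log[a0] + grad_log[a1])
    return grad

def finite_difference_gradient(theta, rewards, eps=1e-6):
    # Central differences on the exact expected total reward.
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = eps
        up = expected_return(theta + step, rewards)
        down = expected_return(theta - step, rewards)
        grad[i] = (up - down) / (2 * eps)
    return grad

# Hypothetical parameters and reward table, chosen only for illustration.
theta = np.array([0.2, -0.5, 1.0])
rewards = np.array([[1.0, 0.0, 2.0],
                    [0.5, 3.0, -1.0],
                    [0.0, 1.5, 2.5]])

score_grad = score_function_gradient(theta, rewards)
fd_grad = finite_difference_gradient(theta, rewards)
print(score_grad)
print(fd_grad)
```

The two printed vectors should agree up to finite-difference error. In practice the expectation is usually estimated from sampled trajectories rather than enumerated; enumeration is used here only so the comparison is exact.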
Proof
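Write the expected total reward as an integral of the total reward over trajectories, weighted by the probability of each trajectory under the policy. Differentiating under the integral, and using the fact that the gradient of a probability equals the probability times the gradient of its log, turns the derivative into the expectation of the total reward times the gradient of the log trajectory probability. The initial-state and transition probabilities do not depend on the policy parameters, so that gradient reduces to the sum over time steps of the gradients of the log of the policy.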
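A sketch of the derivation in symbols. The notation is an assumption, since none is given above: \tau = (s_0, a_0, \dots, s_T) is a trajectory, R(\tau) its total reward, \pi_\theta the policy with parameters \theta, and p_\theta(\tau) the trajectory probability. It also assumes that the gradient and the integral can be interchanged and that R(\tau) does not depend on \theta.

```latex
\begin{align*}
J(\theta) &= \mathbb{E}_{\tau \sim p_\theta}\left[ R(\tau) \right]
           = \int p_\theta(\tau)\, R(\tau)\, d\tau \\
\nabla_\theta J(\theta)
          &= \int \nabla_\theta p_\theta(\tau)\, R(\tau)\, d\tau
           = \int p_\theta(\tau)\, \nabla_\theta \log p_\theta(\tau)\, R(\tau)\, d\tau \\
          &= \mathbb{E}_{\tau \sim p_\theta}\left[ R(\tau)\, \nabla_\theta \log p_\theta(\tau) \right] \\
p_\theta(\tau) &= p(s_0) \prod_{t=0}^{T-1} \pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t) \\
\nabla_\theta \log p_\theta(\tau)
          &= \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \\
\nabla_\theta J(\theta)
          &= \mathbb{E}_{\tau \sim p_\theta}\left[ R(\tau) \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right]
\end{align*}
```

The fifth line holds because \log p(s_0) and \log p(s_{t+1} \mid s_t, a_t) have zero gradient with respect to \theta, so only the policy terms survive.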