Definition

Reward is a scalar feedback indicating how well the agent is doing at step .

There are three reward types:

  • : the reward in the state .
  • : the reward by the action in the state .
  • : the reward received after transitioning from state to due to action .

Expected Reward

Under known dynamics of all transitions , one can compute the followings:

  • State transition probability

  • Expected reward for (state, action) pair

  • Expected reward for (state, action, next state) triple

Facts

Reinforcement learning is based on the Reward Hypothesis.