Definition

Example of a simple MDP with three states (green circles) and two actions (orange circles), with two rewards (orange arrows)

A Markov decision process (MDP) is a tuple $(S, A, P_a, R_a)$, where:

  • $S$: a set of states (the state space)
  • $A$: a set of actions (the action space)
  • $P_a(s, s') = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$: the probability of transitioning from state $s$ to state $s'$ given action $a$
  • $R_a(s, s')$: the immediate reward received after transitioning from state $s$ to state $s'$ due to action $a$
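The tuple can be written down directly as a tabular data structure. The following is a minimal sketch in Python, not taken from the article: the state and action names, transition probabilities, and rewards are illustrative placeholders chosen to mirror the figure caption's three states, two actions, and two rewards.

```python
S = ["s0", "s1", "s2"]          # state space
A = ["a0", "a1"]                # action space

# P[(s, a)] maps each successor state s' to Pr(s' | s, a).
P = {
    ("s0", "a0"): {"s0": 0.5, "s2": 0.5},
    ("s0", "a1"): {"s2": 1.0},
    ("s1", "a0"): {"s0": 0.7, "s1": 0.1, "s2": 0.2},
    ("s1", "a1"): {"s1": 0.95, "s2": 0.05},
    ("s2", "a0"): {"s0": 0.4, "s2": 0.6},
    ("s2", "a1"): {"s0": 0.3, "s1": 0.3, "s2": 0.4},
}

# R[(s, a, s')] is the immediate reward for the transition s --a--> s'.
# Transitions not listed yield reward 0.
R = {
    ("s1", "a0", "s0"): 5.0,
    ("s2", "a1", "s0"): -1.0,
}

def expected_reward(s, a):
    """Expected immediate reward of taking action a in state s."""
    return sum(p * R.get((s, a, s_next), 0.0)
               for s_next, p in P[(s, a)].items())

print(expected_reward("s1", "a0"))  # 0.7 * 5.0 = 3.5
```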

Facts

The environment model in an MDP is the transition probability function. If the transition probabilities are known, the setting is called model-based; otherwise it is called model-free.
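To make the distinction concrete, here is a hedged sketch reusing the tabular `P` and `R` from the example above. In the model-based case the expected reward can be computed exactly from `P` (as in `expected_reward`); in the model-free case the agent only sees sampled transitions from a black-box environment. The `step` and `estimate_reward` functions are illustrative assumptions, not part of the article.

```python
import random

def step(s, a):
    """Black-box environment: sample s' ~ P(. | s, a) and return (s', reward).
    The agent below never inspects P or R directly."""
    successors, probs = zip(*P[(s, a)].items())
    s_next = random.choices(successors, weights=probs)[0]
    return s_next, R.get((s, a, s_next), 0.0)

def estimate_reward(s, a, n_samples=10_000):
    """Model-free Monte Carlo estimate of the expected immediate reward."""
    return sum(step(s, a)[1] for _ in range(n_samples)) / n_samples

print(estimate_reward("s1", "a0"))  # close to the exact model-based value 3.5
```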

Any Markov decision process satisfies the following: