Definition
Figure: Example of a simple MDP with three states (green circles) and two actions (orange circles), with two rewards (orange arrows)
A Markov decision process (MDP) is a tuple $(S, A, P, R)$ (see the code sketch after this list), where:
- $S$: a set of states (the state space)
- $A$: a set of actions (the action space)
- $P(s' \mid s, a)$: the transition probability of moving from state $s$ to state $s'$ given action $a$
- $R(s, a, s')$: the immediate reward received after transitioning from state $s$ to state $s'$ due to action $a$
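As a concrete illustration, here is a minimal sketch of how such a tuple can be represented as plain Python data structures, loosely modeled on the three-state, two-action example in the figure. The state and action names, the probabilities, and the reward values are illustrative assumptions, not values taken from the figure.

```python
# A minimal MDP as plain data structures (illustrative values).

states = ["s0", "s1", "s2"]   # state space S
actions = ["a0", "a1"]        # action space A

# P[(s, a)] maps each next state s' to Pr(s' | s, a);
# the probabilities for each (s, a) pair must sum to 1.
P = {
    ("s0", "a0"): {"s0": 0.5, "s2": 0.5},
    ("s0", "a1"): {"s2": 1.0},
    ("s1", "a0"): {"s0": 0.7, "s1": 0.1, "s2": 0.2},
    ("s1", "a1"): {"s1": 0.95, "s2": 0.05},
    ("s2", "a0"): {"s0": 0.4, "s2": 0.6},
    ("s2", "a1"): {"s0": 0.3, "s1": 0.3, "s2": 0.4},
}

# R[(s, a, s')] is the immediate reward for transitioning from s to s'
# under action a; transitions not listed here yield a reward of 0.
R = {
    ("s1", "a0", "s0"): +5.0,
    ("s2", "a1", "s0"): -1.0,
}
```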
Facts
The environment model of an MDP is its transition probability $P$. If the transition probability is known, the setting is called model-based; otherwise it is called model-free.
Any (finite, discounted) Markov decision process satisfies the following (see the value-iteration sketch after this list):
- There exists an optimal policy $\pi^*$
- All optimal policies achieve the optimal state-value function $v^*(s)$
- All optimal policies achieve the optimal action-value function $q^*(s, a)$
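These facts can be illustrated with value iteration, one standard way to compute $v^*$ and extract a greedy optimal policy when the model is known. The sketch below reuses the `states`, `actions`, `P`, and `R` defined in the earlier example; the discount factor `gamma = 0.9` and the convergence threshold are assumed choices, not values from the text.

```python
# Value iteration (illustrative): computes the optimal state-value
# function v* and a policy that is greedy with respect to it.

gamma = 0.9    # discount factor (assumed choice)
theta = 1e-8   # convergence threshold (assumed choice)

def q_value(v, s, a):
    """Expected one-step return of taking action a in state s,
    i.e. sum over s' of P(s'|s,a) * (R(s,a,s') + gamma * v(s'))."""
    return sum(p * (R.get((s, a, s2), 0.0) + gamma * v[s2])
               for s2, p in P[(s, a)].items())

# Repeatedly apply the Bellman optimality backup until values converge.
v = {s: 0.0 for s in states}
while True:
    delta = 0.0
    for s in states:
        best = max(q_value(v, s, a) for a in actions)
        delta = max(delta, abs(best - v[s]))
        v[s] = best
    if delta < theta:
        break

# Any policy that acts greedily with respect to v* is optimal.
policy = {s: max(actions, key=lambda a: q_value(v, s, a)) for s in states}
print(v, policy)
```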