Definition

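The relative probability of the n-step trajectory under the target policy $\pi$ and the behavior policy $b$ is the importance sampling ratio

$$\rho_{t:h} = \prod_{k=t}^{\min(h,\,T-1)} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)},$$

where $T$ is the time step at which the episode terminates.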
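
As a minimal sketch of how this ratio could be computed, the function below multiplies the per-step probability ratios of the actions actually taken. The function name and the list-based inputs are assumptions made for illustration; the caller is assumed to have already looked up the target and behavior probabilities for each step.

```python
def importance_ratio(pi_probs, b_probs):
    """Product of pi(A_k|S_k) / b(A_k|S_k) over the steps supplied.

    pi_probs[i] and b_probs[i] are the target and behavior probabilities
    of the action taken at the i-th step of the trajectory segment.
    """
    rho = 1.0
    for p, q in zip(pi_probs, b_probs):
        if q <= 0.0:
            # The behavior policy must cover every action it actually took.
            raise ValueError("behavior probability must be positive")
        rho *= p / q
    return rho
```

An empty segment gives a ratio of 1, and a single step that the target policy would never take drives the whole product to 0.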

Algorithms

Off-policy n-Step TD

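The state-value update is

$$V_{t+n}(S_t) = V_{t+n-1}(S_t) + \alpha\,\rho_{t:t+n-1}\,\bigl[G_{t:t+n} - V_{t+n-1}(S_t)\bigr],$$

where $\rho_{t:t+n-1}$ is the importance weight and $G_{t:t+n}$ is the n-step return.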
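
A single off-policy n-step TD update might be sketched as follows. The function name, the dictionary-based value table, and the `bootstrap` flag are illustrative assumptions; the importance weight `rho` is assumed to have been computed beforehand for the n actions in the segment.

```python
def off_policy_n_step_td_update(V, states, rewards, rho,
                                alpha=0.1, gamma=1.0, bootstrap=True):
    """Apply one importance-weighted n-step TD update to V[states[0]].

    states  : [S_t, ..., S_{t+n}]
    rewards : [R_{t+1}, ..., R_{t+n}]
    rho     : importance weight for the actions A_t, ..., A_{t+n-1}
    Set bootstrap=False when S_{t+n} is terminal.
    """
    n = len(rewards)
    # Discounted sum of the n observed rewards.
    G = sum(gamma ** i * r for i, r in enumerate(rewards))
    if bootstrap:
        # Bootstrap from the current estimate of the state reached after n steps.
        G += gamma ** n * V[states[-1]]
    s = states[0]
    V[s] += alpha * rho * (G - V[s])
    return V[s]
```

When `rho` is 0 the update leaves the estimate unchanged, which reflects that the target policy would never have produced that segment.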

Off-policy n-Step Sarsa

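The action-value update is

$$Q_{t+n}(S_t, A_t) = Q_{t+n-1}(S_t, A_t) + \alpha\,\rho_{t+1:t+n}\,\bigl[G_{t:t+n} - Q_{t+n-1}(S_t, A_t)\bigr],$$

where $\rho_{t+1:t+n}$ is the importance weight, and $G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^{n} Q_{t+n-1}(S_{t+n}, A_{t+n})$.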
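
A single off-policy n-step Sarsa update can be sketched in the same way. The function name and the dictionary keyed by (state, action) pairs are illustrative assumptions. The importance weight `rho` is assumed to be precomputed and to cover only the actions after the first one, since the first action is the one being evaluated.

```python
def off_policy_n_step_sarsa_update(Q, pairs, rewards, rho,
                                   alpha=0.1, gamma=1.0, bootstrap=True):
    """Apply one importance-weighted n-step Sarsa update to Q[pairs[0]].

    pairs   : [(S_t, A_t), ..., (S_{t+n}, A_{t+n})]
    rewards : [R_{t+1}, ..., R_{t+n}]
    rho     : importance weight for the actions taken after A_t
    Set bootstrap=False when S_{t+n} is terminal.
    """
    n = len(rewards)
    # Discounted sum of the n observed rewards.
    G = sum(gamma ** i * r for i, r in enumerate(rewards))
    if bootstrap:
        # Bootstrap from the action value of the pair reached after n steps.
        G += gamma ** n * Q[pairs[-1]]
    sa = pairs[0]
    Q[sa] += alpha * rho * (G - Q[sa])
    return Q[sa]
```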