Definition

The Quantile-Regression DQN (QR-DQN) algorithm is a Distributional Reinforcement Learning algorithm that approximates the distribution of the random return using quantile regression.

Architecture

QR-DQN estimates a set of $N$ quantiles of the return distribution, where $\hat{\tau}_i = \frac{2i - 1}{2N}$ represents the midpoint of the $i$-th quantile interval $\left[\frac{i-1}{N}, \frac{i}{N}\right]$. This can be seen as adjusting the locations of the supports of a uniform probability mass function to approximate the desired quantile distribution.
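As a concrete illustration, the midpoints follow directly from $N$. The snippet below is a minimal sketch in NumPy; the choice $N = 4$ is arbitrary:

    import numpy as np

    N = 4                                 # number of quantiles (hyperparameter)
    tau = np.arange(N + 1) / N            # cumulative probabilities tau_0, ..., tau_N
    tau_hat = (tau[:-1] + tau[1:]) / 2    # midpoints (2i - 1) / (2N) for i = 1, ..., N
    print(tau_hat)                        # [0.125 0.375 0.625 0.875]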

The quantile distribution with uniform probabilities is constructed as $Z_\theta(s, a) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\theta_i(s, a)}$, where $\delta_z$ is a Dirac delta function at $z$, and $\theta_1(s, a), \dots, \theta_N(s, a)$ are the outputs of the network, representing the estimated quantile values. These values are obtained by applying the inverse CDF $F_Z^{-1}$ of the return distribution to the quantile midpoints $\hat{\tau}_i$, i.e., $\theta_i(s, a) = F_Z^{-1}(\hat{\tau}_i)$.

Using the estimated quantile values $F_Z^{-1}(\hat{\tau}_i)$ as the support minimizes the 1-Wasserstein distance $W_1(Z, Z_\theta)$ between the true return distribution $Z$ and the estimated quantile distribution $Z_\theta$.
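To make this concrete, the sketch below projects a known distribution onto $N$ Dirac atoms by evaluating its inverse CDF at the midpoints. The standard normal and scipy.stats.norm are illustrative assumptions, not part of the algorithm:

    import numpy as np
    from scipy.stats import norm

    N = 8
    tau_hat = (2 * np.arange(1, N + 1) - 1) / (2 * N)
    theta = norm.ppf(tau_hat)   # inverse CDF at the midpoints: Wasserstein-optimal atoms
    # Z_theta = (1/N) * sum_i delta_{theta[i]}; each atom carries probability mass 1/N
    print(np.round(theta, 3))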

Quantile Regression

Given a data set $\{x_1, \dots, x_m\}$, a $\tau$-quantile $q$ minimizes the loss $\mathcal{L}(q) = \mathbb{E}_x\left[\rho_\tau(x - q)\right]$, where $\rho_\tau(u) = u\left(\tau - \mathbb{1}\{u < 0\}\right)$ is the quantile loss function.
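As a quick numerical check of this claim, the sketch below evaluates the quantile loss on synthetic data over a grid of candidate values $q$ and confirms the minimizer sits near the empirical $\tau$-quantile; the normal data and $\tau = 0.25$ are arbitrary illustration choices:

    import numpy as np

    def quantile_loss(q, x, tau):
        # rho_tau(u) = u * (tau - 1{u < 0}), averaged over the data, with u = x - q
        u = x - q
        return np.mean(u * (tau - (u < 0).astype(float)))

    x = np.random.default_rng(0).normal(size=10_000)
    tau = 0.25
    grid = np.linspace(-2.0, 2.0, 401)
    q_star = grid[np.argmin([quantile_loss(q, x, tau) for q in grid])]
    print(q_star, np.quantile(x, tau))    # both close to the true 0.25-quantile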

The quantile values are estimated by minimizing the quantile Huber loss function. Given a transition $(s, a, r, s')$, the loss is defined as $\mathcal{L}(\theta) = \sum_{i=1}^{N} \mathbb{E}_j\left[\rho^\kappa_{\hat{\tau}_i}\left(\mathcal{T}\theta_j - \theta_i(s, a)\right)\right]$, where (see the code sketch after this list):

  • $\mathcal{T}\theta_j = r + \gamma\,\theta^-_j(s', a^*)$ and $a^* = \arg\max_{a'} \frac{1}{N} \sum_{j=1}^{N} \theta^-_j(s', a')$.
  • $\rho^\kappa_{\hat{\tau}}(u) = \left|\hat{\tau} - \mathbb{1}\{u < 0\}\right| \frac{L_\kappa(u)}{\kappa}$, where $L_\kappa$ is the Huber loss: $L_\kappa(u) = \frac{1}{2}u^2$ if $|u| \le \kappa$, and $L_\kappa(u) = \kappa\left(|u| - \frac{1}{2}\kappa\right)$ otherwise.
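A sketch of this loss for a single state-action pair, assuming PyTorch, a vector theta of current quantile estimates $\theta_i(s, a)$, and a vector target of target values $\mathcal{T}\theta_j$ (both of length $N$); kappa = 1.0 is a common default, not prescribed here:

    import torch

    def quantile_huber_loss(theta, target, kappa=1.0):
        N = theta.shape[0]
        tau_hat = (2 * torch.arange(N, dtype=theta.dtype, device=theta.device) + 1) / (2 * N)
        u = target.unsqueeze(0) - theta.unsqueeze(1)        # pairwise TD errors u_ij, [N, N]
        huber = torch.where(u.abs() <= kappa,
                            0.5 * u.pow(2),
                            kappa * (u.abs() - 0.5 * kappa))
        weight = (tau_hat.unsqueeze(1) - (u < 0).float()).abs()   # |tau_hat_i - 1{u < 0}|
        return (weight * huber / kappa).mean(dim=1).sum()   # mean over j, sum over i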

Algorithm

  1. Initialize the behavior network $Z_\theta$ and the target network $Z_{\theta^-}$ with random weights $\theta$, set $\theta^- = \theta$, and choose the sample size $N$ (the number of quantiles).
  2. Repeat for each episode:
    1. Initialize the start state $s_0$.
    2. Repeat for each step of an episode until terminal, $t = 0, 1, 2, \dots$:
      1. With probability $\epsilon$, select a random action $a_t$; otherwise select $a_t = \arg\max_a \frac{1}{N} \sum_{i=1}^{N} \theta_i(s_t, a)$.
      2. Take the action $a_t$ and observe a reward $r_t$ and a next state $s_{t+1}$.
      3. Store the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer $D$.
      4. Sample a random transition $(s, a, r, s')$ from $D$.
      5. Select a greedy action $a^* = \arg\max_{a'} \frac{1}{N} \sum_{j=1}^{N} \theta^-_j(s', a')$.
      6. Compute the target quantile values $\mathcal{T}\theta_j = r + \gamma\,\theta^-_j(s', a^*)$.
      7. Perform gradient descent on the loss $\sum_{i=1}^{N} \mathbb{E}_j\left[\rho^\kappa_{\hat{\tau}_i}\left(\mathcal{T}\theta_j - \theta_i(s, a)\right)\right]$ (a code sketch of steps 4-7 follows the algorithm).
      8. Update the target network parameters $\theta^- \leftarrow \theta$ every $C$ steps.
      9. Update $t \leftarrow t + 1$.
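Below is a condensed sketch of one update (steps 4-7), assuming PyTorch, a network net(s) returning quantiles of shape [batch, actions, N], a target copy target_net, a float done mask, and the quantile_huber_loss sketched earlier; all names and shapes are illustrative, not a definitive implementation:

    import torch

    def qr_dqn_update(net, target_net, optimizer, batch, gamma=0.99, kappa=1.0):
        s, a, r, s_next, done = batch                   # minibatch sampled from D
        theta = net(s)[torch.arange(len(a)), a]         # theta_i(s, a), shape [B, N]
        with torch.no_grad():
            next_theta = target_net(s_next)             # [B, A, N]
            a_star = next_theta.mean(dim=2).argmax(1)   # a* = argmax_a (1/N) sum_j theta_j(s', a)
            target = r.unsqueeze(1) + gamma * (1 - done.unsqueeze(1)) \
                     * next_theta[torch.arange(len(a)), a_star]   # T theta_j, [B, N]
        loss = torch.stack([quantile_huber_loss(theta[b], target[b], kappa)
                            for b in range(len(a))]).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()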