Definition

Prioritized replay is an enhancement to standard experience replay. Instead of sampling experiences uniformly from the replay buffer, prioritized replay samples important transitions more frequently based on their priority values. This allows the agent to learn more efficiently by focusing on the most informative experiences.

The priority of transition $i$ is derived from its TD error: $p_i = |\delta_i| + \epsilon$, where $\delta_i$ is the TD error and $\epsilon$ is a small positive constant that guarantees every transition has a nonzero priority.

The probability of sampling transition $i$ is $P(i) = \dfrac{p_i^\alpha}{\sum_k p_k^\alpha}$, where $\alpha$ controls how much prioritization is used (uniform sampling when $\alpha = 0$).
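As a minimal sketch of these two formulas (the TD-error values, $\epsilon$, and $\alpha$ below are illustrative assumptions, not values from the paper):

```python
import numpy as np

# Hypothetical absolute TD errors for three stored transitions.
td_errors = np.array([2.0, 0.5, 0.1])

eps = 0.01    # small constant so every transition keeps a nonzero priority
alpha = 0.6   # degree of prioritization (alpha = 0 recovers uniform sampling)

# Proportional priorities: p_i = |delta_i| + eps
priorities = np.abs(td_errors) + eps

# Sampling probabilities: P(i) = p_i^alpha / sum_k p_k^alpha
probs = priorities ** alpha / np.sum(priorities ** alpha)
print(probs)  # roughly [0.62, 0.27, 0.11]

# Draw a minibatch of two indices according to P(i)
batch_idx = np.random.choice(len(priorities), size=2, p=probs)
```

Setting `alpha = 0` makes `probs` uniform, which recovers standard experience replay.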

However, prioritized replay can lead to a loss of diversity and introduces bias into the distribution of updates. These issues are alleviated with stochastic prioritization (controlled by $\alpha$) and corrected with importance-sampling weights.

The importance-sampling weights are calculated as $w_i = \left(\frac{1}{N} \cdot \frac{1}{P(i)}\right)^{\beta}$, where

  • $N$ is the size of the replay buffer
  • $\beta$ is an annealing parameter that fully compensates for the bias when $\beta = 1$. It starts from an initial value $\beta_0 < 1$ and is annealed to $1$ by the end of training.

For stability, each weight is normalized by $\frac{1}{\max_i w_i}$ before it is used.
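Continuing the sketch above, the importance-sampling correction can be computed as follows (the buffer size and $\beta$ are again illustrative assumptions):

```python
import numpy as np

N = 3                                   # replay buffer size in this toy example
beta = 0.4                              # annealing parameter, pushed toward 1 over training
probs = np.array([0.62, 0.27, 0.11])    # P(i) from the previous sketch (rounded)

# w_i = (1/N * 1/P(i))^beta
weights = (1.0 / (N * probs)) ** beta

# Normalize by the maximum weight so updates are only ever scaled down
weights /= weights.max()
print(weights)  # the rarest transition gets the largest (unit) weight
```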

Algorithm

Double DQN with Prioritized Replay

  1. Initialize the behavior network $Q$ with random weights $\theta$, the target network $\hat{Q}$ with weights $\theta^- = \theta$, and the replay buffer $\mathcal{D}$ with maximum size $N$.
  2. Repeat for each episode:
    1. Initialize the starting state $s_1$.
    2. Repeat for each step $t = 1, 2, \dots$ of the episode until a terminal state is reached:
      1. With probability $\epsilon$, select a random action $a_t$; otherwise select $a_t = \arg\max_a Q(s_t, a; \theta)$.
      2. Take the action $a_t$ and observe the reward $r_t$ and the next state $s_{t+1}$.
      3. Store the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer with maximal priority $p_t = \max_{i < t} p_i$.
      4. Every $K$ (replay period) steps:
        1. Sample a random minibatch of $k$ transitions $j \sim P(j) = p_j^\alpha / \sum_i p_i^\alpha$ from $\mathcal{D}$.
        2. Compute the importance-sampling weight $w_j = \left(N \cdot P(j)\right)^{-\beta} / \max_i w_i$.
        3. Compute the TD error $\delta_j = r_j + \gamma\, \hat{Q}\big(s_{j+1}, \arg\max_a Q(s_{j+1}, a; \theta); \theta^-\big) - Q(s_j, a_j; \theta)$.
        4. Update the transition priority $p_j \leftarrow |\delta_j|$.
        5. Accumulate the weight change $\Delta \leftarrow \Delta + w_j \, \delta_j \, \nabla_\theta Q(s_j, a_j; \theta)$.
      5. Update the behavior network weights $\theta \leftarrow \theta + \eta \, \Delta$ and reset $\Delta = 0$.
      6. Update the target network parameters $\theta^- \leftarrow \theta$ every $C$ steps.
      7. Anneal $\beta$ toward $1$ (and decay the exploration rate $\epsilon$ if an $\epsilon$-greedy schedule is used).
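
A short sketch of one learning step (the sampling, weighting, TD-error, priority-update, and gradient steps above) is given below, assuming PyTorch; the network shapes, hyperparameters, and names such as `q_net`, `target_net`, and `replay_update` are illustrative assumptions, the paper's sum-tree storage is replaced by flat NumPy arrays for brevity, and the accumulated weight change $\Delta$ is realized as a weighted loss.

```python
import numpy as np
import torch
import torch.nn as nn

# --- hypothetical setup: state dimension, action count, and networks are assumptions ---
state_dim, n_actions, buffer_size = 4, 2, 10_000
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# Flat-array replay storage (the paper uses a sum-tree for efficient sampling).
states = np.zeros((buffer_size, state_dim), dtype=np.float32)
actions = np.zeros(buffer_size, dtype=np.int64)
rewards = np.zeros(buffer_size, dtype=np.float32)
next_states = np.zeros((buffer_size, state_dim), dtype=np.float32)
dones = np.zeros(buffer_size, dtype=np.float32)
priorities = np.zeros(buffer_size, dtype=np.float32)
size = 0  # number of transitions currently stored (assumed > 0 when learning starts)

def replay_update(batch_size=32, gamma=0.99, alpha=0.6, beta=0.4, eps=1e-2):
    """One prioritized-replay learning step with a Double DQN target."""
    # Sample indices j ~ P(j) = p_j^alpha / sum_i p_i^alpha
    p = priorities[:size] ** alpha
    probs = p / p.sum()
    idx = np.random.choice(size, batch_size, p=probs)

    # Importance-sampling weights w_j = (N * P(j))^(-beta), normalized by the maximum
    weights = (size * probs[idx]) ** (-beta)
    weights = torch.as_tensor(weights / weights.max(), dtype=torch.float32)

    s = torch.as_tensor(states[idx])
    a = torch.as_tensor(actions[idx])
    r = torch.as_tensor(rewards[idx])
    s2 = torch.as_tensor(next_states[idx])
    d = torch.as_tensor(dones[idx])

    # Double DQN target: behavior net selects the action, target net evaluates it
    with torch.no_grad():
        best_a = q_net(s2).argmax(dim=1)
        target = r + gamma * (1 - d) * target_net(s2).gather(1, best_a.unsqueeze(1)).squeeze(1)

    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    td_error = target - q

    # Update stored priorities: p_j <- |delta_j| (+ eps for the proportional variant)
    priorities[idx] = np.abs(td_error.detach().numpy()) + eps

    # The weighted loss plays the role of the accumulated w_j * delta_j * grad Q update
    loss = (weights * td_error.pow(2)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice the priorities are kept in a sum-tree so that sampling and priority updates cost $O(\log N)$, rather than the $O(N)$ of the flat arrays used in this sketch.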