Definition

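Double DQN is a variant of DQN that applies the idea of Double Q-learning to reduce the overestimation of action values. The target network that DQN already maintains serves as the second value function for Double Q-learning: the behavior network selects the greedy next action and the target network evaluates it, so no additional network is introduced.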
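
To illustrate why separating action selection from action evaluation reduces overestimation, here is a small simulation sketch. It is not part of Double DQN itself: the function name `estimate_bias` and all of its parameters are hypothetical. It also assumes two fully independent noisy estimates of the same action values, which is an idealization, since in Double DQN the target network is only a lagged copy of the behavior network and the decorrelation is partial.

```python
import random


def estimate_bias(num_actions=10, num_trials=10000, noise_std=1.0, seed=0):
    """Compare the bias of a single (max) estimator and a double estimator.

    Assumption for this toy setup: the true value of every action is 0,
    so any non-zero average estimate of max_a Q(s', a) is pure bias.
    """
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(num_trials):
        # Two independent noisy estimates of the same (all-zero) action values.
        q_a = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        q_b = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        # Single estimator: select and evaluate with the same estimates.
        single_total += max(q_a)
        # Double estimator: select with q_a, evaluate with q_b.
        best = max(range(num_actions), key=lambda a: q_a[a])
        double_total += q_b[best]
    return single_total / num_trials, double_total / num_trials


single_bias, double_bias = estimate_bias()
print(f"single estimator bias: {single_bias:.3f}")
print(f"double estimator bias: {double_bias:.3f}")
```

Under these assumptions the single estimator should report a clearly positive average even though every true value is zero, while the double estimator should stay close to zero.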

Algorithm

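  1. Initialize behavior network $Q_\theta$ and target network $Q_{\theta'}$ with random weights $\theta$ and $\theta'$, and the replay buffer $D$ to max size $N$.
  2. Repeat for each episode:
    1. Initialize sequence $s$.
    2. Repeat for each step of an episode until terminal, $t = 1, \dots, T$:
      1. With probability $\epsilon$, select a random action $a$; otherwise select $a = \arg\max_{a'} Q_\theta(s, a')$.
      2. Take the action $a$ and observe a reward $r$ and a next state $s'$.
      3. Store transition $(s, a, r, s')$ in the replay buffer $D$.
      4. Sample a random minibatch of transitions $(s_j, a_j, r_j, s'_j)$ from $D$.
      5. Set $y_j = r_j$ if $s'_j$ is terminal; otherwise set $y_j = r_j + \gamma \, Q_{\theta'}\big(s'_j, \arg\max_{a'} Q_\theta(s'_j, a')\big)$.
      6. Perform a gradient descent step on the loss $\big(y_j - Q_\theta(s_j, a_j)\big)^2$ with respect to $\theta$.
      7. Update the target network parameters $\theta' \leftarrow \theta$ every $C$ steps.
      8. Update $s \leftarrow s'$.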
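
The target computation on the sampled minibatch is the only place where Double DQN differs from DQN: the behavior network picks the greedy next action and the target network supplies the value of that action. The following is a minimal sketch of that step in plain Python. The function names `double_dqn_targets` and `dqn_targets`, the list-based representation of Q-values, and the example numbers are all assumptions made for illustration; a real implementation would evaluate both networks on batched tensors.

```python
from typing import List, Sequence


def double_dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    q_behavior_next: Sequence[Sequence[float]],
    q_target_next: Sequence[Sequence[float]],
    gamma: float = 0.99,
) -> List[float]:
    """Double DQN targets for a minibatch (hypothetical helper).

    q_behavior_next[j][a] stands for Q_theta(s'_j, a) and
    q_target_next[j][a] stands for Q_theta'(s'_j, a).
    """
    targets = []
    for r, done, q_b, q_t in zip(rewards, dones, q_behavior_next, q_target_next):
        if done:
            # Terminal next state: no bootstrapping.
            targets.append(r)
            continue
        # Select the greedy action with the behavior network ...
        a_star = max(range(len(q_b)), key=lambda a: q_b[a])
        # ... and evaluate that action with the target network.
        targets.append(r + gamma * q_t[a_star])
    return targets


def dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    q_target_next: Sequence[Sequence[float]],
    gamma: float = 0.99,
) -> List[float]:
    """Standard DQN targets, shown only for contrast: the target network
    both selects and evaluates the next action."""
    return [
        r if done else r + gamma * max(q_t)
        for r, done, q_t in zip(rewards, dones, q_target_next)
    ]


# Made-up minibatch of two transitions; the second one is terminal.
rewards = [1.0, 0.0]
dones = [False, True]
q_behavior_next = [[0.2, 0.9], [0.5, 0.1]]
q_target_next = [[0.7, 0.3], [0.4, 0.6]]

double_targets = double_dqn_targets(
    rewards, dones, q_behavior_next, q_target_next, gamma=0.5
)
single_targets = dqn_targets(rewards, dones, q_target_next, gamma=0.5)
print("Double DQN targets:", double_targets)
print("DQN targets:       ", single_targets)
```

For the first transition the behavior network prefers action 1, so the Double DQN target is 1 + 0.5 · 0.3 = 1.15, whereas the max-based DQN target is 1 + 0.5 · 0.7 = 1.35. The terminal transition yields its reward alone in both cases.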