Definition
Double DQN is a variant of DQN that applies the idea of Double Q-learning to reduce the overestimation of action values. Here, the target network already present in DQN serves as the second network of Double Q-learning, so no additional network needs to be introduced.
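The difference between the two update targets can be illustrated directly. The sketch below uses made-up value estimates (the arrays and the reward are assumptions, not outputs of a trained network):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, r = 0.99, 1.0

# Hypothetical action-value estimates for a next state s' over 5 actions.
q_online = rng.normal(size=5)   # behavior network  Q(s', ., theta)
q_target = rng.normal(size=5)   # target network    Q(s', ., theta^-)

# Standard DQN target: the target network both selects and evaluates
# the maximizing action, which tends to overestimate.
y_dqn = r + gamma * q_target.max()

# Double DQN target: the behavior network selects the action,
# the target network evaluates it.
a_star = int(np.argmax(q_online))
y_double = r + gamma * q_target[a_star]
```

Because `q_target[a_star]` can never exceed `q_target.max()`, the Double DQN target is never larger than the DQN target for the same estimates.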
Algorithm
- Initialize the behavior network $Q(s, a; \theta)$ and the target network $Q(s, a; \theta^-)$ with random weights $\theta$ and $\theta^- = \theta$, and the replay buffer $D$ to max size $N$.
- Repeat for each episode:
- Initialize sequence $s_1$.
- Repeat for each step of an episode until terminal, $t = 1, \ldots, T$:
- With probability $\epsilon$, select a random action $a_t$, otherwise select $a_t = \arg\max_a Q(s_t, a; \theta)$.
- Take the action $a_t$ and observe a reward $r_t$ and a next state $s_{t+1}$.
- Store transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer $D$.
- Sample random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$.
- $y_j = \begin{cases} r_j & \text{if } s_{j+1} \text{ is terminal} \\ r_j + \gamma\, Q\!\left(s_{j+1}, \arg\max_{a'} Q(s_{j+1}, a'; \theta); \theta^-\right) & \text{otherwise} \end{cases}$
- Perform Gradient Descent on loss $\left(y_j - Q(s_j, a_j; \theta)\right)^2$ with respect to $\theta$.
- Update the target network parameters $\theta^- \leftarrow \theta$ every $C$ steps.
- Update $s_t \leftarrow s_{t+1}$.
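The steps above can be sketched end to end. This is a minimal illustration, not a reference implementation: the chain environment, the tabular Q-tables standing in for the two networks, and all hyperparameters are assumptions chosen to keep the example self-contained and fast:

```python
import random
from collections import deque

import numpy as np

random.seed(0)
rng = np.random.default_rng(0)

# Toy deterministic chain MDP (a made-up stand-in for a real environment):
# states 0..4, actions 0 = left / 1 = right; reaching state 4 yields
# reward 1 and terminates the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def env_step(s, a):
    s2 = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

gamma, alpha, eps, C, BATCH = 0.9, 0.5, 0.1, 20, 8
q = np.zeros((N_STATES, N_ACTIONS))   # behavior "network" (tabular stand-in)
q_tgt = q.copy()                      # target "network"
buffer = deque(maxlen=1000)           # replay buffer D
total_steps = 0

for episode in range(300):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection with the behavior network
        # (ties broken at random so early exploration is unbiased).
        if rng.random() < eps:
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(rng.choice(np.flatnonzero(q[s] == q[s].max())))
        s2, r, done = env_step(s, a)
        buffer.append((s, a, r, s2, done))

        # Sample a minibatch and apply the Double DQN target: the behavior
        # network selects the next action, the target network evaluates it.
        for sj, aj, rj, sj2, dj in random.sample(list(buffer), min(len(buffer), BATCH)):
            if dj:
                y = rj
            else:
                a_star = int(np.argmax(q[sj2]))       # select with behavior net
                y = rj + gamma * q_tgt[sj2, a_star]   # evaluate with target net
            q[sj, aj] += alpha * (y - q[sj, aj])      # gradient step on squared loss

        total_steps += 1
        if total_steps % C == 0:
            q_tgt = q.copy()   # periodic hard update of the target network
        s = s2

# The learned greedy policy should move right toward the goal from every
# non-terminal state.
policy = [int(np.argmax(q[s])) for s in range(N_STATES)]
print(policy)
```

With a neural network, the inner update would be a gradient-descent step on the squared loss; for a tabular Q-function the two coincide up to the learning rate.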