Definition

Dueling DQN splits the network into two streams that share a feature encoder (e.g., a CNN): a state-value stream $V(s)$, which does not depend on the action, and an advantage stream $A(s, a)$. An aggregator combines the two into the action-value $Q(s, a) = V(s) + A(s, a)$.
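As a rough illustration of the two-stream layout, here is a minimal pure-Python sketch. The class name `DuelingHead`, the layer sizes, and the random initialisation are all assumptions made for the example, and a single linear layer with ReLU stands in for the CNN encoder.

```python
import random


def linear(x, weights, bias):
    """Affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(rng, n_out, n_in):
    """Random weights and zero bias (illustrative initialisation only)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class DuelingHead:
    """Hypothetical two-stream network: shared encoder, V stream, A stream."""

    def __init__(self, obs_dim, hidden_dim, num_actions, seed=0):
        rng = random.Random(seed)
        # Shared encoder (a linear layer standing in for the CNN).
        self.encoder = init_layer(rng, hidden_dim, obs_dim)
        # Value stream: one scalar V(s) per state.
        self.value = init_layer(rng, 1, hidden_dim)
        # Advantage stream: one A(s, a) per action.
        self.advantage = init_layer(rng, num_actions, hidden_dim)

    def forward(self, obs):
        features = relu(linear(obs, *self.encoder))
        v = linear(features, *self.value)[0]
        a = linear(features, *self.advantage)
        # Aggregator: Q(s, a) = V(s) + A(s, a).
        q = [v + a_i for a_i in a]
        return v, a, q


net = DuelingHead(obs_dim=4, hidden_dim=8, num_actions=3)
v, a, q = net.forward([0.1, -0.2, 0.3, 0.4])
```

Both streams read the same `features`, so gradients from either stream update the shared encoder.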

This aggregation is unidentifiable: a given Q-value function admits many decompositions into value and advantage functions, because $Q(s, a) = (V(s) + c) + (A(s, a) - c)$ for any constant $c$. This makes learning unstable and less efficient. To force a unique decomposition, we constrain the advantage function to have zero mean over actions:

$$Q(s, a) = V(s) + \left( A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a') \right)$$

where $|\mathcal{A}|$ is the number of actions.
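The effect of the constant shift $c$ can be checked numerically. The sketch below is only an illustration: the function names and the numbers are made up for the example. It compares the plain sum with the mean-subtracted aggregator.

```python
def naive_aggregate(v, advantages):
    """Q(s, a) = V(s) + A(s, a)."""
    return [v + a for a in advantages]


def mean_subtracted_aggregate(v, advantages):
    """Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a'))."""
    mean_a = sum(advantages) / len(advantages)
    return [v + (a - mean_a) for a in advantages]


# Illustrative values for one state with three actions.
v, adv, c = 1.0, [0.5, -0.25, 2.0], 3.0

# Shift the two streams in opposite directions by the constant c.
shifted_v = v + c
shifted_adv = [a - c for a in adv]

# Plain sum: both (V, A) pairs give identical Q-values,
# so Q alone cannot tell them apart.
q_naive = naive_aggregate(v, adv)
q_naive_shifted = naive_aggregate(shifted_v, shifted_adv)

# Mean-subtracted: the advantage term is unchanged by the shift,
# so the shift in V shows up in Q and the two pairs are distinguishable.
q = mean_subtracted_aggregate(v, adv)
q_shifted = mean_subtracted_aggregate(shifted_v, shifted_adv)

# Because the centred advantages sum to zero, V(s) is the mean of Q(s, .).
recovered_v = sum(q) / len(q)
```

With the mean subtracted, $V(s)$ equals the average of $Q(s, \cdot)$ over actions, which is what pins down the decomposition.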
