Definition

Dueling DQN has two streams, an advantage stream $A(s, a)$ and an action-independent state-value stream $V(s)$, which share a feature encoder (CNN) and are combined by an aggregator to produce the action-value $Q(s, a)$.
The aggregating module is unidentifiable in the sense that, given a Q-value function, there are multiple possible decompositions into value and advantage functions: $Q(s, a) = V(s) + A(s, a) = (V(s) + c) + (A(s, a) - c)$, where $c$ is a constant value. This makes the learning process unstable and less efficient. To force a unique decomposition, we introduce a constraint that makes the advantage function have zero mean over actions: $Q(s, a) = V(s) + \left(A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a')\right)$.
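
The identifiability issue can be checked numerically. The following is a minimal sketch in plain Python, not an implementation from any particular library: the function names `aggregate_naive` and `aggregate_mean_centered`, and the sample numbers, are assumptions made for illustration. It shows that a plain sum gives the same Q-values when a constant is moved from the advantage stream to the value stream, while the mean-subtracted aggregator ties the value to the average Q-value, so the decomposition is pinned down.

```python
from statistics import fmean


def aggregate_naive(value, advantages):
    """Plain sum: Q(s, a) = V(s) + A(s, a) for every action a."""
    return [value + a for a in advantages]


def aggregate_mean_centered(value, advantages):
    """Zero-mean constraint: Q(s, a) = V(s) + (A(s, a) - mean_a' A(s, a'))."""
    mean_adv = fmean(advantages)
    return [value + (a - mean_adv) for a in advantages]


# Hypothetical stream outputs for one state with three actions.
value = 2.0
advantages = [1.0, -1.0, 3.0]

# Move an arbitrary constant c from the advantage stream to the value stream.
shift = 5.0
shifted_value = value + shift
shifted_advantages = [a - shift for a in advantages]

# Plain sum: both (V, A) pairs produce identical Q-values, so Q alone
# cannot tell them apart.
q_naive = aggregate_naive(value, advantages)
q_naive_shifted = aggregate_naive(shifted_value, shifted_advantages)

# Mean-centered: the average of Q over actions equals V, so each Q
# corresponds to exactly one value, and the two pairs now differ.
q_centered = aggregate_mean_centered(value, advantages)
q_centered_shifted = aggregate_mean_centered(shifted_value, shifted_advantages)

print(q_naive == q_naive_shifted)        # same Q from different (V, A)
print(fmean(q_centered) == value)        # V is recoverable from Q
print(q_centered != q_centered_shifted)  # shifted pair is no longer equivalent
```

In a batched neural-network setting the same centering would be applied along the action dimension of the advantage tensor; the scalar version here only illustrates the arithmetic.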