Advantage

Dueling architecture: Q(s,a) = V(s) + A(s,a); compare with a standard DQN head.
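As a starting point for the comparison, here is a minimal sketch of how a dueling head might combine its two streams. A standard DQN head outputs Q(s,a) directly, while a dueling head outputs a scalar V(s) and a vector A(s,a) and then aggregates them. The sketch assumes the common mean-subtracted aggregation, Q(s,a) = V(s) + (A(s,a) - mean over a' of A(s,a')). The function names `dueling_q_values` and `greedy_action` are hypothetical, and plain Python lists stand in for network outputs.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a).

    Assumption: mean-subtracted aggregation. Subtracting the mean advantage
    pins down the split between V and A, which is otherwise not unique
    (adding a constant to every A and subtracting it from V gives the same Q).
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + (a - mean_advantage) for a in advantages]


def greedy_action(q_values: List[float]) -> int:
    """Index of the largest Q-value, as a DQN-style greedy policy would pick."""
    return max(range(len(q_values)), key=q_values.__getitem__)


# Hypothetical outputs of the two streams for one state with three actions.
q = dueling_q_values(state_value=2.0, advantages=[1.0, 0.0, -1.0])

# Shifting every advantage by the same constant leaves Q unchanged.
q_shifted = dueling_q_values(state_value=2.0, advantages=[11.0, 10.0, 9.0])
```

Action selection and the TD target are the same as in DQN; only the way the final layer produces Q(s,a) differs.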

Sketch a two-network actor-critic; write pseudocode for the TD-error updates.
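One possible sketch of the TD-error updates is shown below. To stay self-contained it uses two lookup tables in place of the two networks: a softmax preference table for the actor and a value table for the critic. This is an assumption made for illustration, and with real networks the same TD error would scale the gradients instead. The class name `TabularActorCritic`, the learning rates, and the discount factor are all hypothetical choices.

```python
import math
import random
from typing import List


def softmax(prefs: List[float]) -> List[float]:
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class TabularActorCritic:
    """One-step actor-critic with tables standing in for the two networks."""

    def __init__(self, n_states: int, n_actions: int, gamma: float = 0.99,
                 actor_lr: float = 0.1, critic_lr: float = 0.1) -> None:
        self.gamma = gamma
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.prefs = [[0.0] * n_actions for _ in range(n_states)]  # actor
        self.values = [0.0] * n_states                             # critic

    def policy(self, state: int) -> List[float]:
        return softmax(self.prefs[state])

    def act(self, state: int, rng: random.Random) -> int:
        probs = self.policy(state)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, state: int, action: int, reward: float,
               next_state: int, done: bool) -> float:
        # TD error: delta = r + gamma * V(s') - V(s), with no bootstrap at episode end.
        bootstrap = 0.0 if done else self.gamma * self.values[next_state]
        td_error = reward + bootstrap - self.values[state]

        # Policy probabilities are taken before any parameter changes.
        probs = self.policy(state)

        # Critic step: move V(s) toward the TD target.
        self.values[state] += self.critic_lr * td_error

        # Actor step: for a softmax policy, d log pi(a|s) / d pref[b]
        # equals (1 if b == a else 0) - pi(b|s).
        for b, p in enumerate(probs):
            grad_log_pi = (1.0 if b == action else 0.0) - p
            self.prefs[state][b] += self.actor_lr * td_error * grad_log_pi
        return td_error


# Toy one-step task (hypothetical): in state 0, action 1 pays 1.0, action 0 pays 0.0.
rng = random.Random(0)
agent = TabularActorCritic(n_states=2, n_actions=2)
for _ in range(200):
    a = agent.act(0, rng)
    agent.update(state=0, action=a, reward=float(a == 1), next_state=1, done=True)
```

The same TD error drives both updates: the critic uses it as a regression error, and the actor uses it as an estimate of the advantage of the action just taken.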