Value Function
The state-value function V^π for a random policy on the Chapter 3 MDP.
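As a concrete illustration, here is a minimal iterative policy evaluation sketch. It assumes a hypothetical 1D gridworld (four states, terminal state on the right, reward -1 per step) standing in for the chapter's MDP; the environment, state count, and reward scheme are illustrative assumptions, not the book's exact example.

```python
import numpy as np

# Hypothetical 1D gridworld: states 0..3, state 3 is terminal.
# Actions: 0 = left, 1 = right; reward -1 per step until terminal.
N_STATES, GAMMA, THETA = 4, 1.0, 1e-8

def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    if s == 3:                       # terminal state: self-loop, zero reward
        return s, 0.0
    s2 = max(0, s - 1) if a == 0 else min(3, s + 1)
    return s2, -1.0

def evaluate_random_policy():
    """Iterative policy evaluation for the equiprobable random policy."""
    V = np.zeros(N_STATES)
    while True:
        delta = 0.0
        for s in range(N_STATES):
            v_new = 0.0
            for a in (0, 1):         # each action taken with probability 0.5
                s2, r = step(s, a)
                v_new += 0.5 * (r + GAMMA * V[s2])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < THETA:
            return V

V = evaluate_random_policy()
```

Sweeping until the largest per-state change falls below a threshold is the standard stopping rule; the in-place update (reusing fresh values within a sweep) converges at least as fast as the two-array version.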
Derive the Bellman optimality equation for Q*(s,a).
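A sketch of how that derivation typically goes, using the standard notation in which G_t is the return and p(s',r|s,a) is the transition kernel (assumed here, since the source does not fix notation):

```latex
\begin{align*}
Q^*(s,a)
  &= \mathbb{E}\bigl[R_{t+1} + \gamma\, V^*(S_{t+1}) \mid S_t = s,\, A_t = a\bigr]
     && \text{one-step lookahead on the optimal return} \\
  &= \mathbb{E}\Bigl[R_{t+1} + \gamma \max_{a'} Q^*(S_{t+1}, a') \,\Big|\, S_t = s,\, A_t = a\Bigr]
     && \text{since } V^*(s') = \max_{a'} Q^*(s',a') \\
  &= \sum_{s',\,r} p(s', r \mid s, a)\Bigl[r + \gamma \max_{a'} Q^*(s', a')\Bigr].
\end{align*}
```

The key step is replacing V* by the max over Q*, which removes any reference to a policy and makes the equation self-contained in Q*.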
V^π(s), Q^π(s,a), and the Bellman expectation equation — with worked examples and explanations.
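For reference, the standard definitions and the expectation equation these lessons build on (notation assumed from standard RL texts):

```latex
\begin{align*}
V^\pi(s)   &= \mathbb{E}_\pi\bigl[G_t \mid S_t = s\bigr], \qquad
Q^\pi(s,a)  = \mathbb{E}_\pi\bigl[G_t \mid S_t = s,\, A_t = a\bigr], \\
V^\pi(s)   &= \sum_{a} \pi(a \mid s) \sum_{s',\,r} p(s', r \mid s, a)
              \bigl[r + \gamma\, V^\pi(s')\bigr], \\
Q^\pi(s,a) &= \sum_{s',\,r} p(s', r \mid s, a)
              \Bigl[r + \gamma \sum_{a'} \pi(a' \mid s')\, Q^\pi(s', a')\Bigr].
\end{align*}
```

The two functions are linked by V^π(s) = Σ_a π(a|s) Q^π(s,a), which is the averaging step that distinguishes the expectation equations from the optimality equations.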
The dueling architecture decomposes Q(s,a) as V(s) + A(s,a); compare it with the standard DQN head.
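A minimal sketch of the dueling aggregation step, using plain NumPy linear heads as stand-ins for the network's value and advantage streams (the feature dimension, action count, and linear heads are illustrative assumptions). The mean-advantage subtraction is the identifiability trick: without it, any constant could be shifted between V and A.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: shared feature dim 8, 4 discrete actions.
D, N_ACTIONS = 8, 4

# Linear "heads" on shared features (stand-ins for the two MLP streams).
W_v = rng.normal(size=(D, 1))          # value head  -> V(s)
W_a = rng.normal(size=(D, N_ACTIONS))  # advantage head -> A(s, .)

def dueling_q(features):
    """Combine the streams into Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a'),
    so the decomposition into V and A is identifiable."""
    v = features @ W_v                       # shape (batch, 1)
    a = features @ W_a                       # shape (batch, n_actions)
    return v + a - a.mean(axis=1, keepdims=True)

phi = rng.normal(size=(2, D))   # a batch of 2 feature vectors
q = dueling_q(phi)              # shape (2, 4)
```

A standard DQN head would instead map features directly to N_ACTIONS Q-values with a single stream; the dueling split lets the network learn state value even for actions it rarely takes.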