Chapter 5: Value Functions

Learning objectives: define the state-value function \(V^\pi(s)\) as the expected return from state \(s\) under policy \(\pi\); write and solve the Bellman expectation equation for a small MDP; use the matrix form (a linear system) when the MDP is finite.

Concept and real-world RL: the state-value function \(V^\pi(s)\) is the expected (discounted) return starting from state \(s\) and following policy \(\pi\). It answers: “How good is it to be in this state if I follow this policy?” In games, \(V(s)\) is like the expected outcome from a board position; in navigation, it is the expected cumulative reward from a location. The Bellman expectation equation expresses \(V^\pi\) in terms of the immediate reward and the value of the next state; for finite MDPs it becomes the linear system \(V = r + \gamma P V\), which we can solve by matrix inversion or by iteration. ...
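The linear-system form can be checked directly in a few lines of NumPy. This is a minimal sketch: the transition matrix and rewards below are made-up numbers for a hypothetical 3-state MDP under a fixed policy, not taken from the chapter.

```python
import numpy as np

# Hypothetical 3-state MDP under a fixed policy pi:
# P[i, j] = probability of moving from state i to state j under pi,
# r[i]    = expected immediate reward in state i under pi.
P = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],   # absorbing state with zero reward
])
r = np.array([1.0, 2.0, 0.0])
gamma = 0.9

# Bellman expectation in matrix form: V = r + gamma * P V
# => (I - gamma * P) V = r, solvable directly for a finite MDP.
V = np.linalg.solve(np.eye(3) - gamma * P, r)

# The solution satisfies the Bellman identity exactly.
assert np.allclose(V, r + gamma * P @ V)
```

Solving the system once is exact but costs a matrix factorization; for large state spaces, iterative evaluation (repeatedly applying \(V \leftarrow r + \gamma P V\)) is the usual alternative.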

March 10, 2026 · 3 min · 620 words · codefrydev

Chapter 6: The Bellman Equations

Learning objectives: derive the Bellman optimality equation for \(Q^*(s,a)\) from the definition of the optimal action value; contrast the optimality equation (max over actions) with the expectation equation (average over actions under \(\pi\)); explain why the optimality equations are nonlinear and how algorithms (e.g. value iteration) handle them.

Concept and real-world RL: the optimal action-value function \(Q^*(s,a)\) is the expected return from state \(s\), taking action \(a\), then acting optimally. The Bellman optimality equation states that \(Q^*(s,a)\) equals the expected immediate reward plus \(\gamma\) times the maximum over next-state action values (not an average under a policy). This “max” makes the system nonlinear: the optimal policy is greedy with respect to \(Q^*\), and \(Q^*\) is the fixed point of this equation. Value iteration and Q-learning are built on this; in practice, we approximate \(Q^*\) with tables or function approximators. ...
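A minimal sketch of how value iteration handles the nonlinearity: because of the max we cannot solve a linear system, so we iterate the optimality operator to its fixed point. The 2-state, 2-action MDP below uses made-up numbers for illustration, not figures from the chapter.

```python
import numpy as np

# Hypothetical MDP: P[a, s, s'] = transition probability,
# R[s, a] = expected immediate reward. Numbers are illustrative.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
    [[0.1, 0.9], [0.8, 0.2]],   # transitions under action 1
])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Q-value iteration: repeatedly apply the Bellman optimality operator
#   Q(s,a) <- R(s,a) + gamma * sum_{s'} P(s'|s,a) * max_{a'} Q(s',a').
# The max makes this nonlinear, but the operator is a gamma-contraction,
# so iteration converges to the unique fixed point Q*.
Q = np.zeros((2, 2))
for _ in range(500):
    V = Q.max(axis=1)            # greedy value of each state
    Q = R + gamma * (P @ V).T    # (P @ V) has shape (action, state)

# At convergence, Q is a fixed point of the optimality operator.
assert np.allclose(Q, R + gamma * (P @ Q.max(axis=1)).T)
```

The greedy policy is then read off as `Q.argmax(axis=1)`; Q-learning approximates the same fixed point from sampled transitions instead of a known model.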

March 10, 2026 · 3 min · 589 words · codefrydev

Chapter 27: Dueling DQN

Learning objectives: implement the dueling architecture: a shared backbone, then a value stream \(V(s)\) and an advantage stream \(A(s,a)\), combined as \(Q(s,a) = V(s) + \big(A(s,a) - \frac{1}{|A|}\sum_{a'} A(s,a')\big)\); understand why separating \(V\) and \(A\) can help when the value of the state is similar across actions (e.g. in safe states); compare learning speed and final performance with standard DQN on CartPole.

Concept and real-world RL: in many states, the value of being in that state is similar regardless of the action (e.g. when no danger is nearby). The dueling architecture represents \(Q(s,a) = V(s) + A(s,a)\), but for identifiability we use \(Q(s,a) = V(s) + \big(A(s,a) - \frac{1}{|A|}\sum_{a'} A(s,a')\big)\). The network learns \(V(s)\) and \(A(s,a)\) in separate heads after a shared feature layer. This can speed up learning when the advantage (the difference between actions) is small in many states. The architecture is used in Rainbow and other modern DQN variants. ...
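The aggregation step is easy to sketch in NumPy. This shows only the combining formula, not a full network; the head outputs below are made-up numbers, not values from a trained model.

```python
import numpy as np

def dueling_q(v, adv):
    """Combine value and advantage heads: Q = V + (A - mean_a A).

    v:   per-state values from the value stream, shape (batch,)
    adv: per-action advantages from the advantage stream,
         shape (batch, n_actions)
    """
    return v[:, None] + (adv - adv.mean(axis=1, keepdims=True))

# Made-up head outputs for one state with three actions.
v = np.array([3.0])
adv = np.array([[1.0, 2.0, 3.0]])
q = dueling_q(v, adv)

# Subtracting the mean advantage pins the mean of Q(s, .) to V(s),
# which makes the V/A decomposition identifiable.
assert np.allclose(q, [[2.0, 3.0, 4.0]])
assert np.allclose(q.mean(axis=1), v)
```

Without the mean-subtraction, adding a constant to \(V(s)\) and subtracting it from every \(A(s,a)\) would leave \(Q\) unchanged, so the two heads would not be uniquely determined.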

March 10, 2026 · 3 min · 577 words · codefrydev

Value Functions and Bellman Equation

This page covers the value functions and Bellman equation you need for the preliminary assessment: the state-value \(V^\pi(s)\), the action-value \(Q^\pi(s,a)\), and the Bellman expectation equation for \(V^\pi\).

Why this matters for RL: value functions are the expected return from a state (or state-action pair) under a policy. They are the main object we estimate in value-based methods (e.g. TD, Q-learning) and appear in actor-critic methods as the critic. The Bellman equation is the recursive identity connecting the value at one state to the immediate reward and the values at successor states; it is the basis of dynamic programming and TD learning. ...
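As a concrete illustration of the recursive identity, here is iterative policy evaluation on a made-up 3-state chain; the transitions and rewards are hypothetical, chosen so the fixed point is easy to verify by hand.

```python
import numpy as np

# Hypothetical deterministic chain under a fixed policy:
# state 0 -> state 1 -> state 2 (absorbing); reward 1 in state 1.
P = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],   # absorbing state
])
r = np.array([0.0, 1.0, 0.0])
gamma = 0.5

# Repeatedly apply the Bellman expectation operator V <- r + gamma * P V;
# it is a gamma-contraction, so the sweep converges to V^pi.
V = np.zeros(3)
for _ in range(100):
    V_new = r + gamma * P @ V
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new

# By hand: V(2) = 0, V(1) = 1 + gamma * V(2) = 1, V(0) = gamma * V(1) = 0.5.
assert np.allclose(V, [0.5, 1.0, 0.0])
```

The same recursion, evaluated on sampled transitions rather than the full matrix, is exactly the TD(0) update mentioned above.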

March 10, 2026 · 5 min · 906 words · codefrydev