Value Functions and Bellman Equation
This page covers the value functions and Bellman equation you need for the preliminary assessment: the state-value function \(V^\pi(s)\), the action-value function \(Q^\pi(s,a)\), and the Bellman expectation equation for \(V^\pi\).

Why this matters for RL

Value functions give the expected return from a state (or state-action pair) when following a policy. They are the main object estimated in value-based methods (e.g. TD learning, Q-learning) and appear in actor-critic methods as the critic. The Bellman equation is the recursive identity that relates the value of a state to the immediate reward and the values of successor states; it is the basis of dynamic programming and TD learning. ...
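To make the recursion concrete: the Bellman expectation equation states \(V^\pi(s) = \sum_a \pi(a \mid s)\big[R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^\pi(s')\big]\), i.e. the value of a state is the policy-weighted average of immediate reward plus discounted value of the successor state. The sketch below applies this backup repeatedly (iterative policy evaluation) on a small made-up MDP; the two-state transition model, rewards, and policy are illustrative assumptions, not from this page.

```python
import numpy as np

# Hypothetical two-state, two-action MDP (illustration only, not from the page).
n_states, n_actions = 2, 2
gamma = 0.9  # discount factor

# P[s, a, s'] = transition probability, R[s, a] = expected immediate reward
P = np.zeros((n_states, n_actions, n_states))
P[0, 0] = [0.8, 0.2]
P[0, 1] = [0.1, 0.9]
P[1, 0] = [0.5, 0.5]
P[1, 1] = [0.0, 1.0]
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# A fixed stochastic policy: pi[s, a] = pi(a | s)
pi = np.array([[0.5, 0.5],
               [0.2, 0.8]])

def policy_evaluation(P, R, pi, gamma, tol=1e-8):
    """Apply the Bellman expectation backup until V converges to V^pi."""
    V = np.zeros(P.shape[0])
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) V(s')
        Q = R + gamma * P @ V
        # V_new(s) = sum_a pi(a | s) Q(s, a)  -- the Bellman expectation equation
        V_new = (pi * Q).sum(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

V = policy_evaluation(P, R, pi, gamma)
print(V)  # V^pi(s) for each state
```

Because the backup is a \(\gamma\)-contraction, the iteration converges to the unique fixed point \(V^\pi\) regardless of the starting values; TD(0) can be viewed as a sampled, incremental version of the same update.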