Chapter 5: Value Functions

Learning objectives

- Define the state-value function \(V^\pi(s)\) as the expected return from state \(s\) under policy \(\pi\).
- Write and solve the Bellman expectation equation for a small MDP.
- Use the matrix form (a linear system) when the MDP is finite.

Concept and real-world RL

The state-value function \(V^\pi(s)\) is the expected (discounted) return starting from state \(s\) and following policy \(\pi\). It answers: "How good is it to be in this state if I follow this policy?" In games, \(V(s)\) is like the expected outcome from a board position; in navigation, it is the expected cumulative reward from a location. The Bellman expectation equation expresses \(V^\pi\) in terms of the immediate reward and the value of the next state; for a finite MDP it becomes the linear system \(V = r + \gamma P V\), which we can solve by matrix inversion or by iteration. ...
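As a sketch of the matrix form, here is a hypothetical two-state MDP under a fixed policy (the transition matrix `P`, reward vector `r`, and discount `gamma` are made-up illustrative values, not from the chapter). It solves \((I - \gamma P)V = r\) directly and also by repeated application of the Bellman expectation operator, and checks that the two agree:

```python
import numpy as np

# Hypothetical 2-state MDP under a fixed policy (illustrative values only).
# P[i, j] = probability of moving from state i to state j under the policy.
P = np.array([[0.5, 0.5],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])   # expected immediate reward in each state
gamma = 0.9                # discount factor

# Direct solution: V = r + gamma * P @ V  <=>  (I - gamma * P) V = r
V_direct = np.linalg.solve(np.eye(2) - gamma * P, r)

# Iterative solution: apply the Bellman expectation operator until convergence.
V_iter = np.zeros(2)
for _ in range(1000):
    V_iter = r + gamma * P @ V_iter

print(V_direct)
print(np.allclose(V_direct, V_iter))  # both methods give the same values
```

The direct solve is exact but costs a matrix factorization; the iterative sweep converges geometrically at rate \(\gamma\), which is why it works well for large state spaces where inversion is impractical.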

March 10, 2026 · 3 min · 620 words · codefrydev