Chapter 10: Limitations of Dynamic Programming
Learning objectives

- Compute the number of states and transition probabilities for a small finite MDP.
- Explain why tabular methods (storing a value per state or state-action pair) do not scale.
- Describe how function approximation (e.g. linear or neural) generalizes across states.

Concept and real-world RL

Dynamic programming (policy iteration, value iteration) assumes we can store a value for every state (or state-action pair) and iterate over all of them. In a 10×10 grid that is 100 states, which is manageable. In real problems (continuous state spaces, or discrete but huge spaces such as board games or high-dimensional sensor inputs), the number of states is enormous or infinite, so we cannot store a table. ...