Chapter 20: The Limits of Tabular Methods

Learning objectives
- Estimate memory for a tabular Q-table (states × actions × bytes per entry).
- Relate the scale of real problems (e.g. Backgammon, continuous state) to the infeasibility of tables.
- Argue why function approximation (linear, neural) is necessary for large or continuous spaces.

Concept and real-world RL
Tabular methods store one value per state (or state-action). When the state space is huge or continuous, this is impossible: Backgammon has on the order of \(10^{20}\) states; a robot with 10 continuous state variables discretized to 100 bins each has \(100^{10}\) cells. ...
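The memory estimate named in the objectives can be sketched in a few lines. This is a back-of-envelope helper, not from the chapter itself; the 8 bytes per entry assumes one float64 per Q-value, and the action counts are illustrative.

```python
def q_table_bytes(n_states: int, n_actions: int, bytes_per_entry: int = 8) -> int:
    """Memory for a dense Q-table: states x actions x bytes per entry."""
    return n_states * n_actions * bytes_per_entry

# Small gridworld: 10,000 states, 4 actions -> 320 KB, easily tabular.
small = q_table_bytes(10_000, 4)        # 320_000 bytes

# Backgammon-scale: ~1e20 states, ~20 moves -> ~1.6e22 bytes, far beyond any machine.
huge = q_table_bytes(10**20, 20)
```

Even generous discretization does not help: \(100^{10} = 10^{20}\) cells for the robot example lands in the same infeasible regime, which is the argument for function approximation.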

March 10, 2026 · 4 min · 645 words · codefrydev

Tabular Methods

This page covers the tabular methods you need for the preliminary assessment: policy iteration and value iteration, the difference between Monte Carlo and TD, on-policy vs off-policy learning, and the Q-learning update rule.

Why this matters for RL
When the state and action spaces are small enough, we can store one value per state (or state-action) and update them from experience or from the model. Dynamic programming does this when we know the model; Monte Carlo and TD do it from samples. Q-learning is the canonical off-policy TD method and is the basis of many deep RL algorithms (e.g. DQN). You need to know how these methods differ and how to write the Q-learning update. ...

March 10, 2026 · 6 min · 1277 words · codefrydev