Chapter 20: The Limits of Tabular Methods

Learning objectives

Estimate memory for a tabular Q-table (states × actions × bytes per entry).
Relate the scale of real problems (e.g. Backgammon, continuous state) to the infeasibility of tables.
Argue why function approximation (linear, neural) is necessary for large or continuous spaces.

Concept and real-world RL

Tabular methods store one value per state (or state-action). When the state space is huge or continuous, this is impossible: Backgammon has on the order of \(10^{20}\) states; a robot with 10 continuous state variables discretized to 100 bins each has \(100^{10}\) cells.

Illustration (Q-table size growth): The number of Q-table entries grows with states × actions. A 10×10 grid has 400 entries; a 100×100 grid has 40,000; Backgammon-scale problems are astronomically larger. The chart below shows Q-table entries (log scale conceptually) for small vs large problems.

Function approximation uses a parameterized function \(Q(s,a; \theta)\) with a fixed number of parameters, so we can represent values for infinitely many states. This is the bridge to deep RL (Volumes 3–5): neural networks are function approximators that generalize from visited states to unseen ones.

Exercise: Estimate the memory required to store a Q-table for the game of Backgammon (approx. \(10^{20}\) states) with 4-byte floats. Compare this with the memory of a modern computer. Discuss the necessity of function approximation.

Professor’s hints

Backgammon: assume \(\approx 10^{20}\) states and a small number of actions (e.g. on the order of 10–100). Q-table size = states × actions × 4 bytes. So roughly \(10^{20} \times 50 \times 4\) bytes ≈ \(2 \times 10^{22}\) bytes. Convert to TB: \(2 \times 10^{22} / 10^{12}\) ≈ \(10^{10}\) TB. A typical computer has 8–64 GB RAM (\(10^9\)–\(10^{10}\) bytes); so the table is astronomically larger.
Compare: 1 TB = \(10^{12}\) bytes. Your estimate in TB shows that no physical machine can store the table. Hence we need a compact representation (function approximation).
Discussion: function approximation uses a fixed number of parameters (e.g. millions for a neural net) to represent \(Q\) for all states. We learn \(\theta\) from experience so that \(Q(s,a;\theta)\) generalizes to unseen \(s\). Trade-off: we can handle huge spaces but may have approximation error and instability.

Common pitfalls

Underestimating state count: Backgammon has many board configurations and dice outcomes; \(10^{20}\) is a common cited order of magnitude. Do not use a tiny number (e.g. 1000) for “Backgammon.”
Forgetting actions: Q-table is states × actions. Backgammon has a variable number of legal moves; use an upper bound (e.g. a few hundred) for a rough estimate.
Claiming FA has no downsides: Function approximation can diverge, generalize poorly, or forget; it is necessary for scale but introduces new challenges (covered in Volumes 3–5).

Worked solution (warm-up: Q-table size for 10×10 grid)

Warm-up: A 10×10 grid has 100 states and 4 actions. How many floats for a Q-table with 4 bytes each? Step 1: Entries = states × actions = 100 × 4 = 400 floats. Step 2: Bytes = 400 × 4 = 1600 bytes. So a small grid is trivial; Backgammon’s \(\approx 10^{20}\) states make a tabular Q-table infeasible, which is why we need function approximation (fixed number of parameters) for large or continuous spaces.

Extra practice

Warm-up: A 10×10 grid has 100 states and 4 actions. How many floats for a Q-table with 4 bytes each? (100 × 4 × 4 = 1600 bytes.)
Coding: Implement a linear function approximator for Q: state s (e.g. 4-dim) → feature vector φ(s) (e.g. identity or tile coding), Q(s,a) = w^T φ(s,a). Train with Q-learning (semi-gradient) on CartPole or a small env. Plot episode return.
Challenge: Estimate the number of parameters in a small neural network that takes a Backgammon state (encoded as a vector of size 100) and outputs Q-values for 50 actions (one hidden layer of 256 units). Compare that number to the tabular size. (E.g. 100×256 + 256×50 ≈ 38k parameters; 38k × 4 bytes ≈ 150 KB.)
Variant: Consider a Connect-4 board: 6 rows × 7 columns × 3 values (empty, X, O) per cell. Estimate the upper bound on state count (\(3^{42}\)) and compare the Q-table memory to the neural network approach.
Debug: The code below treats every unique continuous state as a separate dict key, so it never generalizes and the Q-table grows unboundedly. Explain the problem and how function approximation solves it.

Try it — edit and run (Shift+Enter)

Conceptual: What is the “curse of dimensionality” in tabular RL? Give one concrete example.
Recall: State in one sentence why function approximation is necessary for large or continuous state spaces.