Chapter 10: Limitations of Dynamic Programming

Learning objectives:
- Compute the number of states and transition probabilities for a small finite MDP.
- Explain why tabular methods (storing a value per state or state-action) do not scale.
- Describe how function approximation (e.g. linear or neural) generalizes across states.

Concept and real-world RL: Dynamic programming (policy iteration, value iteration) assumes we can store a value for every state (or state-action) and iterate over all of them. In a 10×10 grid that is 100 states, which is manageable. In real problems (continuous state spaces, or discrete but huge spaces like board games or high-dimensional sensors), the number of states is enormous or infinite, so we cannot store a table. ...
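As a quick sanity check on the first objective, the tabular quantities DP must store can be counted directly. This is a minimal sketch with an illustrative helper name (not from the post), using the 10×10 gridworld with four moves:

```python
# Hypothetical sketch: sizes of the tables dynamic programming must
# store and sweep for a small finite MDP (names are illustrative).

def mdp_table_sizes(n_states: int, n_actions: int) -> dict:
    """Count entries in V, Q, and the full transition model P(s'|s,a)."""
    return {
        "V(s) entries": n_states,                              # one value per state
        "Q(s,a) entries": n_states * n_actions,                # one per state-action
        "P(s'|s,a) entries": n_states * n_actions * n_states,  # full transition model
    }

sizes = mdp_table_sizes(n_states=10 * 10, n_actions=4)  # 10x10 grid, 4 moves
print(sizes)
```

Even the transition model is only 40,000 entries here; the same formulas explode once the state count does.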

March 10, 2026 · 4 min · 683 words · codefrydev

Chapter 20: The Limits of Tabular Methods

Learning objectives:
- Estimate memory for a tabular Q-table (states × actions × bytes per entry).
- Relate the scale of real problems (e.g. Backgammon, continuous state) to the infeasibility of tables.
- Argue why function approximation (linear, neural) is necessary for large or continuous spaces.

Concept and real-world RL: Tabular methods store one value per state (or state-action). When the state space is huge or continuous, this is impossible: Backgammon has on the order of \(10^{20}\) states; a robot with 10 continuous state variables discretized to 100 bins each has \(100^{10}\) cells. ...

March 10, 2026 · 4 min · 645 words · codefrydev

Chapter 21: Linear Function Approximation

Learning objectives:
- Represent the action-value function as \(Q(s,a;w) = w^T \phi(s,a)\) with a feature vector \(\phi\).
- Use tile coding (overlapping grid tilings) to produce binary features for continuous state (e.g. MountainCar).
- Implement semi-gradient SARSA: update \(w\) using the TD target with the current \(Q\) for the next state.

Concept and real-world RL: Linear function approximation approximates \(Q(s,a) \approx w^T \phi(s,a)\). The weights \(w\) are learned from data; \(\phi(s,a)\) is a fixed or hand-designed feature map. Tile coding partitions the state space into overlapping tilings; each tiling is a grid, and the feature vector has a 1 for each tile that contains the state (and the action), yielding a sparse binary vector. This allows generalization across similar states. Semi-gradient methods use the TD target but treat the next-state value as a constant when taking the gradient (no backpropagation through the target). Linear FA is the simplest form of value approximation and appears in legacy RL and as a baseline. ...
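The pieces above fit together in a few lines. This is a minimal sketch, assuming a toy 1-D state in \([0,1)\) rather than MountainCar's two dimensions; the tiling constants, offsets, and function names are all illustrative, not the chapter's implementation:

```python
import numpy as np

# Sketch: toy tile coder plus one semi-gradient SARSA update.
# 4 offset tilings over [0, 1), 8 tiles each, 2 actions (all assumed).
N_TILINGS, N_TILES, N_ACTIONS = 4, 8, 2
DIM = N_TILINGS * N_TILES * N_ACTIONS

def phi(s: float, a: int) -> np.ndarray:
    """Sparse binary features: one active tile per tiling, per action."""
    x = np.zeros(DIM)
    for t in range(N_TILINGS):
        offset = t / (N_TILINGS * N_TILES)        # shift each tiling slightly
        tile = min(int((s + offset) * N_TILES), N_TILES - 1)
        x[a * N_TILINGS * N_TILES + t * N_TILES + tile] = 1.0
    return x

def q(s, a, w):
    return w @ phi(s, a)                          # Q(s,a;w) = w^T phi(s,a)

def sarsa_update(w, s, a, r, s2, a2, alpha=0.1, gamma=0.99, done=False):
    """Semi-gradient: the target's q(s2, a2, w) is treated as a constant."""
    target = r if done else r + gamma * q(s2, a2, w)
    td_error = target - q(s, a, w)
    return w + alpha * td_error * phi(s, a)       # gradient of w^T phi is phi

w = np.zeros(DIM)
w = sarsa_update(w, s=0.3, a=0, r=1.0, s2=0.35, a2=0)
```

Note that only the features of \((s,a)\) appear in the gradient step; the next-state value enters solely through the target, which is what makes the method "semi"-gradient.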

March 10, 2026 · 3 min · 606 words · codefrydev

Feature Engineering for Reinforcement Learning

Learning objectives:
- Choose or design feature vectors \(\phi(s)\) or \(\phi(s,a)\) for linear \(V(s) = w^T \phi(s)\) or \(Q(s,a) = w^T \phi(s,a)\).
- Use tile coding, polynomial features, and normalization appropriately.
- Understand how feature choice affects generalization and learning speed.

Why features matter: In linear function approximation, we approximate \(V(s) \approx w^T \phi(s)\) or \(Q(s,a) \approx w^T \phi(s,a)\). The feature vector \(\phi\) determines what the function can represent. Good features capture the right structure (e.g. similar states get similar values) and keep the dimension manageable so that learning is stable and sample-efficient. ...
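Normalization and polynomial features compose naturally. A hedged sketch for a 2-D state shaped like MountainCar's (position, velocity); the bounds and the degree-2 feature set are assumptions for illustration:

```python
import numpy as np

# Illustrative phi(s): normalize each state variable to [0, 1], then
# expand with degree-2 polynomial terms plus a bias feature.
LOW  = np.array([-1.2, -0.07])   # assumed state bounds (position, velocity)
HIGH = np.array([ 0.6,  0.07])

def phi(s: np.ndarray) -> np.ndarray:
    z = (s - LOW) / (HIGH - LOW)          # normalization: comparable scales
    p, v = z
    return np.array([1.0, p, v, p * v, p * p, v * v])

x = phi(np.array([-0.5, 0.0]))
# A linear estimate is then V(s) = w @ phi(s) for learned weights w.
```

Without the normalization step, position (range 1.8) would dominate velocity (range 0.14) in every product term, which is exactly the kind of feature-scale problem the objectives warn about.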

March 10, 2026 · 2 min · 400 words · codefrydev

Function Approximation and Deep RL

This page covers the function approximation and deep RL concepts you need for the preliminary assessment: why we need FA, the policy gradient update, exploration in DQN, experience replay, and the advantage of actor-critic.

Why this matters for RL: In large or continuous state spaces we cannot store a value per state; we use a parameterized function (e.g. a neural network) to approximate values or policies. That leads to policy gradient methods (maximize expected return) and value-based methods with FA (e.g. DQN). DQN uses experience replay and exploration (e.g. ε-greedy); actor-critic combines a policy (the actor) and a value function (the critic) for lower-variance policy gradients. You need to understand why FA is necessary and how these pieces fit together. ...
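Two of the DQN ingredients mentioned above are small enough to sketch directly. This is a minimal illustration, not the page's implementation; the class name, capacity, and constants are assumptions:

```python
import random
from collections import deque

# Sketch of two DQN ingredients: an experience replay buffer and
# epsilon-greedy action selection (all names are illustrative).

class ReplayBuffer:
    def __init__(self, capacity: int = 10_000):
        self.buf = deque(maxlen=capacity)        # oldest transitions evicted

    def push(self, s, a, r, s2, done):
        self.buf.append((s, a, r, s2, done))

    def sample(self, batch_size: int):
        """Uniform sampling breaks correlations between consecutive steps."""
        return random.sample(self.buf, batch_size)

def epsilon_greedy(q_values, epsilon: float) -> int:
    """With probability epsilon explore uniformly, else act greedily."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

buf = ReplayBuffer()
for i in range(5):
    buf.push(i, 0, 1.0, i + 1, False)
batch = buf.sample(3)
a = epsilon_greedy([0.1, 0.9], epsilon=0.0)      # epsilon=0: purely greedy
```

Replay decorrelates training data and reuses experience; ε-greedy supplies the exploration that a purely greedy learned policy would lack.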

March 10, 2026 · 7 min · 1400 words · codefrydev