Chapter 53: Planning with Known Models
Learning objectives

Implement a planner using breadth-first search (BFS) for a gridworld with known deterministic dynamics.
Recover the optimal policy (path to goal) and compare with dynamic programming (value iteration) in terms of computation and result.
Relate BFS to shortest-path planning in robot navigation.

Concept and real-world RL

When the model is known and deterministic, we can plan without learning: BFS finds the shortest path from start to goal; value iteration computes optimal values for all states. In robot navigation (grid or graph), BFS is used for pathfinding; DP is used when we need values everywhere (e.g. for reward shaping). Both assume the model is correct; in RL we often learn the model or the value function from data.

...
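The contrast between the two planners can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the 4x4 grid layout, the step reward of -1, the absence of discounting, and all function names (`bfs_path`, `value_iteration`, `greedy_path`) are hypothetical choices made for this example. BFS returns one shortest path from the start, while value iteration returns a value for every free cell, from which a greedy path can be read off.

```python
from collections import deque

# Hypothetical 4x4 gridworld (an assumption for this sketch):
# 'S' start, 'G' goal, '#' wall, '.' free cell.
GRID = [
    "S..#",
    ".#..",
    "...#",
    "#..G",
]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def find(grid, ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(grid):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not found in grid")


def neighbors(grid, state):
    """Known deterministic model: cells reachable in one move."""
    r, c = state
    for dr, dc in MOVES:
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
            yield (nr, nc)


def bfs_path(grid):
    """Shortest path from 'S' to 'G' as a list of cells, or None."""
    start, goal = find(grid, "S"), find(grid, "G")
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s == goal:
            path = []
            while s is not None:  # walk parent links back to the start
                path.append(s)
                s = parent[s]
            return path[::-1]
        for n in neighbors(grid, s):
            if n not in parent:
                parent[n] = s
                queue.append(n)
    return None


def value_iteration(grid):
    """Optimal values for all free cells, assuming reward -1 per move
    and no discounting, so V(s) is minus the distance to the goal."""
    goal = find(grid, "G")
    states = [(r, c) for r, row in enumerate(grid)
              for c, ch in enumerate(row) if ch != "#"]
    V = {s: 0.0 for s in states}
    for _ in range(len(states)):  # cap on sweeps guarantees termination
        delta = 0.0
        for s in states:
            if s == goal:
                continue  # terminal state keeps value 0
            best = max((-1.0 + V[n] for n in neighbors(grid, s)), default=V[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta == 0.0:
            break
    return V


def greedy_path(grid, V):
    """Follow the highest-valued neighbor from 'S' until 'G'.
    Assumes the goal is reachable from the start."""
    s, goal = find(grid, "S"), find(grid, "G")
    path = [s]
    while s != goal:
        s = max(neighbors(grid, s), key=lambda n: V[n])
        path.append(s)
    return path


if __name__ == "__main__":
    print("BFS path:   ", bfs_path(GRID))
    V = value_iteration(GRID)
    print("Greedy path:", greedy_path(GRID, V))
    print("V(start):   ", V[find(GRID, "S")])
```

On this grid both planners report a shortest route of 6 moves, though the two paths need not visit the same cells when several shortest routes exist. The computational difference is that BFS stops as soon as it dequeues the goal, whereas value iteration sweeps every free cell repeatedly until the values stop changing, which is the price of having values everywhere.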