Gridworld
Learning objectives

Define a gridworld MDP: grid cells as states, actions (up/down/left/right), transitions, and terminal states.
Understand how hitting the boundary keeps the agent in place (or wraps, depending on design).
Use gridworld as the running example for policy evaluation and policy iteration.

What is Gridworld?

Gridworld is a simple MDP used throughout RL teaching and research. The environment is a grid of cells (e.g. 4×4 or 5×5). The state is the agent’s position \((i, j)\). Actions are typically up, down, left, right. Transitions: taking an action moves the agent one cell in that direction; if the move would go off the grid, the agent either stays in place (and usually receives the same step reward) or the world wraps, depending on the specification. ...
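The transition rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `step`, the `ACTIONS` table, the `wrap` flag, the 4×4 default size, and the convention that row 0 is the top row are all choices made here for illustration. Rewards and terminal states are left out because the text above does not pin them down.

```python
from typing import Dict, Tuple

State = Tuple[int, int]  # agent position (i, j): row i, column j

# Row/column offsets for each action.
# Assumption: row 0 is the top row, so "up" decreases i.
ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def step(state: State, action: str, n_rows: int = 4, n_cols: int = 4,
         wrap: bool = False) -> State:
    """Return the next cell after taking `action` from `state`.

    Hypothetical helper: deterministic moves only, no rewards,
    no terminal-state handling.
    """
    i, j = state
    di, dj = ACTIONS[action]
    ni, nj = i + di, j + dj
    if wrap:
        # Wrapping variant: leaving one edge re-enters from the opposite edge.
        return (ni % n_rows, nj % n_cols)
    if 0 <= ni < n_rows and 0 <= nj < n_cols:
        # Ordinary move: the target cell is inside the grid.
        return (ni, nj)
    # Off-grid move: the agent stays in place.
    return state
```

For example, on the default 4×4 grid, `step((1, 1), "right")` gives `(1, 2)`, `step((0, 0), "up")` leaves the agent at `(0, 0)`, and `step((0, 0), "up", wrap=True)` gives `(3, 0)`. Keeping the boundary behavior behind a single flag makes it easy to compare the two designs the text mentions.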