Windy Gridworld
Learning objectives

- Understand the Windy Gridworld environment, in which movement is shifted by a column-dependent wind.
- Implement the transition model and run iterative policy evaluation and policy iteration on it.
- Compare with the standard gridworld (no wind).

Theory

Windy Gridworld (Sutton & Barto) is a rectangular grid (e.g. 7×10) with:

- States: Cell positions \((row, col)\).
- Actions: Up, down, left, right (four actions).
- Wind: Each column has a fixed wind strength (a non-negative integer). When the agent takes an action, the resulting row is shifted up by the wind strength of the current column (the wind blows upward). With row 0 as the top row, so that “up” decreases the row index, taking “up” from cell \((r, c)\) leads to \((r - 1 - \text{wind}[c], c)\), “down” leads to \((r + 1 - \text{wind}[c], c)\), and so on. The agent cannot leave the grid; positions are clipped to the grid bounds.
- Terminal state: One goal cell. Typical reward: -1 per step until the goal.

So the same action can lead to different next states depending on the column (wind). The MDP is still finite and deterministic given the state and action, because the wind is fixed per column. This makes the problem slightly harder than a plain gridworld and a good testbed for policy evaluation and policy iteration. ...
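The transition rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. The 7×10 size, the wind strengths, and the goal cell are assumptions taken from the usual Sutton & Barto example, not values given in this section. The names `WIND`, `GOAL`, `ACTIONS`, and `step` are hypothetical. Row 0 is assumed to be the top row, so an upward wind reduces the row index.

```python
# Sketch of a Windy Gridworld transition model.
# Assumed (not given in the text): the 7x10 layout, wind strengths and goal
# cell follow the usual Sutton & Barto example; row 0 is the top row.
N_ROWS, N_COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]  # upward push for each column
GOAL = (3, 7)

# Action name -> (row change, column change); "up" decreases the row index.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return (next_state, reward, done) for one deterministic transition."""
    if state == GOAL:
        return state, 0, True  # the terminal state is absorbing
    r, c = state
    dr, dc = ACTIONS[action]
    # The wind of the current column pushes the agent toward row 0.
    new_r = r + dr - WIND[c]
    new_c = c + dc
    # Clip the position so the agent stays on the grid.
    new_r = min(max(new_r, 0), N_ROWS - 1)
    new_c = min(max(new_c, 0), N_COLS - 1)
    next_state = (new_r, new_c)
    return next_state, -1, next_state == GOAL
```

Under these assumptions, `step((3, 6), "right")` returns `((1, 7), -1, False)`: the agent moves one column to the right and the wind of column 6 lifts it two rows. Because the model is deterministic, each state-action pair has exactly one successor, so policy evaluation can use this function directly instead of a full probability table.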