Chapter 1: The Reinforcement Learning Framework

Learning objectives

- Identify the main components of an RL system: agent, environment, state, action, reward.
- Compute the discounted return for a sequence of rewards.
- Relate the gridworld to real tasks (e.g. navigation, games) where an agent gets delayed reward.

Concept and real-world RL

In reinforcement learning, an agent interacts with an environment: at each step the agent is in a state, chooses an action, and receives a reward and a new state. The return is the sum of (discounted) rewards along a trajectory; the agent's goal is to maximize this return. A gridworld is a simple environment where states are cells and actions move the agent; it models robot navigation (e.g. a robot moving to a goal in a warehouse) and game AI (e.g. a character moving on a map). In robot navigation, the state might be (row, col); the action is up/down/left/right; the reward is +1 at the goal and often 0 or a small penalty per step. Discounting (\(\gamma < 1\)) makes future rewards worth less than immediate ones and keeps the return finite over long or infinite horizons. ...
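The discounted return described above can be sketched in a few lines; the reward sequence and \(\gamma\) below are illustrative choices, not values from the post.

```python
def discounted_return(rewards, gamma=0.9):
    """Return G = r_0 + gamma*r_1 + gamma^2*r_2 + ...

    Accumulating backwards avoids computing powers of gamma explicitly.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A goal reward of +1 arriving two steps in the future is worth gamma^2.
g = discounted_return([0, 0, 1], gamma=0.9)
```

With \(\gamma = 0.9\) the delayed +1 contributes \(0.9^2 = 0.81\), which is the sense in which discounting makes future rewards worth less than immediate ones.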

March 10, 2026 · 4 min · 748 words · codefrydev

Gridworld

Learning objectives

- Define a gridworld MDP: grid cells as states, actions (up/down/left/right), transitions, and terminal states.
- Understand how hitting the boundary keeps the agent in place (or wraps, depending on design).
- Use gridworld as the running example for policy evaluation and policy iteration.

What is Gridworld?

Gridworld is a simple MDP used throughout RL teaching and research. The environment is a grid of cells (e.g. 4×4 or 5×5). The state is the agent's position \((i, j)\). Actions are typically up, down, left, right. Transitions: taking an action moves the agent one cell in that direction; if the move would go off the grid, the agent either stays in place (and usually receives the same step reward) or the world wraps, depending on the specification. ...
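The transition rule above (deterministic moves, boundary keeps the agent in place) can be sketched as a single step function; the 4×4 size and the "stay" variant are the assumptions, matching one of the designs described.

```python
# Row/column deltas for the four gridworld actions.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action, n=4):
    """Deterministic gridworld transition on an n x n grid.

    Off-grid moves leave the agent in place (the "stay" variant);
    a wrapping world would instead take coordinates modulo n.
    """
    i, j = state
    di, dj = MOVES[action]
    ni, nj = i + di, j + dj
    if 0 <= ni < n and 0 <= nj < n:
        return (ni, nj)
    return (i, j)  # hit the boundary: stay in place
```

The wrapping variant would replace the boundary check with `((i + di) % n, (j + dj) % n)`.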

March 10, 2026 · 2 min · 356 words · codefrydev

Chapter 7: Dynamic Programming — Policy Evaluation

Learning objectives

- Implement iterative policy evaluation (Bellman expectation updates) for a finite MDP.
- Use a gridworld with terminal states and interpret the resulting value function.
- Decide when to stop iterating (e.g. max change below a threshold).

Concept and real-world RL

Policy evaluation computes \(V^\pi\) for a given policy \(\pi\). Iterative policy evaluation starts from an arbitrary \(V\) (e.g. all zeros) and repeatedly applies the Bellman expectation update: \(V(s) \leftarrow \sum_a \pi(a\mid s) \sum_{s',r} P(s',r\mid s,a)\,[r + \gamma V(s')]\). This converges to \(V^\pi\) for finite MDPs. In a gridworld, values spread from terminal states (goal or trap); the result shows "how good" each cell is under the policy. This is the building block for policy iteration (evaluate, then improve the policy). ...
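A minimal sketch of the update above, assuming the classic 4×4 layout (equiprobable random policy, reward −1 per step, terminals at (0,0) and (3,3)); the layout is an illustrative assumption, not stated in this summary.

```python
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def next_state(s, a):
    """Deterministic move; off-grid actions leave the agent in place."""
    ni, nj = s[0] + a[0], s[1] + a[1]
    return (ni, nj) if 0 <= ni < N and 0 <= nj < N else s

def policy_evaluation(gamma=1.0, theta=1e-6):
    """Bellman expectation updates for the equiprobable random policy.

    Stops when the largest per-sweep change falls below theta.
    """
    V = {(i, j): 0.0 for i in range(N) for j in range(N)}
    while True:
        delta = 0.0
        for s in V:
            if s in TERMINALS:
                continue  # terminal values stay 0
            v = sum(0.25 * (-1 + gamma * V[next_state(s, a)]) for a in ACTIONS)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V
```

The stopping rule is exactly the "max change below a threshold" criterion from the objectives; in-place (Gauss–Seidel) updates still converge to \(V^\pi\).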

March 10, 2026 · 4 min · 703 words · codefrydev

Chapter 9: Dynamic Programming — Value Iteration

Learning objectives

- Implement value iteration: repeatedly apply the Bellman optimality update for \(V\).
- Extract the optimal policy as greedy with respect to the converged \(V\).
- Relate value iteration to policy iteration (one sweep of "improvement" per state, no full evaluation).

Concept and real-world RL

Value iteration updates the state-value function using the Bellman optimality equation: \(V(s) \leftarrow \max_a \sum_{s',r} P(s',r\mid s,a)\,[r + \gamma V(s')]\). It does not maintain an explicit policy; after convergence, the optimal policy is greedy with respect to \(V\). Value iteration is simpler than full policy iteration (no inner evaluation loop) and converges to \(V^*\). It is used in planning when the model is known; in large or continuous spaces we approximate \(V\) or \(Q\) with function approximators and use approximate dynamic programming or model-free methods. ...
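A sketch of the optimality update plus greedy extraction, under the same illustrative assumptions as before (4×4 grid, deterministic moves, reward −1 per step, terminals at the opposite corners):

```python
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def next_state(s, a):
    ni, nj = s[0] + a[0], s[1] + a[1]
    return (ni, nj) if 0 <= ni < N and 0 <= nj < N else s

def value_iteration(gamma=1.0, theta=1e-6):
    """Bellman optimality sweeps, then a greedy policy read off V."""
    V = {(i, j): 0.0 for i in range(N) for j in range(N)}
    while True:
        delta = 0.0
        for s in V:
            if s in TERMINALS:
                continue
            v = max(-1 + gamma * V[next_state(s, a)] for a in ACTIONS)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            break
    # No policy is kept during the sweeps; extract it greedily at the end.
    greedy = {s: max(ACTIONS, key=lambda a: -1 + gamma * V[next_state(s, a)])
              for s in V if s not in TERMINALS}
    return V, greedy
```

Compared with policy iteration, the `max` inside the sweep replaces the whole inner evaluation loop, which is the "one sweep of improvement per state" relationship named in the objectives.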

March 10, 2026 · 3 min · 624 words · codefrydev

Dynamic Programming: Gridworld in Code

Learning objectives

- Implement a 4×4 gridworld environment (states, actions, transitions, rewards) in code.
- Implement iterative policy evaluation and stop when values converge.
- Implement policy iteration (evaluate then improve) and optionally value iteration.

Gridworld in code

States: use a 4×4 grid. States can be (row, col) or a flat index. Terminal states (0,0) and (3,3) have value 0 and are not updated. Actions: 0=up, 1=down, 2=left, 3=right. Moving off the grid leaves the agent in place. ...
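The evaluate-then-improve loop can be sketched on exactly the layout this post specifies (terminals (0,0) and (3,3), actions 0–3, off-grid moves stay in place). Reward −1 per step and \(\gamma = 0.9\) are assumptions for illustration; \(\gamma < 1\) keeps evaluation finite even for an arbitrary starting policy that loops in place.

```python
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 0=up, 1=down, 2=left, 3=right

def next_state(s, a):
    ni, nj = s[0] + a[0], s[1] + a[1]
    return (ni, nj) if 0 <= ni < N and 0 <= nj < N else s

def evaluate(policy, gamma, theta=1e-8):
    """Iterative policy evaluation for a deterministic policy."""
    V = {(i, j): 0.0 for i in range(N) for j in range(N)}
    while True:
        delta = 0.0
        for s, a in policy.items():
            v = -1 + gamma * V[next_state(s, ACTIONS[a])]
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V

def policy_iteration(gamma=0.9):
    """Alternate evaluation and greedy improvement until stable."""
    states = [(i, j) for i in range(N) for j in range(N)
              if (i, j) not in TERMINALS]
    policy = {s: 0 for s in states}  # arbitrary start: always "up"
    while True:
        V = evaluate(policy, gamma)
        stable = True
        for s in states:
            best = max(range(4),
                       key=lambda k: -1 + gamma * V[next_state(s, ACTIONS[k])])
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V
```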

March 10, 2026 · 2 min · 390 words · codefrydev

Chapter 17: Planning and Learning with Tabular Methods

Learning objectives

- Implement a simple model: store \((s,a) \rightarrow (r, s')\) from experience.
- Implement Dyna-Q: after each real env step, do \(k\) extra Q-updates using random \((s,a)\) pairs from the model (simulated experience).
- Compare sample efficiency: Dyna-Q (planning + learning) vs Q-learning (learning only).

Concept and real-world RL

Model-based methods use a learned or given model of the environment (transitions and rewards). Dyna-Q learns a tabular model from real experience: when you observe \((s,a,r,s')\), store it. Then, in addition to updating \(Q(s,a)\) from the real transition, you replay random \((s,a)\) pairs from the model, look up \((r,s')\), and do a Q-learning update. This gives more learning per real step (planning). In real applications, learned models are used in model-based RL (e.g. world models, MuZero) to reduce sample complexity; the key idea is reusing past experience for extra updates. ...
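The "real update, store, then k replayed updates" cycle can be sketched as one function; the hyperparameters (`alpha`, `gamma`, `k`) and the four-action table are illustrative assumptions.

```python
import random
from collections import defaultdict

def dyna_q_update(Q, model, s, a, r, s2,
                  alpha=0.1, gamma=0.95, k=5, actions=(0, 1, 2, 3)):
    """One Dyna-Q step: real Q-learning update, model store, k planning updates."""
    # 1. Q-learning update from the real transition (s, a, r, s2).
    Q[s][a] += alpha * (r + gamma * max(Q[s2][b] for b in actions) - Q[s][a])
    # 2. Store the transition in the tabular deterministic model.
    model[(s, a)] = (r, s2)
    # 3. Planning: k extra updates from randomly replayed (s, a) pairs.
    for _ in range(k):
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2][b] for b in actions)
                              - Q[ps][pa])

Q = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
model = {}
```

Setting `k=0` recovers plain Q-learning, which is the sample-efficiency comparison the objectives describe: same real experience, fewer updates.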

March 10, 2026 · 3 min · 583 words · codefrydev

Chapter 53: Planning with Known Models

Learning objectives

- Implement a planner using breadth-first search (BFS) for a gridworld with known deterministic dynamics.
- Recover the optimal policy (path to goal) and compare with dynamic programming (value iteration) in terms of computation and result.
- Relate BFS to shortest-path planning in robot navigation.

Concept and real-world RL

When the model is known and deterministic, we can plan without learning: BFS finds the shortest path from start to goal; value iteration computes optimal values for all states. In robot navigation (grid or graph), BFS is used for pathfinding; DP is used when we need values everywhere (e.g. for reward shaping). Both assume the model is correct; in RL we often learn the model or the value function from data. ...
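A BFS planner for the known deterministic gridworld can be sketched as follows; the 4×4 obstacle-free grid is an illustrative assumption.

```python
from collections import deque

def bfs_plan(start, goal, n=4):
    """Shortest path from start to goal on an n x n grid via BFS.

    Returns the list of states along the path, or None if unreachable.
    """
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    parent = {start: None}  # doubles as the visited set
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        if s == goal:
            path = []
            while s is not None:  # walk parents back to the start
                path.append(s)
                s = parent[s]
            return path[::-1]
        for di, dj in moves:
            nxt = (s[0] + di, s[1] + dj)
            if 0 <= nxt[0] < n and 0 <= nxt[1] < n and nxt not in parent:
                parent[nxt] = s
                frontier.append(nxt)
    return None  # goal unreachable
```

BFS visits each state at most once and returns a single start-to-goal path, whereas value iteration sweeps all states repeatedly but yields values (and hence a policy) everywhere, which is the trade-off the objectives ask you to compare.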

March 10, 2026 · 3 min · 443 words · codefrydev

Chapter 62: Intrinsic Motivation

Learning objectives

- Design an intrinsic reward based on state visitation count: bonus = \(1/\sqrt{\text{count}}\) (or similar), so rarely visited states are more attractive.
- Implement an agent that uses total reward = extrinsic + intrinsic and compare exploration behavior (e.g. coverage of the state space) with an agent that uses only extrinsic reward.
- Relate to curiosity and exploration in game AI and robot navigation.

Concept and real-world RL

Intrinsic motivation gives the agent a bonus for visiting novel or surprising states, so it explores even when extrinsic reward is sparse. A count-based bonus \(1/\sqrt{N(s)}\) (inverse square root of visit count) encourages visiting states that have been seen fewer times. In game AI and robot navigation, this can help discover the goal; in recommendation, novelty bonuses encourage diversity. The combination extrinsic + intrinsic balances exploitation (reward) and exploration (novelty). ...
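The count-based bonus can be sketched in a few lines; incrementing the count before computing the bonus is one reasonable convention (so the first visit gets bonus 1), and any scaling of the bonus is an illustrative choice.

```python
import math
from collections import defaultdict

counts = defaultdict(int)  # N(s): visit count per state

def total_reward(state, extrinsic):
    """extrinsic + 1/sqrt(N(s)) count-based novelty bonus."""
    counts[state] += 1
    bonus = 1.0 / math.sqrt(counts[state])
    return extrinsic + bonus
```

As \(N(s)\) grows the bonus decays toward 0, so the total reward approaches the extrinsic reward alone: novelty drives early exploration, extrinsic reward dominates later.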

March 10, 2026 · 3 min · 487 words · codefrydev