Gridworld

Learning objectives: define a gridworld MDP (grid cells as states; actions up, down, left, right; transitions; terminal states); understand how hitting the boundary keeps the agent in place (or wraps, depending on the design); use gridworld as the running example for policy evaluation and policy iteration.

What is Gridworld? Gridworld is a simple MDP used throughout RL teaching and research. The environment is a grid of cells (e.g. 4×4 or 5×5), and the state is the agent’s position \((i, j)\). Actions are typically up, down, left, and right. Transitions: taking an action moves the agent one cell in that direction; if the move would go off the grid, the agent either stays in place (usually receiving the same step reward) or the world wraps around, depending on the specification. ...
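The transition rule described above can be sketched in Python. This is a minimal illustration, not the article's implementation: a 4×4 grid with stay-in-place boundaries, a −1 step reward, and two example terminal corners (all of these choices are assumptions).

```python
# Minimal 4x4 gridworld step function: states are (row, col) pairs;
# moves that would leave the grid keep the agent in place.
N = 4  # grid side length (assumed)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
TERMINALS = {(0, 0), (N - 1, N - 1)}  # example terminal corners (assumed)

def step(state, action):
    """Return (next_state, reward, done) for one move."""
    if state in TERMINALS:
        return state, 0.0, True            # terminal states absorb
    di, dj = ACTIONS[action]
    ni, nj = state[0] + di, state[1] + dj
    if not (0 <= ni < N and 0 <= nj < N):
        ni, nj = state                     # off-grid move: stay in place
    reward = -1.0                          # same step reward either way
    return (ni, nj), reward, (ni, nj) in TERMINALS
```

A wrapping variant would instead compute `ni % N, nj % N`; the article notes both designs are used, depending on the specification.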

March 10, 2026 · 2 min · 356 words · codefrydev

Anaconda Environment Setup

Learning objectives: create a dedicated conda environment for the curriculum; install Python and key packages in that environment; activate and use the environment for running exercises.

Why use a conda environment? A conda environment isolates the curriculum’s Python and packages from your system or other projects. You can pin a specific Python version and install NumPy, PyTorch, Gym, etc. without affecting other work, and if something breaks, you can simply recreate the environment. ...
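The workflow above can be sketched with standard conda commands (the environment name rl-curriculum and the Python version are illustrative choices, not prescribed by the article):

```shell
# Create a dedicated environment with a pinned Python version
conda create -n rl-curriculum python=3.10

# Activate it before working on exercises
conda activate rl-curriculum

# Install the curriculum's core packages into the environment
conda install numpy matplotlib

# If something breaks, remove the environment and recreate it
conda deactivate
conda remove -n rl-curriculum --all
```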

March 10, 2026 · 2 min · 237 words · codefrydev

RL Framework

This page covers the core RL framework you need for the preliminary assessment: the four main components, the Markov property, exploration vs. exploitation, and the discount factor. Back to Preliminary.

Why this matters for RL: every RL problem is defined by who acts (the agent), what it interacts with (the environment), what it observes (state), what it can do (actions), and what feedback it gets (reward). The Markov property and the discount factor shape how we define value functions and algorithms, and exploration vs. exploitation is the central tension in learning from experience. ...
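The discount factor mentioned above determines how future rewards are weighted in the return \(G_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots\). A small sketch of that computation (function name and default \(\gamma\) are illustrative):

```python
# Discounted return: G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
def discounted_return(rewards, gamma=0.9):
    """Fold rewards from the end using G_t = r_t + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

With \(\gamma\) close to 0 the agent is myopic; with \(\gamma\) close to 1 it weighs distant rewards almost as heavily as immediate ones.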

March 10, 2026 · 6 min · 1198 words · codefrydev

Setting Up Your Environment

Learning objectives: know what software you need (Python, libraries, an optional IDE); perform a pre-installation check so you are ready for the curriculum.

Pre-Installation Check. Before diving into the curriculum, ensure you have: Python, version 3.8 or higher (3.9–3.11 recommended); check with python3 --version or python --version. pip, so you can install packages; check with pip --version or pip3 --version. Optional but recommended: a virtual environment (venv or conda) so curriculum dependencies do not conflict with other projects; see Anaconda Setup for conda. Libraries used in the curriculum: NumPy, Matplotlib, and (for later volumes) PyTorch or TensorFlow, plus Gym/Gymnasium. See Installing Libraries for how to install them.

What you need. For Volumes 1–2 (foundations, tabular methods): Python, NumPy, Matplotlib. You can implement gridworld, bandits, Monte Carlo, and TD in plain Python + NumPy; plotting helps for learning curves. For Volume 3+ (function approximation, deep RL): PyTorch or TensorFlow, plus Gym or Gymnasium for environments (CartPole, MountainCar, etc.). Editor or IDE: any text editor or IDE (VS Code, PyCharm, etc.) works. Jupyter is optional; see the FAQ on “Proof that using Jupyter Notebook is the same as not using it” (you can use scripts or notebooks; both are fine).

After setup. Once your environment is ready, take the Preliminary assessment to see whether you are ready for the curriculum, or follow the Learning path from Phase 0 if you are new to programming.
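The pre-installation check can also be run as a short script. This is a sketch under assumed defaults (the minimum version and package list mirror the requirements above; the function names are illustrative):

```python
# Quick pre-installation check (run inside your environment):
# verifies the Python version and reports which packages are importable.
import importlib.util
import sys

def check_python(min_version=(3, 8)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= min_version

def check_packages(names=("numpy", "matplotlib")):
    """Map each package name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    print("Python OK:", check_python())
    for name, found in check_packages().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```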

March 10, 2026 · 2 min · 229 words · codefrydev