Chapter 18: Custom Gym Environments (Part 1)

Learning objectives

- Create a custom Gymnasium (or Gym) environment: inherit from gym.Env and implement reset, step, and an optional render.
- Define observation_space and action_space (e.g. Discrete(4) for up/down/left/right).
- Implement a text-based render (e.g. print a grid showing the agent and the goal).

Concept and real-world RL

Real RL often requires custom environments: simulators for robotics, games, or domain-specific tasks. The Gym API (reset, step, observation_space, action_space) is the standard. Implementing a small maze teaches you how to encode state (e.g. the agent's position), handle boundaries and obstacles, and return (obs, reward, terminated, truncated, info). In practice, you will wrap or write environments for your own problem and reuse the same agents (e.g. Q-learning, DQN) trained on standard environments. ...
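The reset/step contract above can be sketched in a few lines. This is a minimal illustration with plain Python (the class name GridMazeEnv and the reward values are hypothetical, not from the chapter); a real implementation would subclass gymnasium.Env and set action_space = gymnasium.spaces.Discrete(4):

```python
class GridMazeEnv:
    """Toy Gym-style env: agent starts at (0, 0), goal at (size-1, size-1)."""

    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up/down/left/right

    def __init__(self, size=5):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.agent = (0, 0)

    def reset(self, seed=None):
        self.agent = (0, 0)
        return self.agent, {}  # (observation, info), as in the Gymnasium API

    def step(self, action):
        dr, dc = self.MOVES[action]
        r = min(max(self.agent[0] + dr, 0), self.size - 1)  # clip at the walls
        c = min(max(self.agent[1] + dc, 0), self.size - 1)
        self.agent = (r, c)
        terminated = self.agent == self.goal
        reward = 1.0 if terminated else -0.01  # small per-step penalty (illustrative)
        return self.agent, reward, terminated, False, {}  # truncated always False here

    def render(self):
        # Text-based render: A = agent, G = goal, . = empty cell.
        for r in range(self.size):
            print("".join(
                "A" if (r, c) == self.agent else
                "G" if (r, c) == self.goal else "."
                for c in range(self.size)
            ))

env = GridMazeEnv()
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(3)  # move right
print(obs, reward, terminated)  # (0, 1) -0.01 False
```

The same 5-tuple (obs, reward, terminated, truncated, info) is what standard agents expect, which is why a custom env like this plugs into Q-learning or DQN code unchanged.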

March 10, 2026 · 3 min · 556 words · codefrydev

Chapter 63: Curiosity-Driven Exploration (ICM)

Learning objectives

- Implement the Intrinsic Curiosity Module: a forward model that predicts next-state features from the current state and action.
- Use the prediction error (between predicted and actual next-state features) as an intrinsic reward and combine it with A2C.
- Explain why prediction error encourages exploration in novel or stochastic parts of the state space.
- Compare exploration behavior (e.g. coverage, time to goal) with and without ICM on a sparse-reward maze.
- Relate curiosity-driven exploration to robot navigation and game AI, where rewards are sparse.

Concept and real-world RL ...
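The core loop of ICM (predict the next state, pay out the prediction error as a bonus, then update the model) can be shown in tabular form. This is a stripped-down sketch, not the neural module from the chapter: TabularCuriosity is a hypothetical helper, and the 0/1 error stands in for the squared error between predicted and actual next-state features:

```python
class TabularCuriosity:
    """Tabular stand-in for ICM: prediction error of a forward model as bonus."""

    def __init__(self, bonus_scale=1.0):
        self.forward_model = {}  # (state, action) -> predicted next state
        self.bonus_scale = bonus_scale

    def intrinsic_reward(self, state, action, next_state):
        predicted = self.forward_model.get((state, action))
        # 0/1 prediction error; a neural ICM would use a feature-space MSE.
        error = 0.0 if predicted == next_state else 1.0
        # Train the forward model on the observed transition.
        self.forward_model[(state, action)] = next_state
        return self.bonus_scale * error

curiosity = TabularCuriosity()
# First visit to a transition yields a bonus; a repeat visit yields none,
# so the bonus steers the agent toward transitions it cannot yet predict.
print(curiosity.intrinsic_reward((0, 0), 3, (0, 1)))  # 1.0 (novel)
print(curiosity.intrinsic_reward((0, 0), 3, (0, 1)))  # 0.0 (already learned)
```

When combined with A2C, the agent is trained on r_total = r_extrinsic + r_intrinsic, so in a sparse-reward maze the bonus provides a learning signal long before the goal reward is ever seen.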

March 10, 2026 · 3 min · 624 words · codefrydev

Chapter 66: Go-Explore Algorithm

Learning objectives

- Implement a simplified Go-Explore: an archive of promising states plus a strategy to return to them and explore further.
- Explain the two-phase idea: (1) archive states that lead to high rewards or novelty; (2) select a state from the archive, return to it, then take exploratory actions.
- Compare Go-Explore with random exploration (e.g. episodes to reach the goal, or maximum reward reached) on a deterministic maze.
- Identify why “return” (resetting to an archived state) helps in hard-exploration settings compared to always starting from the initial state.
- Relate Go-Explore to game AI (e.g. Montezuma’s Revenge) and robot navigation with sparse goals.

Concept and real-world RL ...
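The archive/return/explore loop can be sketched on a deterministic toy problem. Here a 1-D chain of states stands in for the maze (the function go_explore and all parameters are illustrative, not the chapter's code), and "returning" exploits determinism: we simply restore the archived state instead of replaying actions from the start:

```python
import random

def go_explore(n_states=20, iterations=200, steps_per_visit=5, seed=0):
    """Simplified Go-Explore on a chain 0..n_states; the goal is n_states."""
    rng = random.Random(seed)
    archive = {0}  # phase 1: archive of states reached so far
    for _ in range(iterations):
        # Select a cell from the archive (uniformly here; the full algorithm
        # prefers rarely-visited or high-scoring cells).
        state = rng.choice(sorted(archive))
        # Phase 2: "return" by restoring the state, then explore from it.
        for _ in range(steps_per_visit):
            state = min(max(state + rng.choice((-1, 1)), 0), n_states)
            archive.add(state)  # every newly reached state becomes a cell
            if state == n_states:
                return archive  # goal reached
    return archive

archive = go_explore()
print(len(archive), max(archive))
```

Because each exploration burst starts from the frontier of what has already been reached rather than from state 0, progress accumulates across iterations; a pure random walk from the initial state would have to re-cross the whole chain every episode.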

March 10, 2026 · 4 min · 754 words · codefrydev