Chapter 1: The Reinforcement Learning Framework
March 10, 2026 · 4 min · 748 words · codefrydev
Course Outline
March 10, 2026 · 5 min · 1002 words · codefrydev
Chapter 2: Multi-Armed Bandits
March 10, 2026 · 4 min · 679 words · codefrydev
Bandits: Optimistic Initial Values
March 10, 2026 · 2 min · 305 words · codefrydev
Chapter 3: Markov Decision Processes (MDPs)
March 10, 2026 · 3 min · 574 words · codefrydev
Bandits: UCB1
March 10, 2026 · 2 min · 319 words · codefrydev
Chapter 4: The Reward Hypothesis
March 10, 2026 · 4 min · 709 words · codefrydev
Gridworld
March 10, 2026 · 2 min · 356 words · codefrydev
Bandits: Thompson Sampling
March 10, 2026 · 2 min · 401 words · codefrydev
Chapter 5: Value Functions
March 10, 2026 · 3 min · 620 words · codefrydev
Choosing Rewards
March 10, 2026 · 2 min · 354 words · codefrydev
Bandits: Nonstationary
March 10, 2026 · 2 min · 363 words · codefrydev
Chapter 6: The Bellman Equations
March 10, 2026 · 3 min · 589 words · codefrydev
Bandits: Why don’t we just use a library?
March 10, 2026 · 2 min · 289 words · codefrydev
Chapter 7: Dynamic Programming — Policy Evaluation
March 10, 2026 · 4 min · 703 words · codefrydev
Chapter 8: Dynamic Programming — Policy Iteration
March 10, 2026 · 4 min · 652 words · codefrydev
Windy Gridworld
March 10, 2026 · 2 min · 392 words · codefrydev
Chapter 9: Dynamic Programming — Value Iteration
March 10, 2026 · 3 min · 624 words · codefrydev
Dynamic Programming: Gridworld in Code
March 10, 2026 · 2 min · 390 words · codefrydev
Chapter 10: Limitations of Dynamic Programming
March 10, 2026 · 4 min · 683 words · codefrydev
Python
March 10, 2026 · 9 min · 1810 words · codefrydev
Chapter 11: Monte Carlo Methods
March 10, 2026 · 4 min · 777 words · codefrydev
Chapter 12: Temporal Difference (TD) Learning
March 10, 2026 · 3 min · 589 words · codefrydev
Monte Carlo in Code
March 10, 2026 · 3 min · 464 words · codefrydev
Chapter 13: SARSA (On-Policy TD Control)
March 10, 2026 · 3 min · 541 words · codefrydev
TD, SARSA, and Q-Learning in Code
March 10, 2026 · 2 min · 351 words · codefrydev
Chapter 14: Q-Learning (Off-Policy TD Control)
March 10, 2026 · 3 min · 589 words · codefrydev
Chapter 15: Expected SARSA
March 10, 2026 · 3 min · 618 words · codefrydev
Chapter 16: N-Step Bootstrapping
March 10, 2026 · 3 min · 557 words · codefrydev
Chapter 17: Planning and Learning with Tabular Methods
March 10, 2026 · 3 min · 583 words · codefrydev
Chapter 18: Custom Gym Environments (Part 1)
March 10, 2026 · 3 min · 556 words · codefrydev
Chapter 19: Hyperparameter Tuning in Tabular RL
March 10, 2026 · 3 min · 608 words · codefrydev
Chapter 20: The Limits of Tabular Methods
March 10, 2026 · 4 min · 645 words · codefrydev
NumPy
March 10, 2026 · 6 min · 1184 words · codefrydev
Chapter 21: Linear Function Approximation
March 10, 2026 · 3 min · 606 words · codefrydev
Feature Engineering for Reinforcement Learning
March 10, 2026 · 2 min · 400 words · codefrydev
CartPole
March 10, 2026 · 3 min · 451 words · codefrydev
Chapter 22: Artificial Neural Networks for RL
March 10, 2026 · 3 min · 555 words · codefrydev
Chapter 23: Deep Q-Networks (DQN)
March 10, 2026 · 3 min · 545 words · codefrydev
Chapter 24: Experience Replay
March 10, 2026 · 3 min · 596 words · codefrydev
Chapter 25: Target Networks
March 10, 2026 · 3 min · 596 words · codefrydev
Chapter 26: Double DQN (DDQN)
March 10, 2026 · 3 min · 523 words · codefrydev
Chapter 27: Dueling DQN
March 10, 2026 · 3 min · 577 words · codefrydev
Chapter 28: Prioritized Experience Replay (PER)
March 10, 2026 · 3 min · 633 words · codefrydev
Chapter 29: Noisy Networks for Exploration
March 10, 2026 · 4 min · 642 words · codefrydev
Chapter 30: Rainbow DQN
March 10, 2026 · 3 min · 586 words · codefrydev
Pandas
March 10, 2026 · 4 min · 764 words · codefrydev
Chapter 31: Introduction to Policy-Based Methods
March 10, 2026 · 3 min · 547 words · codefrydev
Chapter 32: The Policy Objective Function
March 10, 2026 · 3 min · 585 words · codefrydev
Chapter 33: The REINFORCE Algorithm
March 10, 2026 · 3 min · 602 words · codefrydev
Chapter 34: Reducing Variance in Policy Gradients
March 10, 2026 · 3 min · 593 words · codefrydev
Chapter 35: Actor-Critic Architectures
March 10, 2026 · 3 min · 577 words · codefrydev
Visualization & Plotting for RL
March 10, 2026 · 5 min · 889 words · codefrydev
Chapter 36: Advantage Actor-Critic (A2C)
March 10, 2026 · 3 min · 566 words · codefrydev
Chapter 37: Asynchronous Advantage Actor-Critic (A3C)
March 10, 2026 · 3 min · 556 words · codefrydev
Chapter 38: Continuous Action Spaces
March 10, 2026 · 3 min · 533 words · codefrydev
Chapter 39: Deep Deterministic Policy Gradient (DDPG)
March 10, 2026 · 3 min · 524 words · codefrydev
Chapter 40: Twin Delayed DDPG (TD3)
March 10, 2026 · 3 min · 555 words · codefrydev
Matplotlib
March 10, 2026 · 4 min · 803 words · codefrydev
Chapter 41: The Problem with Standard Policy Gradients
March 10, 2026 · 3 min · 563 words · codefrydev
Chapter 42: Trust Region Policy Optimization (TRPO)
March 10, 2026 · 3 min · 551 words · codefrydev
Chapter 43: Proximal Policy Optimization (PPO): Intuition
March 10, 2026 · 3 min · 540 words · codefrydev
Chapter 44: PPO: Implementation Details
March 10, 2026 · 3 min · 482 words · codefrydev
Chapter 45: Coding PPO from Scratch
March 10, 2026 · 3 min · 532 words · codefrydev
Chapter 46: Maximum Entropy RL
March 10, 2026 · 3 min · 500 words · codefrydev
Chapter 47: Soft Actor-Critic (SAC)
March 10, 2026 · 3 min · 519 words · codefrydev
Chapter 48: SAC vs. PPO
March 10, 2026 · 3 min · 481 words · codefrydev
Chapter 49: Custom Gym Environments (Part 2)
March 10, 2026 · 3 min · 525 words · codefrydev
Chapter 50: Advanced Hyperparameter Tuning
March 10, 2026 · 3 min · 473 words · codefrydev
PyTorch
March 10, 2026 · 5 min · 1052 words · codefrydev
Chapter 51: Model-Free vs. Model-Based RL
March 10, 2026 · 3 min · 446 words · codefrydev
Chapter 52: Learning World Models
March 10, 2026 · 3 min · 442 words · codefrydev
Chapter 53: Planning with Known Models
March 10, 2026 · 3 min · 443 words · codefrydev
Chapter 54: Monte Carlo Tree Search (MCTS)
March 10, 2026 · 3 min · 444 words · codefrydev
Chapter 55: AlphaZero Architecture
March 10, 2026 · 3 min · 460 words · codefrydev
Chapter 56: MuZero Intuition
March 10, 2026 · 3 min · 468 words · codefrydev
Chapter 57: Dreamer and Latent Imagination
March 10, 2026 · 3 min · 464 words · codefrydev
Chapter 58: Model-Based Policy Optimization (MBPO)
March 10, 2026 · 3 min · 475 words · codefrydev
Chapter 59: Probabilistic Ensembles with Trajectory Sampling (PETS)
March 10, 2026 · 3 min · 494 words · codefrydev
Chapter 60: Visualizing Model-Based Rollouts
March 10, 2026 · 3 min · 466 words · codefrydev
TensorFlow
March 10, 2026 · 4 min · 782 words · codefrydev
Chapter 61: The Hard Exploration Problem
March 10, 2026 · 3 min · 489 words · codefrydev
Chapter 62: Intrinsic Motivation
March 10, 2026 · 3 min · 487 words · codefrydev
Chapter 63: Curiosity-Driven Exploration (ICM)
March 10, 2026 · 3 min · 624 words · codefrydev
Chapter 64: Random Network Distillation (RND)
March 10, 2026 · 3 min · 628 words · codefrydev
Chapter 65: Count-Based Exploration
March 10, 2026 · 4 min · 643 words · codefrydev
Chapter 66: Go-Explore Algorithm
March 10, 2026 · 4 min · 754 words · codefrydev
Chapter 67: Meta-Learning (Learning to Learn)
March 10, 2026 · 4 min · 714 words · codefrydev
Chapter 68: Model-Agnostic Meta-Learning (MAML) in RL
March 10, 2026 · 3 min · 636 words · codefrydev
Chapter 69: RL² (Reinforcement Learning as an RNN)
March 10, 2026 · 4 min · 707 words · codefrydev
Chapter 70: Unsupervised Environment Design
March 10, 2026 · 4 min · 734 words · codefrydev
OpenAI Gym / Gymnasium
March 10, 2026 · 5 min · 929 words · codefrydev
Chapter 71: The Offline RL Problem
March 10, 2026 · 4 min · 723 words · codefrydev
Chapter 72: Conservative Q-Learning (CQL)
March 10, 2026 · 4 min · 684 words · codefrydev
Chapter 73: Decision Transformers
March 10, 2026 · 4 min · 716 words · codefrydev
Chapter 74: Introduction to Imitation Learning
March 10, 2026 · 3 min · 626 words · codefrydev
Chapter 75: Limitations of Behavioral Cloning
March 10, 2026 · 4 min · 807 words · codefrydev
Chapter 76: Inverse Reinforcement Learning (IRL)
March 10, 2026 · 4 min · 762 words · codefrydev
Chapter 77: Generative Adversarial Imitation Learning (GAIL)
March 10, 2026 · 4 min · 704 words · codefrydev
Chapter 78: Adversarial Motion Priors (AMP)
March 10, 2026 · 4 min · 717 words · codefrydev
Chapter 79: Offline-to-Online Finetuning
March 10, 2026 · 4 min · 756 words · codefrydev
Chapter 80: RL from Human Feedback (RLHF) Basics
March 10, 2026 · 4 min · 708 words · codefrydev
Other Libraries
March 10, 2026 · 4 min · 654 words · codefrydev
Chapter 81: Multi-Agent Fundamentals
March 10, 2026 · 4 min · 673 words · codefrydev
Chapter 82: Game Theory Basics for RL
March 10, 2026 · 4 min · 672 words · codefrydev
Chapter 83: Independent Q-Learning (IQL)
March 10, 2026 · 4 min · 715 words · codefrydev
Chapter 84: Centralized Training, Decentralized Execution (CTDE)
March 10, 2026 · 4 min · 754 words · codefrydev
Chapter 85: Multi-Agent DDPG (MADDPG)
March 10, 2026 · 4 min · 652 words · codefrydev
Chapter 86: Value Decomposition Networks (VDN)
March 10, 2026 · 4 min · 684 words · codefrydev
Chapter 87: QMIX Algorithm
March 10, 2026 · 4 min · 664 words · codefrydev
Chapter 88: Multi-Agent PPO (MAPPO)
March 10, 2026 · 4 min · 671 words · codefrydev
Chapter 89: Self-Play and League Training
March 10, 2026 · 4 min · 741 words · codefrydev
Chapter 90: Communication in MARL
March 10, 2026 · 4 min · 729 words · codefrydev
Chapter 91: RL in Robotics
March 10, 2026 · 4 min · 677 words · codefrydev
Chapter 92: Safe Reinforcement Learning
March 10, 2026 · 4 min · 707 words · codefrydev
Chapter 93: RL for Algorithmic Trading
March 10, 2026 · 4 min · 652 words · codefrydev
Chapter 94: RL in Recommender Systems
March 10, 2026 · 4 min · 698 words · codefrydev
Chapter 95: Training Large Language Models with PPO
March 10, 2026 · 4 min · 730 words · codefrydev
Chapter 96: Implementing RLHF in NLP
March 10, 2026 · 4 min · 705 words · codefrydev
Chapter 97: Direct Preference Optimization (DPO)
March 10, 2026 · 4 min · 670 words · codefrydev
Chapter 98: Evaluating RL Agents
March 10, 2026 · 4 min · 695 words · codefrydev
Chapter 99: Debugging RL Code
March 10, 2026 · 4 min · 728 words · codefrydev
Chapter 100: The Future of Reinforcement Learning
March 10, 2026 · 4 min · 711 words · codefrydev
Anaconda Environment Setup
March 10, 2026 · 2 min · 237 words · codefrydev
Calculus
March 10, 2026 · 8 min · 1554 words · codefrydev
Calculus
March 10, 2026 · 4 min · 793 words · codefrydev
Effective Learning Strategies for Machine Learning
March 10, 2026 · 2 min · 292 words · codefrydev
Final Self-Assessment
March 10, 2026 · 3 min · 448 words · codefrydev
Function Approximation and Deep RL
March 10, 2026 · 7 min · 1400 words · codefrydev
How to Code by Yourself (Part 1)
March 10, 2026 · 2 min · 312 words · codefrydev
How to Code by Yourself (Part 2)
March 10, 2026 · 2 min · 346 words · codefrydev
How to Install NumPy, SciPy, Matplotlib, Pandas, IPython, Theano, and TensorFlow
March 10, 2026 · 2 min · 279 words · codefrydev
How to Succeed in this Course
March 10, 2026 · 1 min · 208 words · codefrydev
How to Succeed in this Course (Long Version)
March 10, 2026 · 2 min · 406 words · codefrydev
Is this for Beginners or Experts? Academic or Practical? Fast or slow-paced?
March 10, 2026 · 2 min · 378 words · codefrydev
Linear Algebra
March 10, 2026 · 9 min · 1736 words · codefrydev
Linear Algebra
March 10, 2026 · 5 min · 922 words · codefrydev
Machine Learning and AI Prerequisite Roadmap (Parts 1–2)
March 10, 2026 · 2 min · 320 words · codefrydev
NumPy
March 10, 2026 · 4 min · 793 words · codefrydev
Phase 1 Self-Check: Math for RL
March 10, 2026 · 5 min · 858 words · codefrydev
Phase 2 Readiness Quiz
March 10, 2026 · 4 min · 656 words · codefrydev
Phase 3 Foundations Quiz
March 10, 2026 · 5 min · 876 words · codefrydev
Phase 4 Deep RL Quiz
March 10, 2026 · 4 min · 814 words · codefrydev
Probability & Statistics
March 10, 2026 · 8 min · 1699 words · codefrydev
Probability & Statistics
March 10, 2026 · 5 min · 1062 words · codefrydev
Python Basics
March 10, 2026 · 5 min · 853 words · codefrydev
PyTorch Basics
March 10, 2026 · 5 min · 926 words · codefrydev
Real-World Scenarios in This Curriculum
March 10, 2026 · 3 min · 563 words · codefrydev
RL Framework
March 10, 2026 · 6 min · 1198 words · codefrydev
Setting Up Your Environment
March 10, 2026 · 2 min · 229 words · codefrydev
Stock Trading Project with Reinforcement Learning
March 10, 2026 · 4 min · 717 words · codefrydev
Tabular Methods
March 10, 2026 · 6 min · 1277 words · codefrydev
This Course vs. RL Book: What’s the Difference?
March 10, 2026 · 2 min · 405 words · codefrydev
Value Functions and Bellman Equation
March 10, 2026 · 5 min · 906 words · codefrydev
Where to Get the Code
March 10, 2026 · 2 min · 240 words · codefrydev
Worked Solutions Index
March 10, 2026 · 2 min · 285 words · codefrydev