2026 (156 posts)

March (156 posts)

Chapter 1: The Reinforcement Learning Framework

March 10, 2026 · 4 min · 748 words · codefrydev

Course Outline

March 10, 2026 · 5 min · 1002 words · codefrydev

Chapter 2: Multi-Armed Bandits

March 10, 2026 · 4 min · 679 words · codefrydev

Bandits: Optimistic Initial Values

March 10, 2026 · 2 min · 305 words · codefrydev

Chapter 3: Markov Decision Processes (MDPs)

March 10, 2026 · 3 min · 574 words · codefrydev

Bandits: UCB1

March 10, 2026 · 2 min · 319 words · codefrydev

Chapter 4: The Reward Hypothesis

March 10, 2026 · 4 min · 709 words · codefrydev

Gridworld

March 10, 2026 · 2 min · 356 words · codefrydev

Bandits: Thompson Sampling

March 10, 2026 · 2 min · 401 words · codefrydev

Chapter 5: Value Functions

March 10, 2026 · 3 min · 620 words · codefrydev

Choosing Rewards

March 10, 2026 · 2 min · 354 words · codefrydev

Bandits: Nonstationary

March 10, 2026 · 2 min · 363 words · codefrydev

Chapter 6: The Bellman Equations

March 10, 2026 · 3 min · 589 words · codefrydev

Bandits: Why don’t we just use a library?

March 10, 2026 · 2 min · 289 words · codefrydev

Chapter 7: Dynamic Programming — Policy Evaluation

March 10, 2026 · 4 min · 703 words · codefrydev

Chapter 8: Dynamic Programming — Policy Iteration

March 10, 2026 · 4 min · 652 words · codefrydev

Windy Gridworld

March 10, 2026 · 2 min · 392 words · codefrydev

Chapter 9: Dynamic Programming — Value Iteration

March 10, 2026 · 3 min · 624 words · codefrydev

Dynamic Programming: Gridworld in Code

March 10, 2026 · 2 min · 390 words · codefrydev

Chapter 10: Limitations of Dynamic Programming

March 10, 2026 · 4 min · 683 words · codefrydev

Python

March 10, 2026 · 9 min · 1810 words · codefrydev

Chapter 11: Monte Carlo Methods

March 10, 2026 · 4 min · 777 words · codefrydev

Chapter 12: Temporal Difference (TD) Learning

March 10, 2026 · 3 min · 589 words · codefrydev

Monte Carlo in Code

March 10, 2026 · 3 min · 464 words · codefrydev

Chapter 13: SARSA (On-Policy TD Control)

March 10, 2026 · 3 min · 541 words · codefrydev

TD, SARSA, and Q-Learning in Code

March 10, 2026 · 2 min · 351 words · codefrydev

Chapter 14: Q-Learning (Off-Policy TD Control)

March 10, 2026 · 3 min · 589 words · codefrydev

Chapter 15: Expected SARSA

March 10, 2026 · 3 min · 618 words · codefrydev

Chapter 16: N-Step Bootstrapping

March 10, 2026 · 3 min · 557 words · codefrydev

Chapter 17: Planning and Learning with Tabular Methods

March 10, 2026 · 3 min · 583 words · codefrydev

Chapter 18: Custom Gym Environments (Part 1)

March 10, 2026 · 3 min · 556 words · codefrydev

Chapter 19: Hyperparameter Tuning in Tabular RL

March 10, 2026 · 3 min · 608 words · codefrydev

Chapter 20: The Limits of Tabular Methods

March 10, 2026 · 4 min · 645 words · codefrydev

NumPy

March 10, 2026 · 6 min · 1184 words · codefrydev

Chapter 21: Linear Function Approximation

March 10, 2026 · 3 min · 606 words · codefrydev

Feature Engineering for Reinforcement Learning

March 10, 2026 · 2 min · 400 words · codefrydev

CartPole

March 10, 2026 · 3 min · 451 words · codefrydev

Chapter 22: Artificial Neural Networks for RL

March 10, 2026 · 3 min · 555 words · codefrydev

Chapter 23: Deep Q-Networks (DQN)

March 10, 2026 · 3 min · 545 words · codefrydev

Chapter 24: Experience Replay

March 10, 2026 · 3 min · 596 words · codefrydev

Chapter 25: Target Networks

March 10, 2026 · 3 min · 596 words · codefrydev

Chapter 26: Double DQN (DDQN)

March 10, 2026 · 3 min · 523 words · codefrydev

Chapter 27: Dueling DQN

March 10, 2026 · 3 min · 577 words · codefrydev

Chapter 28: Prioritized Experience Replay (PER)

March 10, 2026 · 3 min · 633 words · codefrydev

Chapter 29: Noisy Networks for Exploration

March 10, 2026 · 4 min · 642 words · codefrydev

Chapter 30: Rainbow DQN

March 10, 2026 · 3 min · 586 words · codefrydev

Pandas

March 10, 2026 · 4 min · 764 words · codefrydev

Chapter 31: Introduction to Policy-Based Methods

March 10, 2026 · 3 min · 547 words · codefrydev

Chapter 32: The Policy Objective Function

March 10, 2026 · 3 min · 585 words · codefrydev

Chapter 33: The REINFORCE Algorithm

March 10, 2026 · 3 min · 602 words · codefrydev

Chapter 34: Reducing Variance in Policy Gradients

March 10, 2026 · 3 min · 593 words · codefrydev

Chapter 35: Actor-Critic Architectures

March 10, 2026 · 3 min · 577 words · codefrydev

Visualization & Plotting for RL

March 10, 2026 · 5 min · 889 words · codefrydev

Chapter 36: Advantage Actor-Critic (A2C)

March 10, 2026 · 3 min · 566 words · codefrydev

Chapter 37: Asynchronous Advantage Actor-Critic (A3C)

March 10, 2026 · 3 min · 556 words · codefrydev

Chapter 38: Continuous Action Spaces

March 10, 2026 · 3 min · 533 words · codefrydev

Chapter 39: Deep Deterministic Policy Gradient (DDPG)

March 10, 2026 · 3 min · 524 words · codefrydev

Chapter 40: Twin Delayed DDPG (TD3)

March 10, 2026 · 3 min · 555 words · codefrydev

Matplotlib

March 10, 2026 · 4 min · 803 words · codefrydev

Chapter 41: The Problem with Standard Policy Gradients

March 10, 2026 · 3 min · 563 words · codefrydev

Chapter 42: Trust Region Policy Optimization (TRPO)

March 10, 2026 · 3 min · 551 words · codefrydev

Chapter 43: Proximal Policy Optimization (PPO): Intuition

March 10, 2026 · 3 min · 540 words · codefrydev

Chapter 44: PPO: Implementation Details

March 10, 2026 · 3 min · 482 words · codefrydev

Chapter 45: Coding PPO from Scratch

March 10, 2026 · 3 min · 532 words · codefrydev

Chapter 46: Maximum Entropy RL

March 10, 2026 · 3 min · 500 words · codefrydev

Chapter 47: Soft Actor-Critic (SAC)

March 10, 2026 · 3 min · 519 words · codefrydev

Chapter 48: SAC vs. PPO

March 10, 2026 · 3 min · 481 words · codefrydev

Chapter 49: Custom Gym Environments (Part 2)

March 10, 2026 · 3 min · 525 words · codefrydev

Chapter 50: Advanced Hyperparameter Tuning

March 10, 2026 · 3 min · 473 words · codefrydev

PyTorch

March 10, 2026 · 5 min · 1052 words · codefrydev

Chapter 51: Model-Free vs. Model-Based RL

March 10, 2026 · 3 min · 446 words · codefrydev

Chapter 52: Learning World Models

March 10, 2026 · 3 min · 442 words · codefrydev

Chapter 53: Planning with Known Models

March 10, 2026 · 3 min · 443 words · codefrydev

Chapter 54: Monte Carlo Tree Search (MCTS)

March 10, 2026 · 3 min · 444 words · codefrydev

Chapter 55: AlphaZero Architecture

March 10, 2026 · 3 min · 460 words · codefrydev

Chapter 56: MuZero Intuition

March 10, 2026 · 3 min · 468 words · codefrydev

Chapter 57: Dreamer and Latent Imagination

March 10, 2026 · 3 min · 464 words · codefrydev

Chapter 58: Model-Based Policy Optimization (MBPO)

March 10, 2026 · 3 min · 475 words · codefrydev

Chapter 59: Probabilistic Ensembles with Trajectory Sampling (PETS)

March 10, 2026 · 3 min · 494 words · codefrydev

Chapter 60: Visualizing Model-Based Rollouts

March 10, 2026 · 3 min · 466 words · codefrydev

TensorFlow

March 10, 2026 · 4 min · 782 words · codefrydev

Chapter 61: The Hard Exploration Problem

March 10, 2026 · 3 min · 489 words · codefrydev

Chapter 62: Intrinsic Motivation

March 10, 2026 · 3 min · 487 words · codefrydev

Chapter 63: Curiosity-Driven Exploration (ICM)

March 10, 2026 · 3 min · 624 words · codefrydev

Chapter 64: Random Network Distillation (RND)

March 10, 2026 · 3 min · 628 words · codefrydev

Chapter 65: Count-Based Exploration

March 10, 2026 · 4 min · 643 words · codefrydev

Chapter 66: Go-Explore Algorithm

March 10, 2026 · 4 min · 754 words · codefrydev

Chapter 67: Meta-Learning (Learning to Learn)

March 10, 2026 · 4 min · 714 words · codefrydev

Chapter 68: Model-Agnostic Meta-Learning (MAML) in RL

March 10, 2026 · 3 min · 636 words · codefrydev

Chapter 69: RL² (Reinforcement Learning as an RNN)

March 10, 2026 · 4 min · 707 words · codefrydev

Chapter 70: Unsupervised Environment Design

March 10, 2026 · 4 min · 734 words · codefrydev

OpenAI Gym / Gymnasium

March 10, 2026 · 5 min · 929 words · codefrydev

Chapter 71: The Offline RL Problem

March 10, 2026 · 4 min · 723 words · codefrydev

Chapter 72: Conservative Q-Learning (CQL)

March 10, 2026 · 4 min · 684 words · codefrydev

Chapter 73: Decision Transformers

March 10, 2026 · 4 min · 716 words · codefrydev

Chapter 74: Introduction to Imitation Learning

March 10, 2026 · 3 min · 626 words · codefrydev

Chapter 75: Limitations of Behavioral Cloning

March 10, 2026 · 4 min · 807 words · codefrydev

Chapter 76: Inverse Reinforcement Learning (IRL)

March 10, 2026 · 4 min · 762 words · codefrydev

Chapter 77: Generative Adversarial Imitation Learning (GAIL)

March 10, 2026 · 4 min · 704 words · codefrydev

Chapter 78: Adversarial Motion Priors (AMP)

March 10, 2026 · 4 min · 717 words · codefrydev

Chapter 79: Offline-to-Online Finetuning

March 10, 2026 · 4 min · 756 words · codefrydev

Chapter 80: RL from Human Feedback (RLHF) Basics

March 10, 2026 · 4 min · 708 words · codefrydev

Other Libraries

March 10, 2026 · 4 min · 654 words · codefrydev

Chapter 81: Multi-Agent Fundamentals

March 10, 2026 · 4 min · 673 words · codefrydev

Chapter 82: Game Theory Basics for RL

March 10, 2026 · 4 min · 672 words · codefrydev

Chapter 83: Independent Q-Learning (IQL)

March 10, 2026 · 4 min · 715 words · codefrydev

Chapter 84: Centralized Training, Decentralized Execution (CTDE)

March 10, 2026 · 4 min · 754 words · codefrydev

Chapter 85: Multi-Agent DDPG (MADDPG)

March 10, 2026 · 4 min · 652 words · codefrydev

Chapter 86: Value Decomposition Networks (VDN)

March 10, 2026 · 4 min · 684 words · codefrydev

Chapter 87: QMIX Algorithm

March 10, 2026 · 4 min · 664 words · codefrydev

Chapter 88: Multi-Agent PPO (MAPPO)

March 10, 2026 · 4 min · 671 words · codefrydev

Chapter 89: Self-Play and League Training

March 10, 2026 · 4 min · 741 words · codefrydev

Chapter 90: Communication in MARL

March 10, 2026 · 4 min · 729 words · codefrydev

Chapter 91: RL in Robotics

March 10, 2026 · 4 min · 677 words · codefrydev

Chapter 92: Safe Reinforcement Learning

March 10, 2026 · 4 min · 707 words · codefrydev

Chapter 93: RL for Algorithmic Trading

March 10, 2026 · 4 min · 652 words · codefrydev

Chapter 94: RL in Recommender Systems

March 10, 2026 · 4 min · 698 words · codefrydev

Chapter 95: Training Large Language Models with PPO

March 10, 2026 · 4 min · 730 words · codefrydev

Chapter 96: Implementing RLHF in NLP

March 10, 2026 · 4 min · 705 words · codefrydev

Chapter 97: Direct Preference Optimization (DPO)

March 10, 2026 · 4 min · 670 words · codefrydev

Chapter 98: Evaluating RL Agents

March 10, 2026 · 4 min · 695 words · codefrydev

Chapter 99: Debugging RL Code

March 10, 2026 · 4 min · 728 words · codefrydev

Chapter 100: The Future of Reinforcement Learning

March 10, 2026 · 4 min · 711 words · codefrydev

Anaconda Environment Setup

March 10, 2026 · 2 min · 237 words · codefrydev

Calculus

March 10, 2026 · 8 min · 1554 words · codefrydev

Calculus

March 10, 2026 · 4 min · 793 words · codefrydev

Effective Learning Strategies for Machine Learning

March 10, 2026 · 2 min · 292 words · codefrydev

Final Self-Assessment

March 10, 2026 · 3 min · 448 words · codefrydev

Function Approximation and Deep RL

March 10, 2026 · 7 min · 1400 words · codefrydev

How to Code by Yourself (part 1)

March 10, 2026 · 2 min · 312 words · codefrydev

How to Code by Yourself (part 2)

March 10, 2026 · 2 min · 346 words · codefrydev

How to Install Numpy, Scipy, Matplotlib, Pandas, IPython, Theano, and TensorFlow

March 10, 2026 · 2 min · 279 words · codefrydev

How to Succeed in this Course

March 10, 2026 · 1 min · 208 words · codefrydev

How to Succeed in this Course (Long Version)

March 10, 2026 · 2 min · 406 words · codefrydev

Is this for Beginners or Experts? Academic or Practical? Fast or slow-paced?

March 10, 2026 · 2 min · 378 words · codefrydev

Linear Algebra

March 10, 2026 · 9 min · 1736 words · codefrydev

Linear Algebra

March 10, 2026 · 5 min · 922 words · codefrydev

Machine Learning and AI Prerequisite Roadmap (pt 1–2)

March 10, 2026 · 2 min · 320 words · codefrydev

NumPy

March 10, 2026 · 4 min · 793 words · codefrydev

Phase 1 Self-Check: Math for RL

March 10, 2026 · 5 min · 858 words · codefrydev

Phase 2 Readiness Quiz

March 10, 2026 · 4 min · 656 words · codefrydev

Phase 3 Foundations Quiz

March 10, 2026 · 5 min · 876 words · codefrydev

Phase 4 Deep RL Quiz

March 10, 2026 · 4 min · 814 words · codefrydev

Probability & Statistics

March 10, 2026 · 8 min · 1699 words · codefrydev

Probability & Statistics

March 10, 2026 · 5 min · 1062 words · codefrydev

Python Basics

March 10, 2026 · 5 min · 853 words · codefrydev

PyTorch Basics

March 10, 2026 · 5 min · 926 words · codefrydev

Real-World Scenarios in This Curriculum

March 10, 2026 · 3 min · 563 words · codefrydev

RL Framework

March 10, 2026 · 6 min · 1198 words · codefrydev

Setting Up Your Environment

March 10, 2026 · 2 min · 229 words · codefrydev

Stock Trading Project with Reinforcement Learning

March 10, 2026 · 4 min · 717 words · codefrydev

Tabular Methods

March 10, 2026 · 6 min · 1277 words · codefrydev

This Course vs. RL Book: What’s the Difference?

March 10, 2026 · 2 min · 405 words · codefrydev

Value Functions and Bellman Equation

March 10, 2026 · 5 min · 906 words · codefrydev

Where to Get the Code

March 10, 2026 · 2 min · 240 words · codefrydev

Worked Solutions Index

March 10, 2026 · 2 min · 285 words · codefrydev