# Reinforcement Learning Curriculum

> Reinforcement learning curriculum by codefrydev — 100 chapters, prerequisites, and readiness assessments. Hosted at codefrydev.in.

- [Course Outline](https://codefrydev.in/Reinforcement/course-outline/)
- [Reinforcement learning glossary — terms, definitions, and chapter links](https://codefrydev.in/Reinforcement/glossary/)
- [How to Succeed in this Course](https://codefrydev.in/Reinforcement/how-to-succeed/)
- [Stock Trading Project with Reinforcement Learning](https://codefrydev.in/Reinforcement/stock-trading/)
- [This Course vs. RL Book: What's the Difference?](https://codefrydev.in/Reinforcement/course-vs-book/)
- [Where to Get the Code](https://codefrydev.in/Reinforcement/where-to-get-the-code/)
- [Worked Solutions Index](https://codefrydev.in/Reinforcement/solutions-index/)

## Assessments

- [Phase 0 Assessment: Python Basics](https://codefrydev.in/Reinforcement/assessment/phase-0-programming/)
- [Phase 1 Self-Check: Math for RL](https://codefrydev.in/Reinforcement/assessment/phase-1-math/)
- [Phase 2 Readiness Quiz](https://codefrydev.in/Reinforcement/assessment/phase-2-readiness/)
- [Checkpoint: ML Foundations Mid-Point](https://codefrydev.in/Reinforcement/assessment/checkpoint-ml-mid/)
- [Phase 4 Assessment: Machine Learning Foundations](https://codefrydev.in/Reinforcement/assessment/phase-4-ml/)
- [Checkpoint: DL Foundations Mid-Point](https://codefrydev.in/Reinforcement/assessment/checkpoint-dl-mid/)
- [Phase 5 Assessment: Deep Learning Foundations](https://codefrydev.in/Reinforcement/assessment/phase-5-dl/)
- [Checkpoint: Volume 1, Midpoint (After Chapter 5)](https://codefrydev.in/Reinforcement/assessment/checkpoint-vol-01-mid/)
- [Checkpoint: Volume 2, Midpoint (After Chapter 15)](https://codefrydev.in/Reinforcement/assessment/checkpoint-vol-02-mid/)
- [Phase 6 Assessment: RL Foundations](https://codefrydev.in/Reinforcement/assessment/phase-3-foundations/)
- [Checkpoint: Volume 3, Midpoint (After Chapter 25)](https://codefrydev.in/Reinforcement/assessment/checkpoint-vol-03-mid/)
- [Checkpoint: Volume 4, Midpoint (After Chapter 35)](https://codefrydev.in/Reinforcement/assessment/checkpoint-vol-04-mid/)
- [Phase 7 Assessment: Deep RL](https://codefrydev.in/Reinforcement/assessment/phase-4-deep-rl/)
- [Checkpoint: Volume 5, Midpoint (After Chapter 45)](https://codefrydev.in/Reinforcement/assessment/checkpoint-vol-05-mid/)
- [Phase 8 Assessment: Advanced RL](https://codefrydev.in/Reinforcement/assessment/phase-5-advanced/)

## Appendix — practical guides for reinforcement learning

- [Is this for Beginners or Experts? Academic or Practical? Fast or slow-paced?](https://codefrydev.in/Reinforcement/appendix/beginners-or-experts/)
- [How to Succeed in this Course (Long Version)](https://codefrydev.in/Reinforcement/appendix/how-to-succeed-long/)
- [Effective Learning Strategies for Machine Learning](https://codefrydev.in/Reinforcement/appendix/effective-learning-strategies/)
- [Machine Learning and AI Prerequisite Roadmap (pt 1–2)](https://codefrydev.in/Reinforcement/appendix/prerequisite-roadmap/)
- [Anaconda Environment Setup](https://codefrydev.in/Reinforcement/appendix/anaconda-setup/)
- [Setting Up Your Environment](https://codefrydev.in/Reinforcement/appendix/setting-up-environment/)
- [How to Install NumPy, SciPy, Matplotlib, Pandas, IPython, Theano, and TensorFlow](https://codefrydev.in/Reinforcement/appendix/installing-libraries/)
- [How to Code by Yourself (part 1)](https://codefrydev.in/Reinforcement/appendix/how-to-code-by-yourself-1/)
- [How to Code by Yourself (part 2)](https://codefrydev.in/Reinforcement/appendix/how-to-code-by-yourself-2/)
- [How to Debug RL Code](https://codefrydev.in/Reinforcement/appendix/debugging-rl-code/)
- [How to Read RL Papers](https://codefrydev.in/Reinforcement/appendix/reading-rl-papers/)

## Deep Learning Foundations

- [Biological Inspiration: From Brain Neurons to Artificial Neurons](https://codefrydev.in/Reinforcement/dl-foundations/biological-inspiration/)
- [The Perceptron: Learning from Mistakes](https://codefrydev.in/Reinforcement/dl-foundations/perceptron/)
- [Activation Functions: Adding Non-Linearity](https://codefrydev.in/Reinforcement/dl-foundations/activation-functions/)
- [Multi-Layer Perceptrons: Stacking Layers to Break Linearity](https://codefrydev.in/Reinforcement/dl-foundations/mlp/)
- [Forward Propagation: Computing the Network Output](https://codefrydev.in/Reinforcement/dl-foundations/forward-propagation/)
- [Loss Functions: Measuring How Wrong the Network Is](https://codefrydev.in/Reinforcement/dl-foundations/loss-functions-dl/)
- [Backpropagation: Teaching Networks by Propagating Errors](https://codefrydev.in/Reinforcement/dl-foundations/backpropagation/)
- [Optimizers: SGD, Momentum, and Adam](https://codefrydev.in/Reinforcement/dl-foundations/optimizers/)
- [The Training Loop](https://codefrydev.in/Reinforcement/dl-foundations/training-loop/)
- [Regularization and Overfitting](https://codefrydev.in/Reinforcement/dl-foundations/regularization/)
- [CNN Basics: Convolutions and Pooling](https://codefrydev.in/Reinforcement/dl-foundations/cnn-basics/)
- [PyTorch: Building Neural Networks with nn.Module](https://codefrydev.in/Reinforcement/dl-foundations/pytorch-nn-practice/)
- [DL Mini-Project: Digits Classifier in NumPy](https://codefrydev.in/Reinforcement/dl-foundations/dl-mini-project/)
- [DL Foundations Drills](https://codefrydev.in/Reinforcement/dl-foundations/drills/)
- [DL Foundations Review & Bridge to RL](https://codefrydev.in/Reinforcement/dl-foundations/review-and-bridge/)

## Machine Learning Foundations

- [What is Machine Learning?](https://codefrydev.in/Reinforcement/ml-foundations/what-is-ml/)
- [Datasets and Features](https://codefrydev.in/Reinforcement/ml-foundations/datasets-and-features/)
- [Linear Regression](https://codefrydev.in/Reinforcement/ml-foundations/linear-regression/)
- [Gradient Descent](https://codefrydev.in/Reinforcement/ml-foundations/gradient-descent/)
- [Multiple Regression](https://codefrydev.in/Reinforcement/ml-foundations/multiple-regression/)
- [Classification Concepts](https://codefrydev.in/Reinforcement/ml-foundations/classification-concepts/)
- [Logistic Regression](https://codefrydev.in/Reinforcement/ml-foundations/logistic-regression/)
- [Model Evaluation](https://codefrydev.in/Reinforcement/ml-foundations/model-evaluation/)
- [Cross-Validation and Overfitting](https://codefrydev.in/Reinforcement/ml-foundations/cross-validation/)
- [K-Nearest Neighbors](https://codefrydev.in/Reinforcement/ml-foundations/knn/)
- [Decision Trees](https://codefrydev.in/Reinforcement/ml-foundations/decision-trees/)
- [K-Means Clustering](https://codefrydev.in/Reinforcement/ml-foundations/clustering/)
- [Scikit-Learn Workflow](https://codefrydev.in/Reinforcement/ml-foundations/sklearn-workflow/)
- [ML Mini-Project: Wine Classification](https://codefrydev.in/Reinforcement/ml-foundations/ml-mini-project/)
- [ML Foundations Drills](https://codefrydev.in/Reinforcement/ml-foundations/drills/)
- [ML Foundations Review & Bridge to Deep Learning](https://codefrydev.in/Reinforcement/ml-foundations/review-and-bridge/)

## Learning Path: Zero to Reinforcement Learning

- [RL in Plain English](https://codefrydev.in/Reinforcement/learning-path/rl-in-plain-english/)
- [Bridge Exercises: Python + Math + RL](https://codefrydev.in/Reinforcement/learning-path/bridge-exercises/)
- [Deep Reinforcement Learning (module view)](https://codefrydev.in/Reinforcement/learning-path/deep-rl-module-deep-dive/)
- [Real-World Scenarios in This Curriculum](https://codefrydev.in/Reinforcement/learning-path/real-world-anchors/)

### Phase 0: Programming from Zero

- [Python Confidence Builder](https://codefrydev.in/Reinforcement/learning-path/phase-0/python-confidence/)

### Learning path modules (interactive hubs)

- [Phase 0 — Programming from zero](https://codefrydev.in/Reinforcement/learning-path/modules/phase-0/)
- [Phase 1 — Math foundations for RL](https://codefrydev.in/Reinforcement/learning-path/modules/phase-1/)
- [Phase 2 — Prerequisites (tools and libraries)](https://codefrydev.in/Reinforcement/learning-path/modules/phase-2/)
- [Phase 3 — Math for RL (deep dive)](https://codefrydev.in/Reinforcement/learning-path/modules/phase-3/)
- [Phase 4 — ML foundations](https://codefrydev.in/Reinforcement/learning-path/modules/phase-4/)
- [Phase 5 — DL foundations](https://codefrydev.in/Reinforcement/learning-path/modules/phase-5/)
- [Phase 6 — RL foundations (tabular)](https://codefrydev.in/Reinforcement/learning-path/modules/phase-6/)
- [Phase 7 — Deep RL](https://codefrydev.in/Reinforcement/learning-path/modules/phase-7/)
- [Phase 8 — Advanced topics](https://codefrydev.in/Reinforcement/learning-path/modules/phase-8/)

## Math for reinforcement learning — probability, linear algebra, calculus

- [Probability & Statistics](https://codefrydev.in/Reinforcement/math-for-rl/probability/)
- [Statistics for RL](https://codefrydev.in/Reinforcement/math-for-rl/statistics/)
- [Linear Algebra](https://codefrydev.in/Reinforcement/math-for-rl/linear-algebra/)
- [Calculus](https://codefrydev.in/Reinforcement/math-for-rl/calculus/)

## Preliminary Assessment

- [Python basics for RL and the preliminary assessment](https://codefrydev.in/Reinforcement/preliminary/python-basics/)
- [NumPy](https://codefrydev.in/Reinforcement/preliminary/numpy/)
- [Probability & Statistics](https://codefrydev.in/Reinforcement/preliminary/probability/)
- [Linear Algebra](https://codefrydev.in/Reinforcement/preliminary/linear-algebra/)
- [Calculus](https://codefrydev.in/Reinforcement/preliminary/calculus/)
- [RL Framework](https://codefrydev.in/Reinforcement/preliminary/rl-framework/)
- [Tabular Methods](https://codefrydev.in/Reinforcement/preliminary/tabular-methods/)
- [Value Functions and Bellman Equation](https://codefrydev.in/Reinforcement/preliminary/value-functions-bellman/)
- [Function Approximation and Deep RL](https://codefrydev.in/Reinforcement/preliminary/function-approximation-deep-rl/)
- [PyTorch Basics](https://codefrydev.in/Reinforcement/preliminary/pytorch-basics/)
- [Final Self-Assessment](https://codefrydev.in/Reinforcement/preliminary/self-assessment/)

## Prerequisites — Tools & Libraries

- [Python](https://codefrydev.in/Reinforcement/prerequisites/python/)
- [NumPy](https://codefrydev.in/Reinforcement/prerequisites/numpy/)
- [Pandas](https://codefrydev.in/Reinforcement/prerequisites/pandas/)
- [Visualization & Plotting for RL](https://codefrydev.in/Reinforcement/prerequisites/visualization/)
- [Matplotlib](https://codefrydev.in/Reinforcement/prerequisites/matplotlib/)
- [PyTorch](https://codefrydev.in/Reinforcement/prerequisites/pytorch/)
- [TensorFlow](https://codefrydev.in/Reinforcement/prerequisites/tensorflow/)
- [OpenAI Gym / Gymnasium](https://codefrydev.in/Reinforcement/prerequisites/gym/)
- [Other Libraries](https://codefrydev.in/Reinforcement/prerequisites/other-libraries/)

## Reinforcement learning curriculum — volumes and 100 chapters

### Volume 1: Mathematical Foundations

- [Chapter 1: The Reinforcement Learning Framework](https://codefrydev.in/Reinforcement/curriculum/volume-01/chapter-01/)
- [Chapter 2: Multi-Armed Bandits](https://codefrydev.in/Reinforcement/curriculum/volume-01/chapter-02/)
- [Bandits: Optimistic Initial Values](https://codefrydev.in/Reinforcement/curriculum/volume-01/bandits-optimistic-initial-values/)
- [Chapter 3: Markov Decision Processes (MDPs)](https://codefrydev.in/Reinforcement/curriculum/volume-01/chapter-03/)
- [Bandits: UCB1](https://codefrydev.in/Reinforcement/curriculum/volume-01/bandits-ucb1/)
- [Chapter 4: The Reward Hypothesis](https://codefrydev.in/Reinforcement/curriculum/volume-01/chapter-04/)
- [Gridworld](https://codefrydev.in/Reinforcement/curriculum/volume-01/gridworld/)
- [Bandits: Thompson Sampling](https://codefrydev.in/Reinforcement/curriculum/volume-01/bandits-thompson-sampling/)
- [Chapter 5: Value Functions](https://codefrydev.in/Reinforcement/curriculum/volume-01/chapter-05/)
- [Choosing Rewards](https://codefrydev.in/Reinforcement/curriculum/volume-01/choosing-rewards/)
- [Bandits: Nonstationary](https://codefrydev.in/Reinforcement/curriculum/volume-01/bandits-nonstationary/)
- [Chapter 6: The Bellman Equations](https://codefrydev.in/Reinforcement/curriculum/volume-01/chapter-06/)
- [Bandits: Why don't we just use a library?](https://codefrydev.in/Reinforcement/curriculum/volume-01/bandits-why-not-library/)
- [Chapter 7: Dynamic Programming — Policy Evaluation](https://codefrydev.in/Reinforcement/curriculum/volume-01/chapter-07/)
- [Chapter 8: Dynamic Programming — Policy Iteration](https://codefrydev.in/Reinforcement/curriculum/volume-01/chapter-08/)
- [Windy Gridworld](https://codefrydev.in/Reinforcement/curriculum/volume-01/windy-gridworld/)
- [Chapter 9: Dynamic Programming — Value Iteration](https://codefrydev.in/Reinforcement/curriculum/volume-01/chapter-09/)
- [Dynamic Programming: Gridworld in Code](https://codefrydev.in/Reinforcement/curriculum/volume-01/dp-gridworld-in-code/)
- [Chapter 10: Limitations of Dynamic Programming](https://codefrydev.in/Reinforcement/curriculum/volume-01/chapter-10/)
- [Volume 1 Drills — Mathematical Foundations](https://codefrydev.in/Reinforcement/curriculum/volume-01/drills/)
- [Volume 1 Review & Bridge to Volume 2](https://codefrydev.in/Reinforcement/curriculum/volume-01/review-and-bridge/)

### Volume 2: Tabular Methods & Classic Algorithms

- [Chapter 11: Monte Carlo Methods](https://codefrydev.in/Reinforcement/curriculum/volume-02/chapter-01/)
- [Chapter 12: Temporal Difference (TD) Learning](https://codefrydev.in/Reinforcement/curriculum/volume-02/chapter-02/)
- [Monte Carlo in Code](https://codefrydev.in/Reinforcement/curriculum/volume-02/monte-carlo-in-code/)
- [Chapter 13: SARSA (On-Policy TD Control)](https://codefrydev.in/Reinforcement/curriculum/volume-02/chapter-03/)
- [TD, SARSA, and Q-Learning in Code](https://codefrydev.in/Reinforcement/curriculum/volume-02/td-sarsa-q-in-code/)
- [Chapter 14: Q-Learning (Off-Policy TD Control)](https://codefrydev.in/Reinforcement/curriculum/volume-02/chapter-04/)
- [Chapter 15: Expected SARSA](https://codefrydev.in/Reinforcement/curriculum/volume-02/chapter-05/)
- [Chapter 16: N-Step Bootstrapping](https://codefrydev.in/Reinforcement/curriculum/volume-02/chapter-06/)
- [Chapter 17: Planning and Learning with Tabular Methods](https://codefrydev.in/Reinforcement/curriculum/volume-02/chapter-07/)
- [Chapter 18: Custom Gym Environments (Part 1)](https://codefrydev.in/Reinforcement/curriculum/volume-02/chapter-08/)
- [Chapter 19: Hyperparameter Tuning in Tabular RL](https://codefrydev.in/Reinforcement/curriculum/volume-02/chapter-09/)
- [Chapter 20: The Limits of Tabular Methods](https://codefrydev.in/Reinforcement/curriculum/volume-02/chapter-10/)
- [Volume 2 Drills — Tabular Model-Free Methods](https://codefrydev.in/Reinforcement/curriculum/volume-02/drills/)
- [Volume 2 Review & Bridge to Volume 3](https://codefrydev.in/Reinforcement/curriculum/volume-02/review-and-bridge/)

### Volume 3: Value Function Approximation & Deep Q-Learning

- [Chapter 21: Linear Function Approximation](https://codefrydev.in/Reinforcement/curriculum/volume-03/chapter-01/)
- [Feature Engineering for Reinforcement Learning](https://codefrydev.in/Reinforcement/curriculum/volume-03/feature-engineering/)
- [CartPole](https://codefrydev.in/Reinforcement/curriculum/volume-03/cartpole/)
- [Chapter 22: Artificial Neural Networks for RL](https://codefrydev.in/Reinforcement/curriculum/volume-03/chapter-02/)
- [Chapter 23: Deep Q-Networks (DQN)](https://codefrydev.in/Reinforcement/curriculum/volume-03/chapter-03/)
- [Chapter 24: Experience Replay](https://codefrydev.in/Reinforcement/curriculum/volume-03/chapter-04/)
- [Chapter 25: Target Networks](https://codefrydev.in/Reinforcement/curriculum/volume-03/chapter-05/)
- [Chapter 26: Double DQN (DDQN)](https://codefrydev.in/Reinforcement/curriculum/volume-03/chapter-06/)
- [Chapter 27: Dueling DQN](https://codefrydev.in/Reinforcement/curriculum/volume-03/chapter-07/)
- [Chapter 28: Prioritized Experience Replay (PER)](https://codefrydev.in/Reinforcement/curriculum/volume-03/chapter-08/)
- [Chapter 29: Noisy Networks for Exploration](https://codefrydev.in/Reinforcement/curriculum/volume-03/chapter-09/)
- [Chapter 30: Rainbow DQN](https://codefrydev.in/Reinforcement/curriculum/volume-03/chapter-10/)
- [Volume 3 Drills — Function Approximation & DQN](https://codefrydev.in/Reinforcement/curriculum/volume-03/drills/)
- [Volume 3 Review & Bridge to Volume 4](https://codefrydev.in/Reinforcement/curriculum/volume-03/review-and-bridge/)

### Volume 4: Policy Gradients

- [Chapter 31: Introduction to Policy-Based Methods](https://codefrydev.in/Reinforcement/curriculum/volume-04/chapter-01/)
- [Chapter 32: The Policy Objective Function](https://codefrydev.in/Reinforcement/curriculum/volume-04/chapter-02/)
- [Chapter 33: The REINFORCE Algorithm](https://codefrydev.in/Reinforcement/curriculum/volume-04/chapter-03/)
- [Chapter 34: Reducing Variance in Policy Gradients](https://codefrydev.in/Reinforcement/curriculum/volume-04/chapter-04/)
- [Chapter 35: Actor-Critic Architectures](https://codefrydev.in/Reinforcement/curriculum/volume-04/chapter-05/)
- [Chapter 36: Advantage Actor-Critic (A2C)](https://codefrydev.in/Reinforcement/curriculum/volume-04/chapter-06/)
- [Chapter 37: Asynchronous Advantage Actor-Critic (A3C)](https://codefrydev.in/Reinforcement/curriculum/volume-04/chapter-07/)
- [Chapter 38: Continuous Action Spaces](https://codefrydev.in/Reinforcement/curriculum/volume-04/chapter-08/)
- [Chapter 39: Deep Deterministic Policy Gradient (DDPG)](https://codefrydev.in/Reinforcement/curriculum/volume-04/chapter-09/)
- [Chapter 40: Twin Delayed DDPG (TD3)](https://codefrydev.in/Reinforcement/curriculum/volume-04/chapter-10/)
- [Volume 4 Review & Bridge to Volume 5](https://codefrydev.in/Reinforcement/curriculum/volume-04/review-and-bridge/)

### Volume 5: Advanced Policy Optimization

- [Chapter 41: The Problem with Standard Policy Gradients](https://codefrydev.in/Reinforcement/curriculum/volume-05/chapter-01/)
- [Chapter 42: Trust Region Policy Optimization (TRPO)](https://codefrydev.in/Reinforcement/curriculum/volume-05/chapter-02/)
- [Chapter 43: Proximal Policy Optimization (PPO): Intuition](https://codefrydev.in/Reinforcement/curriculum/volume-05/chapter-03/)
- [Chapter 44: PPO: Implementation Details](https://codefrydev.in/Reinforcement/curriculum/volume-05/chapter-04/)
- [Chapter 45: Coding PPO from Scratch](https://codefrydev.in/Reinforcement/curriculum/volume-05/chapter-05/)
- [Chapter 46: Maximum Entropy RL](https://codefrydev.in/Reinforcement/curriculum/volume-05/chapter-06/)
- [Chapter 47: Soft Actor-Critic (SAC)](https://codefrydev.in/Reinforcement/curriculum/volume-05/chapter-07/)
- [Chapter 48: SAC vs. PPO](https://codefrydev.in/Reinforcement/curriculum/volume-05/chapter-08/)
- [Chapter 49: Custom Gym Environments (Part 2)](https://codefrydev.in/Reinforcement/curriculum/volume-05/chapter-09/)
- [Chapter 50: Advanced Hyperparameter Tuning](https://codefrydev.in/Reinforcement/curriculum/volume-05/chapter-10/)
- [Volume 5 Review & Bridge to Volume 6](https://codefrydev.in/Reinforcement/curriculum/volume-05/review-and-bridge/)

### Volume 6: Model-Based RL & Planning

- [Chapter 51: Model-Free vs. Model-Based RL](https://codefrydev.in/Reinforcement/curriculum/volume-06/chapter-01/)
- [Chapter 52: Learning World Models](https://codefrydev.in/Reinforcement/curriculum/volume-06/chapter-02/)
- [Chapter 53: Planning with Known Models](https://codefrydev.in/Reinforcement/curriculum/volume-06/chapter-03/)
- [Chapter 54: Monte Carlo Tree Search (MCTS)](https://codefrydev.in/Reinforcement/curriculum/volume-06/chapter-04/)
- [Chapter 55: AlphaZero Architecture](https://codefrydev.in/Reinforcement/curriculum/volume-06/chapter-05/)
- [Chapter 56: MuZero Intuition](https://codefrydev.in/Reinforcement/curriculum/volume-06/chapter-06/)
- [Chapter 57: Dreamer and Latent Imagination](https://codefrydev.in/Reinforcement/curriculum/volume-06/chapter-07/)
- [Chapter 58: Model-Based Policy Optimization (MBPO)](https://codefrydev.in/Reinforcement/curriculum/volume-06/chapter-08/)
- [Chapter 59: Probabilistic Ensembles with Trajectory Sampling (PETS)](https://codefrydev.in/Reinforcement/curriculum/volume-06/chapter-09/)
- [Chapter 60: Visualizing Model-Based Rollouts](https://codefrydev.in/Reinforcement/curriculum/volume-06/chapter-10/)
- [Volume 6 Review & Bridge to Volume 7](https://codefrydev.in/Reinforcement/curriculum/volume-06/review-and-bridge/)

### Volume 7: Exploration and Meta-Learning

- [Chapter 61: The Hard Exploration Problem](https://codefrydev.in/Reinforcement/curriculum/volume-07/chapter-01/)
- [Chapter 62: Intrinsic Motivation](https://codefrydev.in/Reinforcement/curriculum/volume-07/chapter-02/)
- [Chapter 63: Curiosity-Driven Exploration (ICM)](https://codefrydev.in/Reinforcement/curriculum/volume-07/chapter-03/)
- [Chapter 64: Random Network Distillation (RND)](https://codefrydev.in/Reinforcement/curriculum/volume-07/chapter-04/)
- [Chapter 65: Count-Based Exploration](https://codefrydev.in/Reinforcement/curriculum/volume-07/chapter-05/)
- [Chapter 66: Go-Explore Algorithm](https://codefrydev.in/Reinforcement/curriculum/volume-07/chapter-06/)
- [Chapter 67: Meta-Learning (Learning to Learn)](https://codefrydev.in/Reinforcement/curriculum/volume-07/chapter-07/)
- [Chapter 68: Model-Agnostic Meta-Learning (MAML) in RL](https://codefrydev.in/Reinforcement/curriculum/volume-07/chapter-08/)
- [Chapter 69: RL² (Reinforcement Learning as an RNN)](https://codefrydev.in/Reinforcement/curriculum/volume-07/chapter-09/)
- [Chapter 70: Unsupervised Environment Design](https://codefrydev.in/Reinforcement/curriculum/volume-07/chapter-10/)
- [Volume 7 Review & Bridge to Volume 8](https://codefrydev.in/Reinforcement/curriculum/volume-07/review-and-bridge/)

### Volume 8: Offline RL & Imitation Learning

- [Chapter 71: The Offline RL Problem](https://codefrydev.in/Reinforcement/curriculum/volume-08/chapter-01/)
- [Chapter 72: Conservative Q-Learning (CQL)](https://codefrydev.in/Reinforcement/curriculum/volume-08/chapter-02/)
- [Chapter 73: Decision Transformers](https://codefrydev.in/Reinforcement/curriculum/volume-08/chapter-03/)
- [Chapter 74: Introduction to Imitation Learning](https://codefrydev.in/Reinforcement/curriculum/volume-08/chapter-04/)
- [Chapter 75: Limitations of Behavioral Cloning](https://codefrydev.in/Reinforcement/curriculum/volume-08/chapter-05/)
- [Chapter 76: Inverse Reinforcement Learning (IRL)](https://codefrydev.in/Reinforcement/curriculum/volume-08/chapter-06/)
- [Chapter 77: Generative Adversarial Imitation Learning (GAIL)](https://codefrydev.in/Reinforcement/curriculum/volume-08/chapter-07/)
- [Chapter 78: Adversarial Motion Priors (AMP)](https://codefrydev.in/Reinforcement/curriculum/volume-08/chapter-08/)
- [Chapter 79: Offline-to-Online Finetuning](https://codefrydev.in/Reinforcement/curriculum/volume-08/chapter-09/)
- [Chapter 80: RL from Human Feedback (RLHF) Basics](https://codefrydev.in/Reinforcement/curriculum/volume-08/chapter-10/)
- [Volume 8 Review & Bridge to Volume 9](https://codefrydev.in/Reinforcement/curriculum/volume-08/review-and-bridge/)

### Volume 9: Multi-Agent RL (MARL)

- [Chapter 81: Multi-Agent Fundamentals](https://codefrydev.in/Reinforcement/curriculum/volume-09/chapter-01/)
- [Chapter 82: Game Theory Basics for RL](https://codefrydev.in/Reinforcement/curriculum/volume-09/chapter-02/)
- [Chapter 83: Independent Q-Learning (IQL)](https://codefrydev.in/Reinforcement/curriculum/volume-09/chapter-03/)
- [Chapter 84: Centralized Training, Decentralized Execution (CTDE)](https://codefrydev.in/Reinforcement/curriculum/volume-09/chapter-04/)
- [Chapter 85: Multi-Agent DDPG (MADDPG)](https://codefrydev.in/Reinforcement/curriculum/volume-09/chapter-05/)
- [Chapter 86: Value Decomposition Networks (VDN)](https://codefrydev.in/Reinforcement/curriculum/volume-09/chapter-06/)
- [Chapter 87: QMIX Algorithm](https://codefrydev.in/Reinforcement/curriculum/volume-09/chapter-07/)
- [Chapter 88: Multi-Agent PPO (MAPPO)](https://codefrydev.in/Reinforcement/curriculum/volume-09/chapter-08/)
- [Chapter 89: Self-Play and League Training](https://codefrydev.in/Reinforcement/curriculum/volume-09/chapter-09/)
- [Chapter 90: Communication in MARL](https://codefrydev.in/Reinforcement/curriculum/volume-09/chapter-10/)
- [Volume 9 Review & Bridge to Volume 10](https://codefrydev.in/Reinforcement/curriculum/volume-09/review-and-bridge/)

### Volume 10: Real-World RL, Safety & Large Language Models

- [Chapter 91: RL in Robotics](https://codefrydev.in/Reinforcement/curriculum/volume-10/chapter-01/)
- [Chapter 92: Safe Reinforcement Learning](https://codefrydev.in/Reinforcement/curriculum/volume-10/chapter-02/)
- [Chapter 93: RL for Algorithmic Trading](https://codefrydev.in/Reinforcement/curriculum/volume-10/chapter-03/)
- [Chapter 94: RL in Recommender Systems](https://codefrydev.in/Reinforcement/curriculum/volume-10/chapter-04/)
- [Chapter 95: Training Large Language Models with PPO](https://codefrydev.in/Reinforcement/curriculum/volume-10/chapter-05/)
- [Chapter 96: Implementing RLHF in NLP](https://codefrydev.in/Reinforcement/curriculum/volume-10/chapter-06/)
- [Chapter 97: Direct Preference Optimization (DPO)](https://codefrydev.in/Reinforcement/curriculum/volume-10/chapter-07/)
- [Chapter 98: Evaluating RL Agents](https://codefrydev.in/Reinforcement/curriculum/volume-10/chapter-08/)
- [Chapter 99: Debugging RL Code](https://codefrydev.in/Reinforcement/curriculum/volume-10/chapter-09/)
- [Chapter 100: The Future of Reinforcement Learning](https://codefrydev.in/Reinforcement/curriculum/volume-10/chapter-10/)