What this section covers
Deep learning is the technology that transformed reinforcement learning from a research curiosity into a practical tool for solving hard problems. Before AlphaGo, DQN, and PPO, RL was limited to tiny, hand-crafted state spaces. Deep neural networks changed everything by serving as powerful function approximators: able to map raw pixels to values, states to action probabilities, and observations to policies.
This section builds deep learning from the ground up, starting with the biological inspiration for artificial neurons and progressing through multi-layer networks, forward propagation, loss functions, and backpropagation. Every concept is introduced with explicit connections to RL algorithms so you always know why you are learning it.
Topics covered:
- From biological neurons to artificial neurons: inputs, weights, bias, activation
- The perceptron: the simplest learning rule, AND gate, XOR limitations
- Activation functions: ReLU, sigmoid, tanh, softmax: when and why
- Multi-layer perceptrons: architecture, parameter counting, solving XOR
- Forward propagation: layer-by-layer computation, intermediate activations
- Loss functions: MSE for regression, cross-entropy for classification
- Backpropagation: chain rule, computing gradients, updating weights
- Gradient descent for neural networks: learning rate, momentum, Adam
- Training a neural network: mini-batches, epochs, training loop
- Regularization: dropout, weight decay, early stopping
- Convolutional neural networks: filters, pooling, feature maps
- Batch normalization and residual connections
- The complete DQN network: putting it all together
Why deep learning matters for RL
DQN is just Q-learning where the Q-function is a neural network.
That single sentence captures everything. In tabular Q-learning, we store a table Q[s, a] with one entry per (state, action) pair. This works for toy problems with a handful of states. For Atari games with 210×160 pixels, the state space is astronomically large; a table is impossible. The solution: replace the table with a neural network that takes the state as input and outputs Q-values for all actions.
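To make the table-to-network swap concrete, here is a minimal NumPy sketch (the sizes and random initialization are illustrative assumptions, not the Atari architecture): a two-layer network maps a state vector to one Q-value per action, so the table lookup Q[s, a] becomes a forward pass followed by an argmax.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a 4-dimensional state, 2 actions.
state_dim, hidden_dim, n_actions = 4, 16, 2

# Instead of a table Q[s, a], the parameters are weight matrices and biases.
W1 = rng.normal(0, 0.1, (hidden_dim, state_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(0, 0.1, (n_actions, hidden_dim))
b2 = np.zeros(n_actions)

def q_network(state):
    """Map a state vector to one Q-value per action."""
    h = np.maximum(0.0, W1 @ state + b1)  # hidden layer with ReLU
    return W2 @ h + b2                    # one output per action

state = rng.normal(size=state_dim)
q_values = q_network(state)
greedy_action = int(np.argmax(q_values))  # argmax replaces the table lookup
```

The same network handles states it has never seen, which is exactly what the table cannot do.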
| DL concept | Where it reappears in RL |
|---|---|
| Artificial neuron | Building block of all value and policy networks |
| Forward propagation | Computing Q(s,a) or π(a|s) during inference |
| Loss function (MSE) | DQN loss: \((r + \gamma \max_{a'} Q(s', a') - Q(s,a))^2\) |
| Loss function (cross-entropy) | Policy gradient loss |
| Backpropagation | How Q-networks and policy networks are trained |
| ReLU activations | Standard hidden-layer activation in DQN, A3C, PPO |
| Softmax | Action probability distribution in policy networks |
| Batch normalization | Stabilizing training in deep RL |
| Convolutional layers | Processing raw pixel observations in Atari DQN |
| Gradient descent / Adam | Optimizing all modern RL networks |
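The MSE row of the table can be computed by hand. A sketch with made-up numbers for a single transition (s, a, r, s'), assuming a discount factor of 0.99:

```python
import numpy as np

gamma = 0.99                     # discount factor (illustrative)
r = 1.0                          # reward observed for this transition
a = 0                            # action that was taken
q_s = np.array([0.5, 0.2])       # current network's Q(s, .)
q_s_next = np.array([0.4, 0.7])  # Q(s', .), in DQN taken from a target network

# TD target: r + gamma * max over a' of Q(s', a')
td_target = r + gamma * np.max(q_s_next)  # 1.0 + 0.99 * 0.7 = 1.693

# Squared TD error: the per-transition DQN loss from the table above
loss = (td_target - q_s[a]) ** 2
```

In practice this loss is averaged over a mini-batch of transitions and minimized with gradient descent, exactly as covered in pages 6 through 9.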
Policy gradient methods go further: instead of approximating a value function, they parameterize the policy itself as a neural network π(a|s; θ) and optimize the expected return directly using gradient ascent. Actor–critic methods combine both: a policy network (actor) and a value network (critic), both trained with backpropagation.
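As a sketch of what "parameterize the policy" means, here is a linear softmax policy in NumPy (the parameter shapes are illustrative; a deeper network works the same way, ending in a softmax over actions):

```python
import numpy as np

rng = np.random.default_rng(1)
state_dim, n_actions = 4, 3

# Illustrative policy parameters theta: one row of weights per action.
theta = rng.normal(0, 0.1, (n_actions, state_dim))

def policy(state):
    """pi(a|s; theta): softmax over per-action preference scores."""
    logits = theta @ state
    logits = logits - logits.max()  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

state = rng.normal(size=state_dim)
probs = policy(state)
action = int(rng.choice(n_actions, p=probs))  # the actor samples an action
```

Gradient ascent then nudges theta so that actions leading to high returns become more probable.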
Pedagogical approach: NumPy first
We implement everything in NumPy first. PyTorch is introduced via linked notebooks.
This is intentional. Implementing a neural network forward pass in NumPy β manually computing matrix multiplications, writing the ReLU function, computing the softmax β gives you a deep understanding of what the framework does for you. When you later call torch.nn.Linear or loss.backward(), you will know exactly what is happening inside.
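For example, the three operations named above are only a few lines each in NumPy. The sketch below mirrors what the framework primitives compute (the `linear` helper follows the `x @ W.T + b` convention that `torch.nn.Linear` uses; the example inputs are made up):

```python
import numpy as np

def linear(x, W, b):
    """Affine layer: x @ W.T + b, matching torch.nn.Linear's convention."""
    return x @ W.T + b

def relu(x):
    """Elementwise max(0, x)."""
    return np.maximum(0.0, x)

def softmax(x):
    """Exponentiate and normalize; subtracting the max avoids overflow."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

# A tiny forward pass with illustrative values.
x = np.array([1.0, -2.0])
W = np.array([[0.5, 0.5],
              [1.0, 0.0]])
b = np.zeros(2)
h = relu(linear(x, W, b))  # hidden activation
p = softmax(h)             # probabilities summing to 1
```

Writing these by hand once makes the framework calls transparent rather than magical.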
The in-browser pyrepl exercises use NumPy exclusively because the browser environment (Pyodide) does not support PyTorch. Every concept is fully implementable in NumPy, and the implementations here are pedagogically superior to framework code for learning purposes.
The linked JupyterLite notebooks (see each page) extend the exercises and transition to PyTorch once the concepts are solid.
Table of contents
| # | Page | Topic |
|---|---|---|
| 1 | Biological Inspiration | Brain neurons → artificial neurons |
| 2 | The Perceptron | Perceptron learning rule, AND, XOR limits |
| 3 | Activation Functions | ReLU, sigmoid, tanh, softmax |
| 4 | Multi-Layer Perceptrons | Architecture, parameter counting, XOR solved |
| 5 | Forward Propagation | Layer-by-layer computation, batch forward pass |
| 6 | Loss Functions | MSE, cross-entropy, loss landscape |
| 7 | Backpropagation | Chain rule, gradients, numerical verification |
| 8 | Gradient Descent for NNs | Learning rate, momentum, Adam |
| 9 | Training Loop | Mini-batches, epochs, monitoring |
| 10 | Regularization | Dropout, weight decay, early stopping |
| 11 | Convolutional Neural Networks | Filters, pooling, feature maps |
| 12 | Batch Norm and Residuals | Normalization, skip connections |
| 13 | The DQN Network | Putting it all together for Atari |
Quick-start guide
- Complete pages in order. Each page builds on the previous one. The concepts are cumulative.
- Do every pyrepl exercise. They run in your browser, no setup needed. The struggle of implementing in NumPy is where the understanding happens.
- Check worked solutions only after a genuine attempt.
- Use the extra practice items. Debug exercises (item 5) are especially valuable: recognizing broken code trains the same skill as writing correct code.
- Open the JupyterLite notebooks for extended practice and PyTorch equivalents.
Estimated time: 2–4 hours per page. The full section takes approximately 30–50 hours.
Assessment checkpoints
- After page 3 – Checkpoint A: Neurons and Activations – Can you implement a neuron and all four activations from scratch in NumPy?
- After page 6 – Checkpoint B: Forward Pass and Loss – Can you implement a full forward pass and compute MSE and cross-entropy?
- After page 9 – Checkpoint C: Backprop and Training – Can you implement backpropagation and a training loop from scratch?
- After page 13 – Checkpoint D: DQN Architecture – Can you describe the DQN network architecture and explain why each component is needed?