Learning objectives
- Build a complete neural network pipeline from data loading to evaluation using only NumPy
- Implement forward pass, cross-entropy loss, backpropagation, and SGD in sequence
- Track and interpret a training loss curve
- Connect this pipeline to the DQN training pattern
Concept and real-world motivation
This mini-project combines everything from the DL Foundations section. You will build a 2-layer MLP to classify handwritten digits — the same pipeline used in DQN: input → hidden layers → output. The input is a flattened image (pixel values), the hidden layers extract features, and the output layer predicts a class (or in DQN, a Q-value per action).
We use sklearn’s digits dataset — 1797 samples of 8×8 = 64-pixel images of digits 0–9. We take the first 100 samples to keep computation fast in the browser.
Step 1 — Prepare data
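A minimal sketch of the data setup, assuming sklearn is available. The 80/20 split and the `/16.0` pixel scaling are choices made here (the split sizes match the hints below); variable names are illustrative:

```python
from sklearn.datasets import load_digits

# Load the 8x8 digit images: 1797 samples, 64 pixel features each
digits = load_digits()
X, y = digits.data[:100], digits.target[:100]   # first 100 samples for speed

X = X / 16.0                                    # scale pixels from [0, 16] to [0, 1]

# 80 training / 20 test samples
X_train, y_train = X[:80], y[:80]
X_test, y_test = X[80:], y[80:]
```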
Step 2 — Initialize the MLP
Architecture: 64 → 32 → 10 (input features → hidden → output classes)
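One way to initialize this architecture with small Gaussian weights (the 0.1 scale and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)                 # fixed seed for reproducibility

# Small random weights break symmetry between units; biases start at zero
W1 = rng.normal(0.0, 0.1, size=(64, 32))       # input -> hidden
b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.1, size=(32, 10))       # hidden -> output
b2 = np.zeros(10)
```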
Step 3 — Training loop
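A sketch of the training loop: forward pass, cross-entropy loss, backpropagation, and SGD, in that order. The setup lines repeat Steps 1–2 so the cell runs on its own; this version uses full-batch gradient descent rather than minibatches:

```python
import numpy as np
from sklearn.datasets import load_digits

# Setup from Steps 1-2, repeated so this cell stands alone
digits = load_digits()
X_train = digits.data[:80] / 16.0
y_train = digits.target[:80]

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0.0, 0.1, (64, 32)), np.zeros(32)
W2, b2 = rng.normal(0.0, 0.1, (32, 10)), np.zeros(10)

def softmax(z):
    # subtract the row max for numerical stability; normalize along axis=1
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

losses = []
lr = 0.1
n = len(X_train)
for epoch in range(200):
    # Forward pass: ReLU hidden layer, softmax output
    h = np.maximum(0.0, X_train @ W1 + b1)
    probs = softmax(h @ W2 + b2)

    # Cross-entropy loss on the true classes
    loss = -np.mean(np.log(probs[np.arange(n), y_train] + 1e-12))
    losses.append(loss)

    # Backprop: d(loss)/d(logits) = (probs - one_hot(y)) / n
    d_logits = probs.copy()
    d_logits[np.arange(n), y_train] -= 1.0
    d_logits /= n

    dW2 = h.T @ d_logits
    db2 = d_logits.sum(axis=0)
    d_h = d_logits @ W2.T
    d_h[h <= 0] = 0.0                 # ReLU gradient mask
    dW1 = X_train.T @ d_h
    db1 = d_h.sum(axis=0)

    # SGD update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```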
Step 4 — Plot loss curve
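Plotting with matplotlib (assumed available). The `losses` list is the one collected in Step 3; the synthetic placeholder curve below is included only so this cell runs in isolation:

```python
import matplotlib
matplotlib.use("Agg")                     # headless backend; omit in a notebook
import matplotlib.pyplot as plt

# Placeholder stand-in for the `losses` list from Step 3
losses = [2.3 * (0.98 ** i) for i in range(200)]

fig, ax = plt.subplots()
ax.plot(losses)
ax.set_xlabel("epoch")
ax.set_ylabel("cross-entropy loss")
ax.set_title("Training loss")
fig.savefig("loss_curve.png")
```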
Step 5 — Evaluate on test set
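An evaluation sketch: the predicted class is the argmax over the 10 output scores, and accuracy is the fraction of predictions matching `y_test`. The compact retraining block repeats Steps 1–3 so this cell stands alone:

```python
import numpy as np
from sklearn.datasets import load_digits

# Re-create the trained state (Steps 1-3, condensed)
digits = load_digits()
X, y = digits.data[:100] / 16.0, digits.target[:100]
X_train, y_train, X_test, y_test = X[:80], y[:80], X[80:], y[80:]

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0.0, 0.1, (64, 32)), np.zeros(32)
W2, b2 = rng.normal(0.0, 0.1, (32, 10)), np.zeros(10)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for epoch in range(200):                     # same loop as Step 3
    h = np.maximum(0.0, X_train @ W1 + b1)
    d = softmax(h @ W2 + b2)
    d[np.arange(80), y_train] -= 1.0
    d /= 80
    d_h = d @ W2.T
    d_h[h <= 0] = 0.0
    W2 -= 0.1 * (h.T @ d); b2 -= 0.1 * d.sum(axis=0)
    W1 -= 0.1 * (X_train.T @ d_h); b1 -= 0.1 * d_h.sum(axis=0)

# Evaluate: predicted class = argmax over the 10 output scores
h_test = np.maximum(0.0, X_test @ W1 + b1)
preds = np.argmax(h_test @ W2 + b2, axis=1)
acc = np.mean(preds == y_test)
print(f"test accuracy: {acc:.2f}")
```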
Debug exercise: Fix the softmax that doesn’t sum to 1 (missing normalization):
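For reference, the broken version might look like this (a hypothetical snippet for the exercise): the exponentials are shifted for stability but never divided by their sum.

```python
import numpy as np

def softmax_buggy(z):
    # BUG: exponentiates (with a stability shift) but never normalizes,
    # so the rows do not sum to 1
    return np.exp(z - z.max(axis=1, keepdims=True))

row_sums = softmax_buggy(np.array([[1.0, 2.0, 3.0]])).sum(axis=1)
print(row_sums)   # not equal to 1
```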
Professor’s hints
- On only 80 training samples, the network can memorize the data. Watch the loss curve — if it goes to near-zero, the model is overfitting on this tiny dataset.
- With `lr=0.1` and 200 epochs you should see clear learning. If loss barely moves, try `lr=0.5`.
- The test accuracy with 100 samples and a simple MLP will be modest (~50–70%) — this is expected. With all 1797 samples, it reaches ~95%.
Common pitfalls
- Running the evaluation cell without first running the training cell (weights won’t be trained).
- Using the wrong axis in softmax: use `axis=1` for batches (rows are samples), not `axis=0`.
Worked solution comparison with PyTorch
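For comparison, a sketch of the same pipeline in PyTorch (assuming torch is installed; random tensors stand in for the digits data here). `nn.CrossEntropyLoss` fuses softmax and cross-entropy, so the model outputs raw logits, and autograd replaces the hand-written backprop:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same 64 -> 32 -> 10 architecture as the NumPy version
model = nn.Sequential(
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 10),
)
loss_fn = nn.CrossEntropyLoss()                  # softmax + cross-entropy in one
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

X = torch.rand(80, 64)                           # stand-in for the scaled pixels
y = torch.randint(0, 10, (80,))                  # stand-in labels

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)                  # forward pass + loss
    loss.backward()                              # autograd computes all gradients
    optimizer.step()                             # SGD update
```

Line for line, `loss.backward()` replaces the manual gradient derivation and `optimizer.step()` replaces the weight-update lines of Step 3.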
Extra practice
Warm-up: Run only Step 1. Print the pixel values of the first training sample. Reshape it to 8×8 and print.
Coding: Add L2 regularization (lambda=0.01) to the training loop in Step 3. Does the test accuracy improve?
Challenge: Scale to all 1797 samples. Add a third hidden layer (64→128→64→10). What test accuracy do you achieve?
Variant: Replace SGD with a hand-coded Adam optimizer in the training loop. Compare convergence speed.
Debug: Modify Step 3 to introduce a bug: divide by `n_classes` instead of `len(Xb)` in the gradient. Observe how training is affected.
Conceptual: How does this digits classifier pipeline compare to DQN? Map: input → state, hidden layers → feature extraction, output → Q-values/actions.
Recall: In 3 steps, describe the full training pipeline you implemented from raw pixels to accuracy score.