Learning objectives

  • Build a complete neural network pipeline from data loading to evaluation using only NumPy
  • Implement forward pass, cross-entropy loss, backpropagation, and SGD in sequence
  • Track and interpret a training loss curve
  • Connect this pipeline to the DQN training pattern

Concept and real-world motivation

This mini-project combines everything from the DL Foundations section. You will build a 2-layer MLP (one hidden layer) to classify handwritten digits — the same pipeline shape used in DQN: input → hidden layers → output. The input is a flattened image (pixel values), the hidden layer extracts features, and the output layer predicts a class (or, in DQN, a Q-value per action).

We use sklearn’s digits dataset — 1797 samples of 8×8 = 64-pixel images of digits 0–9. We take the first 100 samples to keep computation fast in the browser.


Step 1 — Prepare data

Try it — edit and run (Shift+Enter)
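If you want a reference point for this cell, here is a minimal sketch of the data preparation, assuming sklearn is available; the variable names (`X_train`, `Y_train`, etc.) and the 80/20 split are illustrative choices that match the "80 training samples" mentioned in the hints below.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()               # 1797 samples, 8x8 images, labels 0-9
X = digits.data[:100] / 16.0         # first 100 samples; pixels scaled from 0-16 to 0-1
y = digits.target[:100]

# 80/20 train/test split
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# One-hot encode labels for cross-entropy
def one_hot(labels, n_classes=10):
    out = np.zeros((len(labels), n_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

Y_train = one_hot(y_train)
print(X_train.shape, Y_train.shape)  # (80, 64) (80, 10)
```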

Step 2 — Initialize the MLP

Architecture: 64 → 32 → 10 (input features → hidden → output classes)

Try it — edit and run (Shift+Enter)
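A sketch of the 64 → 32 → 10 initialization and forward pass. The He-style weight scaling and the ReLU/softmax choices are assumptions — the lesson fixes only the layer sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0, np.sqrt(2.0 / 64), size=(64, 32))   # input -> hidden
b1 = np.zeros(32)
W2 = rng.normal(0, np.sqrt(2.0 / 32), size=(32, 10))   # hidden -> output
b2 = np.zeros(10)

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X):
    h = relu(X @ W1 + b1)                  # hidden activations, shape (batch, 32)
    return h, softmax(h @ W2 + b2)         # class probabilities, shape (batch, 10)

X = rng.random((5, 64))                    # dummy batch to exercise the shapes
h, probs = forward(X)
print(probs.shape)                         # (5, 10); each row sums to 1
```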

Step 3 — Training loop

Try it — edit and run (Shift+Enter)
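As a reference, here is a self-contained sketch of the training loop — forward pass, cross-entropy, backprop, and SGD in sequence. Random data stands in for the digits so this snippet runs on its own; `lr=0.1` and 200 epochs follow the hints below.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((80, 64))                       # stand-in for 80 training images
Y = np.eye(10)[rng.integers(0, 10, 80)]        # stand-in one-hot labels

W1 = rng.normal(0, np.sqrt(2 / 64), (64, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, np.sqrt(2 / 32), (32, 10)); b2 = np.zeros(10)

losses = []
lr = 0.1
for epoch in range(200):
    # forward pass
    z1 = X @ W1 + b1
    h = np.maximum(0, z1)                      # ReLU
    logits = h @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    # cross-entropy loss, averaged over the batch
    loss = -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))
    losses.append(loss)

    # backward pass: gradient of the loss w.r.t. logits is (probs - Y) / batch size
    dlogits = (probs - Y) / len(X)
    dW2 = h.T @ dlogits;  db2 = dlogits.sum(axis=0)
    dh = dlogits @ W2.T
    dz1 = dh * (z1 > 0)                        # ReLU gradient mask
    dW1 = X.T @ dz1;      db1 = dz1.sum(axis=0)

    # SGD update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Note the `(probs - Y) / len(X)` term: the softmax and cross-entropy gradients combine into this single expression, which is why the loop never differentiates them separately.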

Step 4 — Plot loss curve

Try it — edit and run (Shift+Enter)
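A plotting sketch, assuming a `losses` list recorded during Step 3; a dummy decaying curve is substituted here so the snippet runs on its own.

```python
import numpy as np
import matplotlib.pyplot as plt

losses = list(2.3 * np.exp(-np.arange(200) / 50))   # stand-in for Step 3's recorded losses

plt.plot(losses)
plt.xlabel("epoch")
plt.ylabel("cross-entropy loss")
plt.title("Training loss curve")
plt.show()
```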

Step 5 — Evaluate on test set

Try it — edit and run (Shift+Enter)
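A sketch of the evaluation step. It assumes trained `W1, b1, W2, b2` and a held-out `X_test, y_test` from the earlier cells; random stand-ins are used here so the snippet runs alone (so its accuracy is near chance).

```python
import numpy as np

rng = np.random.default_rng(0)
X_test = rng.random((20, 64)); y_test = rng.integers(0, 10, 20)   # stand-in test set
W1 = rng.normal(0, 0.1, (64, 32)); b1 = np.zeros(32)              # stand-in "trained" weights
W2 = rng.normal(0, 0.1, (32, 10)); b2 = np.zeros(10)

h = np.maximum(0, X_test @ W1 + b1)
logits = h @ W2 + b2
preds = logits.argmax(axis=1)          # softmax is unnecessary here: argmax of logits is identical
accuracy = (preds == y_test).mean()
print(f"test accuracy: {accuracy:.2f}")
```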

Debug exercise: Fix the softmax whose outputs don’t sum to 1 (the normalization step is missing):

Try it — edit and run (Shift+Enter)
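If you get stuck: the missing piece is dividing by the row-wise sum of exponentials. A corrected softmax sketch:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)       # stability shift (optional, but good practice)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)    # the normalization the buggy version omits

probs = softmax(np.array([[1.0, 2.0, 3.0]]))
print(probs.sum())                             # each row now sums to 1
```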

Professor’s hints

  • On only 80 training samples, the network can memorize the data. Watch the loss curve — if it goes to near-zero, the model is overfitting on this tiny dataset.
  • With lr=0.1 over 200 epochs you should see clear learning. If the loss barely moves, try lr=0.5.
  • With only 100 samples and a simple MLP, test accuracy will be modest (~50–70%); this is expected. With all 1797 samples, the same architecture reaches ~95%.

Common pitfalls

  • Running the evaluation cell without first running the training cell (weights won’t be trained).
  • Using the wrong axis in softmax: use axis=1 for batches (rows are samples), not axis=0.
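The axis pitfall above is easy to check directly. Assuming rows are samples, this illustrative snippet contrasts the two choices:

```python
import numpy as np

z = np.array([[1.0, 2.0], [3.0, 4.0]])     # 2 samples, 2 classes
e = np.exp(z)
right = e / e.sum(axis=1, keepdims=True)   # normalize each row (per sample)
wrong = e / e.sum(axis=0, keepdims=True)   # normalizes each column (across samples!)
print(right.sum(axis=1))                   # rows sum to 1: valid distributions
print(wrong.sum(axis=1))                   # rows do not sum to 1
```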

Worked solution comparison with PyTorch

For a PyTorch comparison, use the local notebook:
DL Mini-Project in PyTorch (run locally)

Extra practice

  1. Warm-up: Run only Step 1. Print the pixel values of the first training sample. Reshape it to 8×8 and print.

  2. Coding: Add L2 regularization (lambda=0.01) to the training loop in Step 3. Does the test accuracy improve?

  3. Challenge: Scale to all 1797 samples. Add a third hidden layer (64→128→64→10). What test accuracy do you achieve?

  4. Variant: Replace SGD with a hand-coded Adam optimizer in the training loop. Compare convergence speed.

  5. Debug: Modify Step 3 to introduce a bug: divide by n_classes instead of len(Xb) in the gradient. Observe how training is affected.

  6. Conceptual: How does this digits classifier pipeline compare to DQN? Map: input → state, hidden layers → feature extraction, output → Q-values/actions.

  7. Recall: In 3 steps, describe the full training pipeline you implemented from raw pixels to accuracy score.