Learning objectives
- Recall key ML Foundations concepts and their RL connections.
- Demonstrate that logistic regression cannot solve XOR (non-linearly separable data).
- Articulate why neural networks are the natural next step after linear models.
ML Foundations Recap Quiz
Five questions. Attempt each before revealing the answer.
Q1. What is the difference between classification and regression?
Q1 answer: Classification predicts a discrete class label (e.g., spam vs. not spam); regression predicts a continuous value (e.g., a house price).
Q2. What does gradient descent minimize? Write the update rule.
Q2 answer: Gradient descent minimizes a loss function L(θ). Update rule: θ ← θ − α ∇θ L(θ), where α is the learning rate.
Q3. When does a linear model fail?
Q3 answer: When the relationship between features and target is non-linear — for classification, when the classes are not linearly separable (XOR is the canonical example).
Q4. Why do we split data into train and test sets?
Q4 answer: To estimate generalization. Evaluating on training data rewards memorization; a held-out test set measures performance on data the model has never seen.
Q5. In 3 sentences, explain what cross-validation is.
Q5 answer: Cross-validation splits the data into K folds, trains on K−1 folds, and validates on the remaining fold. This rotates so every fold serves as the validation set exactly once. Averaging the K scores gives a more reliable performance estimate than a single train/test split.
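To make the Q2 update rule concrete, here is a minimal sketch of one gradient descent step on MSE for linear regression (the data and learning rate are illustrative, not from the text):

```python
import numpy as np

# One gradient descent step on MSE loss: theta <- theta - lr * gradient.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column of 1s = bias term
y = np.array([2.0, 3.0, 4.0])
theta = np.zeros(2)
lr = 0.1

pred = X @ theta                         # current predictions (all zero here)
grad = (2 / len(y)) * X.T @ (pred - y)   # gradient of MSE w.r.t. theta
theta = theta - lr * grad                # the update rule from Q2
```

Repeating this step in a loop until the loss stops decreasing is the whole training algorithm — the same one neural networks use.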
What Changes in Deep Learning
Linear models are powerful — but limited. Here is what changes when we move to neural networks.
| | Linear / Logistic Regression | Neural Networks |
|---|---|---|
| Can model non-linear boundaries? | No | Yes |
| Number of parameters | Small (one per feature) | Large (millions possible) |
| Training method | Gradient descent | Gradient descent + backpropagation |
| Interpretability | High (inspect weights directly) | Low (black box) |
| Good for RL value functions? | Only with hand-crafted features | Yes — can use raw states |
| Risk of overfitting | Low | High (needs regularization) |
The key insight: the training algorithm is the same. Neural networks use gradient descent too — but the gradient flows through multiple layers via backpropagation (the chain rule applied recursively). Everything you learned about loss functions, learning rates, overfitting, and evaluation applies directly.
Bridge Exercise
The XOR problem: Logistic regression fails on data that is not linearly separable. XOR is the canonical example — no straight line can separate the two classes.
Why XOR is unsolvable by logistic regression
Logistic regression draws a single linear decision boundary (a line in 2D). For XOR, the four points (0,0)→0, (0,1)→1, (1,0)→1, (1,1)→0 cannot be separated by any line: the "1" class and "0" class sit on opposite diagonals, in a checkerboard pattern. No linear boundary classifies all four points; at best a line gets 3 of 4 correct (75%), and on the symmetric XOR data training typically converges to chance-level predictions.
A neural network with one hidden layer solves XOR by learning a non-linear feature transformation first, then classifying in the transformed space. This is the core motivation for deep learning.
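A minimal sketch of this comparison, assuming scikit-learn is available (the hidden layer size, activation, and solver below are illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# The four XOR points and their labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Logistic regression: one linear boundary, cannot fit XOR.
lin = LogisticRegression().fit(X, y)
print("logistic regression accuracy:", lin.score(X, y))  # at most 0.75

# One hidden layer is enough to learn the non-linear transformation.
mlp = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    solver="lbfgs", max_iter=5000, random_state=0).fit(X, y)
print("MLP accuracy:", mlp.score(X, y))
```

Try inspecting `mlp.coefs_[0]` after training: the hidden units learn features (roughly, soft AND/OR combinations of the inputs) in which XOR becomes linearly separable.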
Ready for Deep Learning?
Check off each item honestly before moving on:
- I can implement MSE, compute its gradient, and take one gradient descent step.
- I understand train/test split and why evaluating on training data is wrong.
- I know what sigmoid does and when to use cross-entropy loss.
- I built a full sklearn pipeline and compared multiple models.
- I understand why linear models are limited — and what XOR illustrates.
- I can explain K-fold cross-validation and the bias-variance tradeoff.
- I implemented KNN and K-Means from scratch in NumPy.
If you checked all 7: You are ready.
If you missed any: Revisit the relevant page before continuing. The next section (Deep Learning Foundations) builds directly on all of these.
Next: DL Foundations