Learning objectives

  • Distinguish traditional rule-based programming from machine learning.
  • Name and describe the three main types of ML: supervised, unsupervised, and reinforcement learning.
  • Classify a new problem as supervised, unsupervised, or reinforcement learning given its description.

Concept and real-world motivation

In traditional programming, a human writes explicit rules, for example: if an email contains "free money" → spam. This works for simple cases but breaks down as soon as the world gets complicated. A spam filter built on hand-written rules fails against new tricks; a chess program built from if/else trees cannot cover the enormous number of possible board positions. Machine learning takes a different approach: instead of programming the rules, we show the machine examples and let it find the patterns.
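
To make the limitation concrete, here is a minimal sketch of a hand-written rule in Python. The function name `rule_based_is_spam`, the keyword list, and the two sample emails are illustrative assumptions, not part of the course code.

```python
# Hypothetical keyword list, chosen only for illustration.
SPAM_KEYWORDS = ["free money"]


def rule_based_is_spam(email: str) -> bool:
    """Return True if the email contains any hand-written spam keyword."""
    text = email.lower()
    return any(keyword in text for keyword in SPAM_KEYWORDS)


# The rule catches the exact phrase it was written for...
caught = rule_based_is_spam("Claim your FREE MONEY today!")
# ...but a trivial respelling slips straight past it.
missed = rule_based_is_spam("Claim your fr33 m0ney today!")

print(caught, missed)  # True False
```

Every new trick needs another hand-written rule, which is the maintenance burden that learning from examples is meant to avoid.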

The three types of ML differ in the kind of signal the machine learns from. In supervised learning, every example comes with a correct answer (a label), like a dataset of emails labeled “spam” or “not spam.” In unsupervised learning, there are no labels: the machine finds hidden structure on its own, like grouping customers by purchase behavior. In reinforcement learning, there are no labels either, only a reward signal: the agent tries actions in an environment and learns which ones lead to more reward. This is how we will train RL agents in the rest of this course. RL is the third type of ML, and much of what you learn about supervised learning here will reappear inside RL algorithms.
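
One way to see the difference is to look at the shape of the data each type learns from. The sketch below is illustrative only: the variable names and all values are made up for this example.

```python
from typing import List, Tuple

# Supervised: every example is paired with a label.
supervised_data: List[Tuple[str, str]] = [
    ("Win free money now", "spam"),
    ("Meeting moved to 3pm", "not spam"),
]

# Unsupervised: examples only, no labels.
# Here each row is a hypothetical customer's spend in two categories.
unsupervised_data: List[List[float]] = [
    [120.0, 5.0],
    [115.0, 7.5],
    [3.0, 210.0],
]

# Reinforcement: (state, action, reward) records gathered by acting.
# The reward says how good the outcome was, not which action was correct.
rl_transitions: List[Tuple[str, str, float]] = [
    ("upright", "step_forward", 1.0),
    ("leaning", "step_forward", -1.0),
]

print("supervised example:   ", supervised_data[0])
print("unsupervised example: ", unsupervised_data[0])
print("reinforcement example:", rl_transitions[0])
```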

Illustration: The bar chart below shows the rough proportion of real-world ML problem types encountered in industry (supervised problems are by far the most common, which is why we spend the most time on them).

Exercise: Eight problems are listed below. For each one, classify it as supervised, unsupervised, or reinforcement. Fill in the my_answers list and run the cell to check your work.

Professor’s hints

  • Ask yourself: “Is there a correct answer (label) for each example?” If yes → supervised.
  • Ask yourself: “Is an agent taking actions and receiving rewards?” If yes → reinforcement learning.
  • If neither — the algorithm is finding structure without guidance → unsupervised.
  • Anomaly detection (problem 5) has no “here is the anomaly” label in the training data — the model learns what “normal” looks like and flags departures.
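
The first three hints amount to a small decision procedure, which can be sketched in Python. The function `classify_problem` and its two boolean parameters are hypothetical names introduced here. Treat it as a heuristic: some real problems fit more than one framing.

```python
def classify_problem(has_labels: bool, agent_gets_rewards: bool) -> str:
    """Apply the hints in order: labels first, then agent and rewards."""
    if has_labels:
        return "supervised"
    if agent_gets_rewards:
        return "reinforcement"
    # Neither: the algorithm finds structure without guidance.
    return "unsupervised"


# Spam classification: each email has a label.
print(classify_problem(has_labels=True, agent_gets_rewards=False))
# Robot walking: an agent acts and receives rewards.
print(classify_problem(has_labels=False, agent_gets_rewards=True))
# Customer grouping: no labels and no agent.
print(classify_problem(has_labels=False, agent_gets_rewards=False))
```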

Common pitfalls

  • Confusing RL with supervised learning: In RL, the agent is never told the “correct action” for a state. It receives only a reward, which is often delayed until after a sequence of actions. This is a fundamentally different signal from a labeled dataset.
  • Thinking unsupervised = no learning: Unsupervised methods learn rich structure (clusters, dimensions, densities) — they just do so without human-provided labels.
  • Assuming RL requires a game: RL applies to any sequential decision-making problem: robotics, recommendation systems, resource scheduling, and more.

Worked solution

Here are the correct classifications and the reasoning:

  1. House price prediction: supervised. Each house has a known sale price (label). The model learns the mapping features → price.
  2. Customer grouping: unsupervised. No pre-labeled clusters exist. K-means or similar algorithms find groups from the data itself.
  3. Robot walking: reinforcement. The robot receives a reward for staying upright and is penalized for falling. No labeled “correct joint angles” exist.
  4. Spam classification: supervised. Emails are labeled spam/not-spam. The model learns from those labels.
  5. Bank anomaly detection: unsupervised. Transactions are not individually labeled. The model learns the distribution of normal transactions and flags deviations.
  6. Chess: reinforcement. The agent plays games and receives +1 for a win, 0 for a draw, and -1 for a loss. It learns by trial and error.
  7. Weather prediction: supervised. Historical records pair (today’s features) → (tomorrow’s temperature). Each training example has a label.
  8. News clustering: unsupervised. Articles are not pre-categorized. Topic models or clustering algorithms find the groups.

# Expected answers for problems 1-8, in order
correct = ['supervised','unsupervised','reinforcement',
           'supervised','unsupervised','reinforcement',
           'supervised','unsupervised']
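
A `my_answers` list could be compared against `correct` with a sketch like the one below. The helper `check_answers` and the sample `my_answers` (with problem 5 deliberately wrong) are assumptions for illustration; the notebook's own checking cell may work differently.

```python
correct = ['supervised', 'unsupervised', 'reinforcement',
           'supervised', 'unsupervised', 'reinforcement',
           'supervised', 'unsupervised']


def check_answers(my_answers, correct):
    """Return one boolean per problem: True where the answer matches."""
    if len(my_answers) != len(correct):
        raise ValueError(f"expected {len(correct)} answers, got {len(my_answers)}")
    # Normalize case and whitespace so 'Supervised ' still counts.
    return [mine.strip().lower() == expected
            for mine, expected in zip(my_answers, correct)]


# Hypothetical student answers; problem 5 is wrong on purpose.
my_answers = ['supervised', 'unsupervised', 'reinforcement',
              'supervised', 'supervised', 'reinforcement',
              'supervised', 'unsupervised']

results = check_answers(my_answers, correct)
for number, is_right in enumerate(results, start=1):
    print(f"Problem {number}: {'OK' if is_right else 'wrong'}")
print(f"{sum(results)}/{len(results)} correct")
```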

Extra practice

  1. Warm-up: Write a simple if/else “classifier” that predicts spam based on keywords. Then test it on a new email that it misclassifies. This shows the limits of rule-based approaches.
  2. Coding: Add two more test emails to the list above that your rule-based classifier gets wrong. This demonstrates why data-driven ML is needed.
  3. Challenge: A recommendation system suggests movies based on watch history. Is it supervised, unsupervised, or reinforcement learning? Argue for more than one interpretation: under what framing does it fit each type?
  4. Variant: Suppose you have a dataset of customer transactions and you know which transactions were fraudulent (labeled by a human review team). How does this change anomaly detection from unsupervised to supervised? What are the pros and cons of each approach?
  5. Debug: The code below tries to check answers but has a bug: it always prints “OK”, even for wrong answers. Find and fix it.
  6. Conceptual: Explain in one paragraph why reinforcement learning is harder than supervised learning. What makes the reward signal a less informative teaching signal than a labeled dataset?
  7. Recall: Name the three types of machine learning and give one real-world example of each from memory. Write your answer before looking at the page.