Learning objectives

  • Explain the KNN algorithm and why it is called “lazy learning.”
  • Compute Euclidean distance between two points by hand and in NumPy.
  • Implement KNN classification from scratch and observe the effect of varying K.

Concept and real-world motivation

To classify a new point, look at the K closest points in the training set and take a majority vote. There is no explicit training phase — the model is literally just the stored training data. This makes KNN a lazy learner: all computation happens at prediction time. It works surprisingly well when: (1) similar inputs have similar outputs, and (2) the dataset is not too large.

KNN naturally handles non-linear decision boundaries because it adapts locally to the data. The critical hyperparameter is K: K=1 memorizes every training point (high variance, low bias), large K smooths the boundary (low variance, high bias). In RL, memory-based methods like episodic Q-learning store past experience and look up similar states at decision time — the same lazy-learning idea applied to value functions.

Illustration: KNN accuracy peaks at a middle value of K, then degrades as K grows too large.

Exercise: Implement KNN classification from scratch on 8 training points in 2D. Classify one test point using K=3.

Try it — edit and run (Shift+Enter)

Professor’s hints

  • Euclidean distance: \(d(p,q) = \sqrt{\sum_i (p_i - q_i)^2}\). In NumPy: np.sqrt(np.sum((X_train - test_point)**2, axis=1)) computes all 8 distances in one line.
  • np.argsort(distances) returns indices that sort the distances from smallest to largest. Take the first K of those.
  • Majority vote: int(np.mean(k_nearest_labels) >= 0.5) works for binary labels. More generally, use np.bincount(k_nearest_labels).argmax().

Common pitfalls

  • Forgetting to normalize features: If one feature ranges 0–1000 and another 0–1, distance is dominated by the first. Always scale features before KNN.
  • Using wrong axis in argsort: np.argsort(distances) sorts a 1D array correctly. If you accidentally keep axis=1, you sort within rows, not across all distances.
  • K larger than training set: K cannot exceed the number of training points. Add a check: K = min(K, len(X_train)).
Worked solution
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
import numpy as np

X_train = np.array([[1,2],[1.5,1.8],[5,8],[8,8],[1,0.6],[9,11],[8,2],[10,2]], dtype=float)
y_train = np.array([0, 0, 1, 1, 0, 1, 1, 1])
test_point = np.array([5, 5], dtype=float)
K = 3

distances     = np.sqrt(np.sum((X_train - test_point)**2, axis=1))
sorted_idx    = np.argsort(distances)
k_nearest_labels = y_train[sorted_idx[:K]]
prediction    = int(np.mean(k_nearest_labels) >= 0.5)
print(f'Predicted class: {prediction}')  # 1

Extra practice

  1. Warm-up: Compute the Euclidean distance from point A=(1,2) to points B=(4,6) and C=(1,3) by hand. Which is closer?
Try it — edit and run (Shift+Enter)
  1. Coding: Wrap the KNN logic into a function knn_predict(X_train, y_train, test_point, K) and test it on all 8 training points (classifying each using the remaining 7 as training data — leave-one-out).
  2. Challenge: Generate a 2D dataset with 100 points from two Gaussian clusters. Run KNN with K=1, 5, and 10 and compute test accuracy for each. Plot how accuracy changes with K.
  3. Variant: Try K=1, K=5, K=10 on the 8-point training set from the main exercise with test point (2, 2). Record how the predicted label changes. Explain why K=1 is most sensitive to noise.
  4. Debug: The code below has a bug — argsort result is not sliced correctly so wrong neighbors are selected. Find and fix it.
Try it — edit and run (Shift+Enter)
  1. Conceptual: Why does KNN get slower as the training set grows? What is the time complexity of one prediction given N training points and D features?
  2. Recall: From memory, describe the KNN prediction algorithm in 4 steps, starting from “given a new test point.”