Learning objectives
- Explain the KNN algorithm and why it is called “lazy learning.”
- Compute Euclidean distance between two points by hand and in NumPy.
- Implement KNN classification from scratch and observe the effect of varying K.
Concept and real-world motivation
To classify a new point, look at the K closest points in the training set and take a majority vote. There is no explicit training phase — the model is literally just the stored training data. This makes KNN a lazy learner: all computation happens at prediction time. It works surprisingly well when: (1) similar inputs have similar outputs, and (2) the dataset is not too large.
KNN naturally handles non-linear decision boundaries because it adapts locally to the data. The critical hyperparameter is K: K=1 memorizes every training point (high variance, low bias), large K smooths the boundary (low variance, high bias). In RL, memory-based methods like episodic Q-learning store past experience and look up similar states at decision time — the same lazy-learning idea applied to value functions.
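The bias-variance effect of K can be seen on a tiny 1D dataset. This is a sketch with assumed values: one deliberately mislabeled "noise" point shows how K=1 copies the noise while K=3 votes it away.

```python
import numpy as np

# 1D toy set (values assumed): labels 0 on the left, 1 on the right,
# plus one mislabeled "noise" point at x=0.15
X_train = np.array([0.0, 0.1, 0.2, 0.9, 1.0, 1.1, 0.15])
y_train = np.array([0,   0,   0,   1,   1,   1,   1])

def predict(x, K):
    # K nearest points by absolute distance, then majority vote
    nearest = np.argsort(np.abs(X_train - x))[:K]
    return np.bincount(y_train[nearest]).argmax()

x = 0.16  # query right next to the noise point
print(predict(x, K=1), predict(x, K=3))  # 1 0: K=1 copies the noise, K=3 smooths it over
```

With K=1 the prediction is governed by a single neighbor (high variance); with K=3 the two correctly labeled neighbors outvote the noise point.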
Illustration: KNN accuracy peaks at a middle value of K, then degrades as K grows too large.
Exercise: Implement KNN classification from scratch on 8 training points in 2D. Classify one test point using K=3.
Professor’s hints
- Euclidean distance: \(d(p,q) = \sqrt{\sum_i (p_i - q_i)^2}\). In NumPy: `np.sqrt(np.sum((X_train - test_point)**2, axis=1))` computes all 8 distances in one line. `np.argsort(distances)` returns indices that sort the distances from smallest to largest. Take the first K of those.
- Majority vote: `int(np.mean(k_nearest_labels) >= 0.5)` works for binary labels. More generally, use `np.bincount(k_nearest_labels).argmax()`.
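The two vote rules above agree for binary labels. A quick sketch, with neighbor labels assumed for illustration:

```python
import numpy as np

k_nearest_labels = np.array([1, 0, 1])  # assumed labels of the K=3 nearest neighbors

# Binary shortcut: mean >= 0.5 means class 1 won the vote
vote_binary = int(np.mean(k_nearest_labels) >= 0.5)

# General rule: count occurrences of each label, take the most frequent
vote_general = np.bincount(k_nearest_labels).argmax()

print(vote_binary, vote_general)  # 1 1
```

The `bincount` form also works for more than two classes, as long as labels are non-negative integers.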
Common pitfalls
- Forgetting to normalize features: if one feature ranges 0–1000 and another 0–1, the distance is dominated by the first. Always scale features before KNN.
- Using the wrong axis in `argsort`: `np.argsort(distances)` sorts a 1D array correctly. If you accidentally pass `axis=1` on a 2D array, you sort within rows, not across all distances.
- K larger than the training set: K cannot exceed the number of training points. Add a check: `K = min(K, len(X_train))`.
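To see the scaling pitfall concretely, here is a sketch with assumed numbers: one feature spanning 0–1000 and one spanning 0–1, where min-max scaling flips which point is "nearest".

```python
import numpy as np

# Feature 0 ranges roughly 0-1000, feature 1 ranges 0-1 (values assumed)
a = np.array([100.0, 0.9])
b = np.array([110.0, 0.1])   # far from a in feature 1, relative to its range
c = np.array([300.0, 0.9])   # far from a in feature 0

# Unscaled: feature 0 dominates, so b looks much closer to a than c does
d_ab = np.linalg.norm(a - b)
d_ac = np.linalg.norm(a - c)

# Min-max scale each feature to [0, 1] before computing distances
X = np.vstack([a, b, c])
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
d_ab_s = np.linalg.norm(X_scaled[0] - X_scaled[1])
d_ac_s = np.linalg.norm(X_scaled[0] - X_scaled[2])

print(d_ab < d_ac, d_ab_s < d_ac_s)  # True False: scaling flips the ranking
```

After scaling, both features contribute comparably, so the neighbor ranking reflects both dimensions instead of just the large-magnitude one.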
Worked solution
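The worked solution is missing from this copy. A minimal from-scratch sketch following the professor's hints, with an 8-point toy dataset and test point assumed for illustration:

```python
import numpy as np

# Assumed toy data: 8 labeled 2D points in two clusters
X_train = np.array([[1, 1], [1, 2], [2, 1], [2, 2],
                    [6, 6], [6, 7], [7, 6], [7, 7]], dtype=float)
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])

test_point = np.array([2.0, 2.5])
K = 3

# 1. Euclidean distance from the test point to every training point
distances = np.sqrt(np.sum((X_train - test_point) ** 2, axis=1))

# 2. Indices of the K nearest neighbors (argsort is ascending)
nearest = np.argsort(distances)[:K]

# 3. Majority vote among the neighbor labels
k_nearest_labels = y_train[nearest]
prediction = np.bincount(k_nearest_labels).argmax()

print(prediction)  # 0: the test point sits in the lower-left cluster
```

All three nearest neighbors of `(2, 2.5)` belong to the lower-left cluster, so the vote is unanimous for class 0.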
Extra practice
- Warm-up: Compute the Euclidean distance from point `A=(1,2)` to points `B=(4,6)` and `C=(1,3)` by hand. Which is closer?
- Coding: Wrap the KNN logic into a function `knn_predict(X_train, y_train, test_point, K)` and test it on all 8 training points (classifying each using the remaining 7 as training data — leave-one-out).
- Challenge: Generate a 2D dataset with 100 points from two Gaussian clusters. Run KNN with K=1, 5, and 10 and compute test accuracy for each. Plot how accuracy changes with K.
- Variant: Try K=1, K=5, K=10 on the 8-point training set from the main exercise with test point `(2, 2)`. Record how the predicted label changes. Explain why K=1 is most sensitive to noise.
- Debug: The code below has a bug — the `argsort` result is not sliced correctly, so the wrong neighbors are selected. Find and fix it.
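The original buggy snippet is not included in this copy. A plausible version exhibiting the described slicing mistake, with the fix shown in comments (dataset values assumed, reusing the main-exercise setup):

```python
import numpy as np

# Assumed 8-point toy dataset, as in the main exercise
X_train = np.array([[1, 1], [1, 2], [2, 1], [2, 2],
                    [6, 6], [6, 7], [7, 6], [7, 7]], dtype=float)
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
test_point = np.array([2.0, 2.5])
K = 3

distances = np.sqrt(np.sum((X_train - test_point) ** 2, axis=1))

# Buggy slice: takes the LAST K indices, i.e. the K farthest neighbors
# nearest = np.argsort(distances)[-K:]

# Fix: argsort is ascending, so the K nearest are the FIRST K indices
nearest = np.argsort(distances)[:K]

prediction = np.bincount(y_train[nearest]).argmax()
print(prediction)  # 0 with the fix; the buggy slice votes among the far cluster
```

The buggy slice `[-K:]` silently selects the farthest points, which is a plausible instance of the slicing error the exercise describes.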
- Conceptual: Why does KNN get slower as the training set grows? What is the time complexity of one prediction given N training points and D features?
- Recall: From memory, describe the KNN prediction algorithm in 4 steps, starting from “given a new test point.”