Learning objectives
- Explain why K-Means is unsupervised and when it is appropriate.
- Execute the assignment and update steps of K-Means by hand on a small dataset.
- Identify convergence and understand the role of initialization.
Concept and real-world motivation
All the algorithms so far required labels. Unsupervised learning finds structure in data with no labels at all. K-Means groups unlabeled points into K clusters by alternating between two steps: (1) assignment — assign each point to the nearest centroid; (2) update — move each centroid to the mean of its assigned points. Repeating until assignments stop changing guarantees convergence (though not necessarily to the global optimum).
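The two alternating steps can be sketched directly in NumPy. This is a minimal illustration, not the exercise's solution; the function name `kmeans_step` and any data used with it are assumptions for demonstration:

```python
import numpy as np

def kmeans_step(points, centroids):
    """One K-Means iteration: assignment step, then update step."""
    # Assignment: index of the nearest centroid for each point.
    # Broadcasting gives a (n_points, K) matrix of distances.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    # Update: move each centroid to the mean of its assigned points.
    new_centroids = centroids.copy()
    for k in range(len(centroids)):
        mask = assignments == k
        if mask.sum() > 0:  # leave an empty cluster's centroid where it is
            new_centroids[k] = points[mask].mean(axis=0)
    return assignments, new_centroids
```

Calling this function in a loop until the assignments stop changing is the whole algorithm.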
K-Means has countless applications: customer segmentation, image compression (quantizing pixel colors), document clustering, and anomaly detection. In RL, we sometimes cluster similar states together to build a smaller abstract state space — reducing the problem size before training a policy. This is a form of state abstraction (related in spirit to, though distinct from, fixed discretization schemes such as tile coding).
Illustration: Final cluster sizes after K-Means converges on a 3-cluster dataset.
Exercise: Run 3 iterations of K-Means by hand on 6 points in 2D.
Professor’s hints
- `np.linalg.norm(centroids - p, axis=1)` computes the distance from point `p` to each of the K centroids in one line; `axis=1` takes the norm across the 2 feature dimensions.
- The update step is simply the mean: `points[mask].mean(axis=0)`. If a cluster is empty (no points assigned), keep the old centroid or re-initialize it; never divide by zero.
- Convergence: stop when `np.array_equal(assignments, prev_assignments)`. After iteration 2, the assignments in this example should be stable.
Common pitfalls
- Dividing by zero in update: if a cluster gets no assigned points, `mean([])` is `NaN`. Always check `mask.sum() > 0`.
- Sensitivity to initialization: K-Means can converge to different local optima depending on the initial centroids. Run it multiple times with different seeds (sklearn's `n_init` parameter) and keep the best result (lowest inertia).
- Choosing K: K is a hyperparameter. Use the elbow method: plot inertia vs. K and look for the "elbow" where adding more clusters yields diminishing returns.
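The elbow method from the last pitfall can be sketched with a tiny hand-rolled K-Means. Everything here is illustrative: the deterministic initialization, the blob centers, and the helper names are assumptions, not part of the exercise:

```python
import numpy as np

def run_kmeans(points, K, n_iter=20):
    """Basic K-Means with a deterministic init (K evenly spaced data points)."""
    centroids = points[np.linspace(0, len(points) - 1, K).astype(int)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        assign = dists.argmin(axis=1)
        for k in range(K):
            mask = assign == k
            if mask.sum() > 0:  # skip empty clusters
                centroids[k] = points[mask].mean(axis=0)
    return assign, centroids

def inertia(points, assign, centroids):
    # Sum of squared distances from each point to its assigned centroid.
    return float(((points - centroids[assign]) ** 2).sum())

rng = np.random.default_rng(0)
# Three well-separated Gaussian blobs, 30 points each.
points = np.vstack([rng.normal(c, 0.3, size=(30, 2))
                    for c in [(0., 0.), (5., 5.), (0., 5.)]])
for K in (2, 3, 4):
    a, c = run_kmeans(points, K)
    print(K, round(inertia(points, a, c), 2))
```

On data like this, inertia typically drops sharply up to K=3 and flattens afterwards; that kink is the elbow.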
Worked solution
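The original solution cell did not survive formatting. As a stand-in, here is a minimal sketch of the kind of loop the exercise calls for; the 6 points and 3 initial centroids below are placeholders, not necessarily the ones from the exercise:

```python
import numpy as np

# Placeholder data: 6 points in 2D and 3 hand-picked initial centroids.
points = np.array([[1., 1.], [1.5, 2.], [5., 7.], [6., 8.], [9., 1.], [10., 2.]])
centroids = np.array([[1., 1.], [5., 7.], [9., 1.]])

prev_assignments = None
for it in range(3):  # the exercise asks for 3 iterations
    # Assignment step: nearest centroid for each point.
    assignments = np.array(
        [np.linalg.norm(centroids - p, axis=1).argmin() for p in points]
    )
    # Convergence check: assignments unchanged since the last iteration.
    if prev_assignments is not None and np.array_equal(assignments, prev_assignments):
        print(f"converged at iteration {it}")
        break
    # Update step: mean of each non-empty cluster.
    for k in range(len(centroids)):
        mask = assignments == k
        if mask.sum() > 0:
            centroids[k] = points[mask].mean(axis=0)
    prev_assignments = assignments

print(assignments)
```

With well-separated placeholder points like these, the assignments stabilize after the first update, matching the convergence behavior described in the hints.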
Extra practice
- Warm-up: Plot the 6 points from the main exercise colored by their cluster assignment after the final iteration. Use a simple ASCII-style print or matplotlib scatter.
- Coding: Wrap K-Means into a function `kmeans(points, K, n_iter=10)` that returns the final assignments and centroids. Test it with K=2 and K=3 on the 6-point dataset.
- Challenge: Generate 100 2D points from 3 Gaussian clusters using `np.random.randn`. Run K-Means with K=2, K=3, and K=4. Compute the inertia (sum of squared distances from each point to its centroid) for each K. Plot inertia vs. K and identify the elbow.
- Variant: Try 3 different initial centroid choices on the 6-point dataset: (a) the ones in the exercise, (b) three random points, (c) the first three data points. Do all initializations converge to the same clusters?
- Debug: The code below has a bug — the centroid update divides by the total number of points instead of the cluster size. Find and fix it.
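The buggy snippet this item refers to did not survive formatting. A reconstruction of the bug it describes (the function name and the fix comment are my own) might look like:

```python
import numpy as np

def buggy_update(points, assignments, K):
    """Centroid update step containing the bug described in the exercise."""
    centroids = np.zeros((K, points.shape[1]))
    for k in range(K):
        mask = assignments == k
        # BUG: divides by the total number of points, not the cluster size.
        # Fix: points[mask].sum(axis=0) / mask.sum()  (after checking mask.sum() > 0)
        centroids[k] = points[mask].sum(axis=0) / len(points)
    return centroids
```

The buggy version shrinks every centroid toward the origin, since each cluster's sum is divided by too large a count.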
- Conceptual: K-Means minimizes within-cluster sum of squared distances (inertia). Is this guaranteed to find the global minimum? Why or why not?
- Recall: From memory, write the two steps of the K-Means algorithm (assignment and update) and state what “convergence” means.