Learning objectives

  • Execute a complete ML workflow from raw data to model comparison.
  • Apply StandardScaler, train multiple classifiers, and evaluate with accuracy, precision, and recall.
  • Interpret results and make a justified model choice.

Concept and real-world motivation

This page is a mini-project that integrates every concept from the ML Foundations section. There is no new theory — only application. Real ML work looks exactly like this: load data, explore it, preprocess, train several models, evaluate honestly on held-out data, and compare results systematically.

The same workflow applies to RL evaluation: load or generate trajectories, preprocess states, train a value function or policy, evaluate on unseen episodes, and compare agent variants. The “best model” in supervised learning is the one with the best test metrics; the “best agent” in RL is the one that maximizes expected return across new environments. This project is your bridge between the two worlds.

Illustration: Compare accuracy across three classifiers.

Exercise — Full pipeline on the Wine dataset (Steps 1–4):

Load and explore the Wine dataset, preprocess, and train three models.

Try it — edit and run (Shift+Enter)

Professor’s hints

  • scaler.fit_transform(X_train) fits AND transforms in one step. Then scaler.transform(X_test) applies the SAME scaling (do not refit on test — that would be data leakage).
  • precision_score(..., average='macro') averages precision across all 3 classes equally. Use 'weighted' if classes are imbalanced.
  • stratify=y in train_test_split ensures all 3 wine classes appear in both train and test in the right proportions.
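The hints above can be sketched in one place — a minimal example on the same Wine dataset, using a single LogisticRegression so the scaling order and the 'macro' vs. 'weighted' averaging are easy to see in isolation:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)  # keeps class proportions

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # fit AND transform on train
X_test_s = scaler.transform(X_test)        # transform only -- never refit on test

model = LogisticRegression(max_iter=1000).fit(X_train_s, y_train)
y_pred = model.predict(X_test_s)

# 'macro' weights all 3 classes equally; 'weighted' weights by class frequency
print('macro   :', precision_score(y_test, y_pred, average='macro'))
print('weighted:', precision_score(y_test, y_pred, average='weighted'))
```

On a roughly balanced dataset like Wine the two averages land close together; they diverge when class sizes differ, which is why the hint suggests 'weighted' for imbalanced data.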

Common pitfalls

  • Data leakage via scaler: scaler.fit_transform(X) on all data before splitting leaks test statistics into training. Always fit the scaler only on X_train.
  • Forgetting stratify on multi-class data: Without it, small classes may vanish from the test set, making evaluation meaningless.
  • Comparing models trained with different preprocessing: All three models above use the same scaled data — that is fair. Comparing scaled LR to unscaled DT would not be.
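The scaler-leakage pitfall can be made concrete by fitting two scalers side by side, one the wrong way and one the right way, and comparing what they learned:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# WRONG: fitting on all of X bakes test-set statistics into the scaler
leaky = StandardScaler().fit(X)

# RIGHT: fit only on the training split
clean = StandardScaler().fit(X_train)

# The per-feature means the two scalers learned differ; that difference
# is exactly the information leaked from the test set
print('max mean difference:', np.abs(leaky.mean_ - clean.mean_).max())
```

The leak is subtle because both versions run without error; only the learned `mean_` and `scale_` attributes reveal that test data influenced preprocessing.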
Worked solution — preprocessing and training
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

wine = load_wine()
X, y = wine.data, wine.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s  = scaler.transform(X_test)

for name, model in [('LR', LogisticRegression(max_iter=1000)),
                    ('DT', DecisionTreeClassifier(random_state=42)),
                    ('KNN', KNeighborsClassifier(n_neighbors=5))]:
    model.fit(X_train_s, y_train)
    y_pred = model.predict(X_test_s)
    print(f'{name}: acc={accuracy_score(y_test, y_pred):.3f}  '
          f'prec={precision_score(y_test, y_pred, average="macro"):.3f}  '
          f'rec={recall_score(y_test, y_pred, average="macro"):.3f}')

Extra practice

  1. Step 1–2 — Exploration: Load the Wine dataset and display a bar chart of the class distribution and the mean value of each feature per class.
Try it — edit and run (Shift+Enter)
  2. Coding: Add cross_val_score (5-fold) for each of the three models. Report mean ± std. Do the CV scores agree with the single test-set scores?

  3. Challenge: Add a fourth model: RandomForestClassifier(n_estimators=100, random_state=42). Compare all four models with a bar chart. Does the ensemble beat the individual models?

  4. Variant: Re-run the pipeline without StandardScaler. How much does accuracy change for LogisticRegression? For DecisionTreeClassifier? Explain why trees are scale-invariant.

  5. Debug: The code below has a bug — StandardScaler is fit on the full dataset before the split, causing data leakage. Find and fix it.

Try it — edit and run (Shift+Enter)
  6. Conceptual: Which model worked best on the Wine dataset in your run? Give one reason why logistic regression might outperform a decision tree on this dataset.

  7. Recall: In 3 sentences, describe the full ML workflow you executed in this mini-project, from raw data to final model comparison.
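For the cross-validation exercise, one possible shape (a sketch, not the only correct answer) wraps the scaler and model in a Pipeline, so the scaler is re-fit inside each fold and no fold's held-out data leaks into scaling:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

models = [('LR', LogisticRegression(max_iter=1000)),
          ('DT', DecisionTreeClassifier(random_state=42)),
          ('KNN', KNeighborsClassifier(n_neighbors=5))]

for name, model in models:
    # Pipeline re-fits StandardScaler on each fold's training portion only
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f'{name}: {scores.mean():.3f} ± {scores.std():.3f}')
```

Calling `scaler.fit_transform(X)` once before `cross_val_score` would reintroduce the leakage pitfall from above, just spread across five folds instead of one split.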