Gradient Descent
Predict a continuous value with ŷ = wx + b. Derive the MSE loss and compute one gradient descent step from scratch.
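A minimal sketch of that exercise, using hypothetical toy data (generated from y = 2x + 1): compute the MSE loss for ŷ = wx + b, derive the gradients, and take a single gradient descent step.

```python
# Hypothetical toy data: y = 2x + 1
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
w, b = 0.0, 0.0   # initial parameters
lr = 0.05         # learning rate (assumed value)
n = len(x)

# MSE loss: L = (1/n) * sum((w*x_i + b - y_i)^2)
preds = [w * xi + b for xi in x]
loss = sum((p - yi) ** 2 for p, yi in zip(preds, y)) / n

# Gradients: dL/dw = (2/n) * sum((pred - y) * x), dL/db = (2/n) * sum(pred - y)
dw = 2 / n * sum((p - yi) * xi for p, yi, xi in zip(preds, y, x))
db = 2 / n * sum(p - yi for p, yi in zip(preds, y))

# One gradient descent step: move against the gradient
w -= lr * dw
b -= lr * db
```

With all-zero parameters the initial loss is 41.0, and one step moves w and b toward the true values 2 and 1.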
The optimization algorithm behind nearly every trained ML model: iteratively follow the negative gradient of a loss function to minimize it.
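The core loop can be sketched in a few lines. Here it minimizes the illustrative function f(x) = (x - 3)², whose gradient is 2(x - 3); the learning rate and step count are assumed values.

```python
def grad(x):
    # Derivative of f(x) = (x - 3)^2
    return 2 * (x - 3)

x = 0.0    # starting point
lr = 0.1   # learning rate (assumed)
for _ in range(200):
    x -= lr * grad(x)  # step in the negative gradient direction
# x converges toward the minimizer x = 3
```

Each step shrinks the distance to the minimum by a constant factor (1 - 2·lr), so convergence is geometric for this quadratic.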
Binary classifier from scratch: sigmoid + cross-entropy loss + gradient update. The building block of softmax policies.
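A sketch of that classifier on hypothetical 1-D data: sigmoid forward pass, binary cross-entropy loss, and one gradient update. A convenient fact the exercise relies on is that pairing sigmoid with cross-entropy makes the gradient take the same (prediction − target) form as in linear regression.

```python
import math

# Hypothetical 1-D binary classification data
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]
w, b = 0.0, 0.0
lr = 0.5   # learning rate (assumed)
n = len(x)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Forward pass: p_i = sigmoid(w*x_i + b)
p = [sigmoid(w * xi + b) for xi in x]

# Binary cross-entropy: L = -(1/n) * sum(y*log(p) + (1-y)*log(1-p))
loss = -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
            for yi, pi in zip(y, p)) / n

# Gradients simplify to (p - y) terms under the sigmoid + CE pairing
dw = sum((pi - yi) * xi for pi, yi, xi in zip(p, y, x)) / n
db = sum(pi - yi for pi, yi in zip(p, y)) / n

# One gradient update
w -= lr * dw
b -= lr * db
```

At initialization every prediction is 0.5, so the loss starts at log 2 ≈ 0.693; the first step already pushes w positive, matching the data's sign structure.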