Phase 4 answers the question “How do we train models from data with gradient descent?” This is the same machinery you will use inside RL training loops.
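
To make the question concrete, here is a minimal sketch of training a model from data with gradient descent. It is only an illustration: the function name `train_linear`, the toy data, the learning rate, and the step count are all assumptions chosen for this example and are not taken from the course material. The model is a line `y ≈ w*x + b`, and the loss is mean squared error.

```python
def train_linear(xs, ys, lr=0.05, steps=2000):
    """Fit y ≈ w*x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; lr and steps are assumed values.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (assumed for this sketch).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear(xs, ys)
```

The loop has the same shape as the update step in an RL training loop: compute a loss from data, take its gradient with respect to the parameters, and move the parameters a small step in the opposite direction. In practice an autodiff library would usually compute the gradients instead of the hand-written formulas used here.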