After Phase 5 you can implement Q-networks and policy networks in PyTorch; Phase 6 adds the RL semantics behind them: MDPs, the Bellman equations, and tabular methods.
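
To give a feel for how the Bellman backup and tabular methods fit together, here is a minimal sketch of value iteration in plain Python. The two-state MDP, the `TRANSITIONS` table, and the function names are hypothetical and invented for illustration; they are not taken from the Phase 6 material.

```python
# Hypothetical two-state, two-action deterministic MDP (illustrative only).
# TRANSITIONS[state][action] = (next_state, reward)
TRANSITIONS = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (1, 2.0), 1: (0, 0.0)},
}
GAMMA = 0.9  # discount factor (assumed value)


def value_iteration(transitions, gamma, tol=1e-8, max_iters=10_000):
    """Repeat the Bellman optimality backup until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    for _ in range(max_iters):
        # V(s) <- max_a [ r(s, a) + gamma * V(s') ]
        new_values = {
            s: max(r + gamma * values[s2] for (s2, r) in actions.values())
            for s, actions in transitions.items()
        }
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tol:
            break
    return values


def greedy_policy(transitions, gamma, values):
    """Pick, in each state, the action with the highest one-step lookahead."""
    return {
        s: max(actions, key=lambda a: actions[a][1] + gamma * values[actions[a][0]])
        for s, actions in transitions.items()
    }


values = value_iteration(TRANSITIONS, GAMMA)
policy = greedy_policy(TRANSITIONS, GAMMA, values)
print(values, policy)
```

The value table here is a plain dict with one entry per state, which is what "tabular" refers to. A Q-network plays the same role with a learned function in place of the table, which is useful when the state space is too large to enumerate.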