RL


15 problems combining Python, probability, and toy RL. Complete before starting Volume 1.

NumPy for RL: arrays, indexing, broadcasting, random number generation, and batch operations.

PyTorch for RL: tensors, autograd, nn.Module, optimizers, and GPU usage.

TensorFlow and Keras for RL: models, GradientTape, optimizers, and GPU usage.

Toy recommender with 100 items and a changing user; the goal is to maximize engagement.

Broken SAC: add unit tests, log Q-values, reward, and entropy, then diagnose the failure.