Offline RL
Overall Progress
0%
Collect a random-policy dataset on Hopper; train naive SAC offline on it and observe Q-value overestimation.
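The overestimation failure mode above can be illustrated without an environment at all. A minimal sketch (assumed toy numbers, not Hopper data): the true Q-value of every action is zero, but each fitted estimate carries independent approximation noise, so bootstrapping through a max over actions averages a maximum of noise and is biased upward.

```python
import numpy as np

rng = np.random.default_rng(0)

# True Q(s, a) = 0 for all 50 actions; fitted estimates add N(0, 1) noise,
# mimicking function-approximation error on out-of-distribution actions.
n_actions, n_trials = 50, 10_000
noisy_q = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_actions))

true_max = 0.0                          # max of the true (all-zero) Q-values
estimated = noisy_q.max(axis=1).mean()  # what a naive max-backup would use

print(f"true max Q:        {true_max:.3f}")
print(f"bootstrapped max:  {estimated:.3f}")  # well above 0
```

Offline, this bias compounds: the policy steers toward the inflated OOD actions, and no fresh data ever corrects the estimates.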
Implement the CQL loss, which penalizes Q-values for out-of-distribution (OOD) actions; compare against naive offline SAC.
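A sketch of the CQL(H) regularizer added to the standard Bellman loss: log-sum-exp of Q over sampled actions minus Q at the dataset actions. Shapes and the uniform action sampling are assumptions of this toy; a full implementation would also include the TD term and the policy update.

```python
import numpy as np

def cql_penalty(q_sampled, q_data):
    """CQL(H) regularizer: logsumexp_a Q(s, a) - Q(s, a_data).

    q_sampled: (batch, n_sampled) Q-values at actions sampled uniformly
               (or from the current policy) -- the candidate OOD actions.
    q_data:    (batch,) Q-values at the actions stored in the dataset.
    Minimizing this pushes Q down at OOD actions while keeping it up
    where the dataset has support.
    """
    m = q_sampled.max(axis=1, keepdims=True)  # numerically stable logsumexp
    lse = m.squeeze(1) + np.log(np.exp(q_sampled - m).sum(axis=1))
    return (lse - q_data).mean()

rng = np.random.default_rng(0)
q_sampled = rng.normal(1.0, 1.0, size=(32, 10))  # inflated OOD estimates
q_data = rng.normal(0.0, 1.0, size=32)
print(f"CQL penalty: {cql_penalty(q_sampled, q_data):.3f}")  # positive here
```

For the comparison with naive SAC, the only change to the critic update is adding `alpha * cql_penalty(...)` to the TD loss, which makes the ablation clean.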
Decision Transformer: feed interleaved returns-to-go, states, and actions into a GPT-like model trained to predict actions.
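The data-preparation half of the Decision Transformer task reduces to computing returns-to-go, i.e. suffix sums of the reward sequence. A minimal sketch (the gamma parameter is an assumption; the original formulation uses undiscounted returns, gamma = 1):

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """Suffix sums R_t = sum_{t' >= t} gamma^(t'-t) * r_{t'}."""
    rtg = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

rewards = np.array([1.0, 0.0, 2.0, 1.0])
print(returns_to_go(rewards))  # [4. 3. 3. 1.]

# The model then sees interleaved tokens (R_1, s_1, a_1, R_2, s_2, a_2, ...)
# and is trained GPT-style to predict a_t from the prefix ending at s_t.
```

At evaluation time the first return-to-go token is set to a target return, which is what lets the same model be conditioned to act well or badly.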
Review Volume 7 (Exploration, ICM, RND, Go-Explore, Meta-RL) and preview Volume 8 (Offline RL, Imitation Learning, RLHF).
Review Volume 8 (Offline RL, Imitation Learning, IRL, RLHF) and preview Volume 9 (Multi-Agent RL — cooperation, competition, game theory).