Chapter 73: Decision Transformers

Learning objectives

- Implement a Decision Transformer: a GPT-style transformer that takes sequences of (returns-to-go, state, action) tokens and predicts actions conditioned on a desired return and the past states/actions.
- Explain the formulation: at each timestep the input is the sequence (R_1, s_1, a_1, …, R_t, s_t), where the return-to-go R_t is the sum of rewards from timestep t onward; the model predicts a_t. Training is supervised (sequence modeling) on offline trajectories.
- Train the model on a simple environment's offline dataset and test it by conditioning on different returns-to-go (e.g., a high target return for "expert" behavior).
- Compare with offline RL methods such as CQL in terms of implementation and how the policy is extracted (conditioning on a target return vs. maximizing a learned value function).
- Relate Decision Transformers to recommendation (sequences of user-item-reward interactions) and to dialogue (conditioning on a desired outcome).

Concept and real-world RL ...
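A minimal sketch of the data side of the formulation above, in plain Python (the helper names are illustrative, not from any library): it computes per-timestep returns-to-go from a trajectory's rewards, interleaves them with states and actions into the (R_t, s_t, a_t) token sequence a Decision Transformer consumes, and shows how the return-to-go target is decremented at test time as rewards are observed.

```python
def returns_to_go(rewards):
    """R_t = sum of rewards from timestep t to the end of the trajectory."""
    rtg = []
    total = 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return list(reversed(rtg))


def build_sequence(states, actions, rewards):
    """Interleave (R_t, s_t, a_t) tokens for one offline trajectory.

    During training, the model is supervised to predict a_t from the
    prefix ending in (R_t, s_t).
    """
    seq = []
    for R, s, a in zip(returns_to_go(rewards), states, actions):
        seq += [("rtg", R), ("state", s), ("action", a)]
    return seq


def update_target_return(target_rtg, observed_reward):
    """At test time, condition on a desired return and subtract each
    observed reward: the next conditioning token is R - r."""
    return target_rtg - observed_reward


# Example: a 3-step trajectory with rewards [1, 0, 2].
seq = build_sequence(states=["s0", "s1", "s2"],
                     actions=["a0", "a1", "a2"],
                     rewards=[1.0, 0.0, 2.0])
```

Here `returns_to_go([1, 0, 2])` yields `[3, 2, 2]`, so the first conditioning token is the full trajectory return; conditioning on a higher value than any seen in the dataset is how "expert" behavior is requested at evaluation time.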

March 10, 2026 · 4 min · 716 words · codefrydev