Choosing Rewards

Learning objectives

Understand how reward choice affects optimal behavior (what the agent will try to maximize).
Use step penalties and terminal rewards in gridworld to encourage short paths or goal reaching.
Avoid common pitfalls: reward hacking and unintended incentives.

Why rewards matter

The agent’s goal in an MDP is to maximize cumulative (often discounted) reward. So the reward function defines the task. Changing rewards changes what is “optimal.” Design rewards so that the behavior you want is exactly what maximizes total reward. ...
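To make the claim that changing rewards changes what is optimal concrete, here is a minimal Python sketch that compares the return of a short path and a long path to the goal under different reward choices. The helper names `discounted_return` and `path_rewards`, the path lengths (4 and 8 steps), and the reward values (-1 per step, +10 at the goal) are assumptions chosen for illustration, not values from the original post.

```python
def discounted_return(rewards, gamma=1.0):
    """Return sum over t of gamma**t * rewards[t]."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def path_rewards(num_steps, step_reward, goal_reward):
    """Reward sequence for a hypothetical path that reaches the goal on its last step.

    Every step earns step_reward; the final step also earns goal_reward.
    """
    rewards = [step_reward] * num_steps
    rewards[-1] += goal_reward
    return rewards


# Assumed setup: a short path (4 steps) and a long path (8 steps) to the same goal.
SHORT, LONG = 4, 8

# Step penalty of -1, terminal reward of +10, no discounting.
with_penalty_short = discounted_return(path_rewards(SHORT, -1, 10))
with_penalty_long = discounted_return(path_rewards(LONG, -1, 10))

# No step penalty, no discounting: path length no longer matters.
no_penalty_short = discounted_return(path_rewards(SHORT, 0, 10))
no_penalty_long = discounted_return(path_rewards(LONG, 0, 10))

# No step penalty, but discounting (gamma = 0.9) still favors reaching the goal sooner.
discounted_short = discounted_return(path_rewards(SHORT, 0, 10), gamma=0.9)
discounted_long = discounted_return(path_rewards(LONG, 0, 10), gamma=0.9)

print("step penalty:", with_penalty_short, "vs", with_penalty_long)
print("no penalty:  ", no_penalty_short, "vs", no_penalty_long)
print("discounted:  ", discounted_short, "vs", discounted_long)
```

Under these assumed numbers, the step penalty makes the short path strictly better (6 versus 2), while removing the penalty without discounting makes both paths tie at 10, so the agent has no incentive to hurry. Discounting alone restores a preference for the short path, which is one reason step penalties and discount factors are often tuned together.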

March 10, 2026 · 2 min · 354 words · codefrydev