CartPole
Learning objectives

- Understand the CartPole environment: the state (cart position, cart velocity, pole angle, pole angular velocity), the actions (push left or push right), and the reward (+1 per step until the episode ends).
- Implement a solution using linear function approximation (e.g. tile coding or simple hand-crafted features) with semi-gradient SARSA or Q-learning.
- Optionally, solve it with a small neural network (e.g. DQN-style), as in later chapters.

What is CartPole?

CartPole (the classic cart-pole, or inverted-pendulum-on-a-cart, problem) is a standard control task in OpenAI Gym / Gymnasium. A pole is hinged to a cart that moves left and right along a track, and the agent keeps the pole balanced by pushing the cart.

- State (continuous): cart position \(x\), cart velocity \(\dot{x}\), pole angle \(\theta\), and pole angular velocity \(\dot{\theta}\).
- Actions (discrete): 0 = push the cart left, 1 = push the cart right.
- Reward: +1 for every step until the episode ends.
- Episode end: the episode terminates when the pole angle leaves a fixed range (\(\pm 12°\) in CartPole-v1) or the cart leaves the track (\(|x| > 2.4\) in CartPole-v1), and it is truncated once a step limit is reached (500 steps in CartPole-v1).

The goal is therefore to keep the pole upright for as long as possible: maximizing the total reward is the same as maximizing the number of steps survived. ...
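To make the state, action, reward, and termination rules concrete, here is a minimal, dependency-free Python sketch of the environment. It is not Gymnasium's code: the physical constants (gravity 9.8, cart mass 1.0, pole mass 0.1, pole half-length 0.5, push force 10.0, time step 0.02 s) and the explicit Euler integration are assumed to mirror Gymnasium's CartPole-v1 defaults, and the helper names `reset`, `step`, and `run_episode` are hypothetical.

```python
import math
import random

# Physical constants assumed to mirror Gymnasium's CartPole-v1 defaults.
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_POLE_LENGTH = 0.5                  # hinge to the pole's center of mass
POLE_MASS_LENGTH = MASS_POLE * HALF_POLE_LENGTH
FORCE_MAG = 10.0                        # magnitude of each push
TAU = 0.02                              # seconds per step

X_LIMIT = 2.4                           # |x| beyond this ends the episode
THETA_LIMIT = 12 * 2 * math.pi / 360    # 12 degrees, in radians
MAX_STEPS = 500                         # step limit (truncation)


def reset(rng):
    """Initial state (x, x_dot, theta, theta_dot), each drawn uniformly from [-0.05, 0.05]."""
    return tuple(rng.uniform(-0.05, 0.05) for _ in range(4))


def step(state, action):
    """Apply action 0 (push left) or 1 (push right); return (next_state, reward, terminated)."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Cart-pole equations of motion (no friction on the cart or the hinge).
    temp = (force + POLE_MASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS

    # Explicit Euler step: positions advance with the old velocities.
    x += TAU * x_dot
    x_dot += TAU * x_acc
    theta += TAU * theta_dot
    theta_dot += TAU * theta_acc

    terminated = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    # +1 on every step, including the step that ends the episode.
    return (x, x_dot, theta, theta_dot), 1.0, terminated


def run_episode(policy, rng):
    """Roll out one episode; return (total_reward, steps, terminated)."""
    state = reset(rng)
    total_reward, steps, terminated = 0.0, 0, False
    while not terminated and steps < MAX_STEPS:
        state, reward, terminated = step(state, policy(state))
        total_reward += reward
        steps += 1
    return total_reward, steps, terminated


if __name__ == "__main__":
    rng = random.Random(0)
    total, steps, terminated = run_episode(lambda s: rng.randrange(2), rng)
    print(f"random policy: return={total}, steps={steps}, terminated={terminated}")
```

Because the reward is +1 per step, the returned total always equals the step count, so a policy's return is simply how long it kept the pole up. With Gymnasium installed, `gymnasium.make("CartPole-v1")` plays the same role, and its `step` returns `(observation, reward, terminated, truncated, info)`, reporting the 500-step cutoff separately as `truncated`.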
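For the linear-function-approximation objective, the sketch below pairs a simple tile coder with semi-gradient SARSA. Each state activates one tile per tiling, \(\hat{q}(s,a,\mathbf{w})\) is the sum of action \(a\)'s weights on the active tiles, and because these binary features make \(\nabla_{\mathbf{w}}\hat{q}\) equal to 1 on active tiles and 0 elsewhere, the update \(\mathbf{w} \leftarrow \mathbf{w} + \alpha\,[R + \gamma\,\hat{q}(S',A',\mathbf{w}) - \hat{q}(S,A,\mathbf{w})]\,\nabla_{\mathbf{w}}\hat{q}(S,A,\mathbf{w})\) only changes those tiles. The clipping ranges for the unbounded velocities, the 8 tilings of 6 tiles per dimension, the uniform tiling offsets, and the default step sizes are illustrative assumptions rather than tuned values, and `active_tiles` and `LinearSarsa` are hypothetical names.

```python
import random

# Illustrative assumptions, not tuned values: clipping ranges (the velocities
# are unbounded in CartPole), number of tilings, and tiles per dimension.
LOWS = (-2.4, -3.0, -0.21, -3.5)
HIGHS = (2.4, 3.0, 0.21, 3.5)
N_TILINGS = 8
TILES_PER_DIM = 6
N_FEATURES = N_TILINGS * TILES_PER_DIM ** 4
N_ACTIONS = 2


def active_tiles(state):
    """Return one active tile index per tiling for a 4-D state (x, x_dot, theta, theta_dot)."""
    indices = []
    for t in range(N_TILINGS):
        index = t  # the tiling number keeps indices from different tilings disjoint
        for d, value in enumerate(state):
            width = (HIGHS[d] - LOWS[d]) / (TILES_PER_DIM - 1)
            offset = t * width / N_TILINGS  # tiling t is shifted by t/N_TILINGS of a tile
            clipped = min(max(value, LOWS[d]), HIGHS[d])
            coord = min(int((clipped - LOWS[d] + offset) // width), TILES_PER_DIM - 1)
            index = index * TILES_PER_DIM + coord
        indices.append(index)
    return indices


class LinearSarsa:
    """Linear action values: q(s, a) is the sum of action a's weights on the active tiles."""

    def __init__(self, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        self.alpha = alpha / N_TILINGS  # spread the step size over the active tiles
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.w = [[0.0] * N_FEATURES for _ in range(N_ACTIONS)]

    def q(self, tiles, action):
        return sum(self.w[action][i] for i in tiles)

    def act(self, state):
        """Epsilon-greedy choice between push left (0) and push right (1); ties go to 0."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(N_ACTIONS)
        tiles = active_tiles(state)
        values = [self.q(tiles, a) for a in range(N_ACTIONS)]
        return values.index(max(values))

    def update(self, s, a, r, s_next, a_next, terminal):
        """One semi-gradient SARSA step; returns the TD error.

        Pass terminal=True only when the pole fell or the cart left the track,
        not when the episode was cut off by the step limit.
        """
        tiles = active_tiles(s)
        target = r
        if not terminal:
            target += self.gamma * self.q(active_tiles(s_next), a_next)
        td_error = target - self.q(tiles, a)
        # Gradient of q(s, a) w.r.t. action a's weights: 1 on active tiles, 0 elsewhere.
        for i in tiles:
            self.w[a][i] += self.alpha * td_error
        return td_error
```

In a training loop, the agent would pick \(A'\) with `act(s_next)` before calling `update`, which is what makes this on-policy SARSA; replacing the bootstrap term with the maximum of \(\hat{q}(S',a,\mathbf{w})\) over both actions turns it into Q-learning.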