Matplotlib is used in many chapter exercises to plot average reward over time, value functions, policy comparisons, and hyperparameter heatmaps. A clear plot often reveals convergence or instability at a glance.
Why Matplotlib matters for RL

- Line plots: reward vs episode, loss vs step, value vs state. The default is plt.plot(x, y).
- Multiple curves: overlay several runs or algorithms; use label and legend().
- Subplots: several panels in one figure (e.g. reward, length, loss).
- Heatmaps: value function over a 2D state space; grid search over \(\alpha\) and \(\epsilon\).
- Saving: plt.savefig("curve.png", dpi=150) for reports and slides.

Core concepts with examples

Single line plot

```python
import matplotlib.pyplot as plt
import numpy as np

episodes = np.arange(100)
# Synthetic data: a linear upward trend plus Gaussian noise
rewards = 0.1 * episodes + 0.5 + np.random.randn(100) * 0.5

plt.figure(figsize=(8, 4))
plt.plot(episodes, rewards, alpha=0.7, label="raw")
plt.xlabel("Episode")
plt.ylabel("Cumulative reward")
plt.title("Learning curve")
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```

Smoothed curve (moving average)

```python
window = 10
# mode="valid" keeps only full windows, so the result has len(rewards) - window + 1 points
smooth = np.convolve(rewards, np.ones(window) / window, mode="valid")
# Plot each average at the last episode of its window so the two curves stay aligned
x_smooth = episodes[window - 1:]
plt.plot(episodes, rewards, alpha=0.3, label="raw")
plt.plot(x_smooth, smooth, label=f"MA-{window}")
plt.legend()
plt.show()
```

Subplots: two panels

```python
# Two stacked panels sharing the episode axis
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
ax1.plot(episodes, rewards)
ax1.set_ylabel("Reward")
ax1.set_title("Reward per episode")
ax2.plot(episodes, np.cumsum(rewards))
ax2.set_ylabel("Cumulative reward")
ax2.set_xlabel("Episode")
plt.tight_layout()
plt.show()
```

Heatmap (e.g. value function or grid search)

```python
# 4x4 value grid (random placeholder values)
V = np.random.randn(4, 4)
# imshow draws row 0 at the top by default (origin="upper")
plt.imshow(V, cmap="viridis")
plt.colorbar(label="V(s)")
plt.xlabel("col")
plt.ylabel("row")
plt.title("State value function")
plt.show()
```

Saving

```python
# Call savefig before plt.show(); after show() the saved figure may be blank
plt.savefig("learning_curve.png", dpi=150, bbox_inches="tight")
# Close the figure to free memory, e.g. when saving many plots in a loop
plt.close()
```

Exercises

Exercise 1. Plot a line of \(y = x^2\) for \(x\) in \([0, 5]\) with 50 points. Add labels “x” and “y”, a title “y = x²”, and a grid. Save the figure as parabola.png.
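A minimal sketch of one way to approach Exercise 1. It assumes the object-oriented plt.subplots() interface rather than the plain pyplot calls used in the examples above. The names out_path, fig, and ax, the figure size, and the dpi value are illustrative choices that the exercise does not specify.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

# 50 evenly spaced points on [0, 5], endpoints included
x = np.linspace(0, 5, 50)
y = x ** 2

fig, ax = plt.subplots(figsize=(6, 4))  # figure size is an arbitrary choice
ax.plot(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("y = x²")
ax.grid(True)
fig.tight_layout()

# Illustrative variable name; the exercise only fixes the file name
out_path = Path("parabola.png")
fig.savefig(out_path, dpi=150)  # dpi borrowed from the Saving example, an assumption here
plt.close(fig)  # release the figure once it has been written
```

np.linspace includes both endpoints by default, so the 50 points cover the closed interval \([0, 5]\). Saving through fig.savefig rather than plt.savefig makes it explicit which figure is written when several are open.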
...