Visualization & Plotting for RL

This page ties together when and what to plot in reinforcement learning, how to read common charts, and which tool to use: Matplotlib for Python scripts and notebooks, or Chart.js for interactive web demos and dashboards.

Why visualization matters in RL

RL training is noisy: a single run can look good or bad by chance. Plots let you see trends (is return going up?), variance (how stable is learning?), and comparisons (which algorithm or hyperparameter is better?). Every curriculum chapter that asks you to “plot the learning curve” is training you to diagnose and communicate results. ...
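To make the points about variance and comparisons concrete, one common approach is to run each algorithm with several seeds and plot the mean return with a shaded one-standard-deviation band. The sketch below uses synthetic data; the helper name `plot_mean_band`, the two algorithm labels, and the numbers are illustrative assumptions, not part of the curriculum.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_mean_band(ax, runs, label):
    """Plot the mean over runs with a +/- 1 std band.

    runs: array of shape (n_runs, n_episodes), one row per seed.
    (Hypothetical helper, written for this example.)
    """
    runs = np.asarray(runs, dtype=float)
    mean = runs.mean(axis=0)
    std = runs.std(axis=0)
    x = np.arange(runs.shape[1])
    (line,) = ax.plot(x, mean, label=label)
    # Reuse the line colour so the band is visually tied to its curve.
    ax.fill_between(x, mean - std, mean + std, color=line.get_color(), alpha=0.2)
    return mean, std


# Synthetic data (assumption): 5 seeds x 100 episodes for two made-up algorithms.
rng = np.random.default_rng(0)
episodes = np.arange(100)
runs_a = 0.10 * episodes + rng.normal(0.0, 1.0, size=(5, 100))
runs_b = 0.07 * episodes + rng.normal(0.0, 2.0, size=(5, 100))

fig, ax = plt.subplots(figsize=(8, 4))
mean_a, std_a = plot_mean_band(ax, runs_a, "algorithm A (synthetic)")
mean_b, std_b = plot_mean_band(ax, runs_b, "algorithm B (synthetic)")
ax.set_xlabel("Episode")
ax.set_ylabel("Return")
ax.set_title("Mean return over 5 seeds (band = 1 std)")
ax.legend()
ax.grid(True, alpha=0.3)
fig.tight_layout()
fig.savefig("comparison.png", dpi=150)
```

If the two bands overlap heavily, the difference between the curves may be down to seed noise rather than the algorithm, which is the kind of judgement a single-run plot cannot support.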

March 10, 2026 · 5 min · 889 words · codefrydev

Matplotlib

Used in many chapter exercises to plot average reward over time, value functions, policy comparisons, and hyperparameter heatmaps. A clear plot often reveals convergence or instability at a glance.

Why Matplotlib matters for RL

- Line plots: Reward vs episode, loss vs step, value vs state. The default plt.plot(x, y).
- Multiple curves: Overlay several runs or algorithms; use label and legend().
- Subplots: Several panels in one figure (e.g. reward, length, loss).
- Heatmaps: Value function over 2D state space; grid search over \(\alpha\) and \(\epsilon\).
- Saving: plt.savefig("curve.png", dpi=150) for reports and slides.

Core concepts with examples

Single line plot

    import matplotlib.pyplot as plt
    import numpy as np

    # Synthetic learning curve: linear trend plus Gaussian noise
    episodes = np.arange(100)
    rewards = 0.1 * episodes + 0.5 + np.random.randn(100) * 0.5

    plt.figure(figsize=(8, 4))
    plt.plot(episodes, rewards, alpha=0.7, label="raw")
    plt.xlabel("Episode")
    plt.ylabel("Cumulative reward")
    plt.title("Learning curve")
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

Smoothed curve (moving average)

    window = 10
    # mode="valid" keeps only full windows, so the result has len(rewards) - window + 1 points
    smooth = np.convolve(rewards, np.ones(window) / window, mode="valid")
    # Align each average with the last episode in its window (a trailing average)
    x_smooth = episodes[window - 1:]
    plt.plot(episodes, rewards, alpha=0.3, label="raw")
    plt.plot(x_smooth, smooth, label=f"MA-{window}")
    plt.legend()
    plt.show()

Subplots: two panels

    fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
    ax1.plot(episodes, rewards)
    ax1.set_ylabel("Reward")
    ax1.set_title("Reward per episode")
    ax2.plot(episodes, np.cumsum(rewards))
    ax2.set_ylabel("Cumulative reward")
    ax2.set_xlabel("Episode")
    plt.tight_layout()
    plt.show()

Heatmap (e.g. value function or grid search)

    # 4x4 value grid
    V = np.random.randn(4, 4)
    plt.imshow(V, cmap="viridis")
    plt.colorbar(label="V(s)")
    plt.xlabel("col")
    plt.ylabel("row")
    plt.title("State value function")
    plt.show()

Saving

    plt.savefig("learning_curve.png", dpi=150, bbox_inches="tight")
    plt.close()

Exercises

Exercise 1. Plot a line of \(y = x^2\) for \(x\) in \([0, 5]\) with 50 points. Add labels “x” and “y”, a title “y = x²”, and a grid. Save the figure as parabola.png. ...
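The heatmap item above also mentions a grid search over \(\alpha\) and \(\epsilon\), which the value-grid example does not show. A minimal sketch of that case follows, assuming a small hypothetical grid and random placeholder scores; in practice each cell would hold something like the mean return of one training run. The main difference from the value grid is that the ticks are labelled with the hyperparameter values instead of array indices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical hyperparameter grid (assumption, not from the curriculum).
alphas = [0.01, 0.05, 0.1, 0.5]
epsilons = [0.01, 0.1, 0.3]

# Placeholder scores: rows index alpha, columns index epsilon.
rng = np.random.default_rng(0)
scores = rng.normal(size=(len(alphas), len(epsilons)))

fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(scores, cmap="viridis", aspect="auto")
fig.colorbar(im, ax=ax, label="score (synthetic)")

# Label ticks with the actual hyperparameter values instead of array indices.
ax.set_xticks(np.arange(len(epsilons)))
ax.set_xticklabels([str(e) for e in epsilons])
ax.set_yticks(np.arange(len(alphas)))
ax.set_yticklabels([str(a) for a in alphas])
ax.set_xlabel(r"$\epsilon$")
ax.set_ylabel(r"$\alpha$")
ax.set_title("Grid search (synthetic scores)")

# Locate the best cell so it can be reported alongside the plot.
best_row, best_col = np.unravel_index(np.argmax(scores), scores.shape)
best_alpha, best_epsilon = alphas[best_row], epsilons[best_col]

fig.tight_layout()
fig.savefig("grid_search.png", dpi=150)
```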

March 10, 2026 · 4 min · 803 words · codefrydev