NumPy

Used in Preliminary: NumPy and throughout the curriculum for state/observation arrays, reward vectors, and batch operations. RL environments return observations as arrays; neural networks consume batches of arrays. NumPy is the standard bridge.

Why NumPy matters for RL

- Arrays: States and observations are vectors or matrices; rewards over time are 1D arrays. np.zeros(), np.array(), np.arange() are used constantly.
- Indexing and slicing: Extract rows/columns, mask by condition, gather batches. Fancy indexing appears in replay buffers and minibatches.
- Broadcasting: Apply operations across shapes without writing loops (e.g. subtract mean from a batch).
- Random: np.random for \(\epsilon\)-greedy, environment stochasticity, and reproducible seeds.
- Math: np.sum, np.mean, dot products, element-wise ops. No need for Python loops over elements.

Core concepts with examples

Creating arrays

    import numpy as np

    # Preallocate for states (e.g. 4D state for CartPole)
    state = np.zeros(4)
    state = np.array([0.1, -0.2, 0.05, 0.0])

    # Grid of values (e.g. for value function over 2D grid)
    grid = np.zeros((3, 3))
    grid[0] = [1, 2, 3]

    # Ranges and linspace
    steps = np.arange(0, 1000, 1)  # 0, 1, ..., 999
    x = np.linspace(0, 1, 11)      # 11 points from 0 to 1

Shape, reshape, and batch dimension

    arr = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)
    batch = arr.reshape(1, 3, 2)              # (1, 3, 2) for "1 sample"
    flat = arr.flatten()                      # (6,)

Indexing and slicing

    # Slicing: first two rows, all columns
    arr[:2, :]

    # Last row
    arr[-1, :]

    # Boolean mask: rows where first column > 2
    mask = arr[:, 0] > 2
    arr[mask]

    # Integer indexing: rows 0 and 2
    arr[[0, 2], :]

Broadcasting and element-wise ops

    # Subtract mean from each column
    X = np.random.randn(32, 4)  # 32 samples, 4 features
    X_centered = X - X.mean(axis=0)

    # Element-wise product (e.g. importance weights)
    a = np.array([1.0, 2.0, 0.5])
    b = np.array([1.0, 1.0, 2.0])
    a * b  # array([1., 2., 1.])

Random and seeding

    np.random.seed(42)

    # Unit Gaussian (for bandit rewards, noise)
    samples = np.random.randn(10)

    # Uniform [0, 1)
    u = np.random.rand(5)

    # Random integers in [low, high)
    action = np.random.randint(0, 4)  # one of 0,1,2,3

Useful reductions

    arr = np.array([[1, 2], [3, 4], [5, 6]])
    arr.sum()            # 21
    arr.sum(axis=0)      # [9, 12]
    arr.mean(axis=1)     # [1.5, 3.5, 5.5]
    np.max(arr, axis=0)  # [5, 6]

Worked examples

Example 1: Discounted return (Exercise 7). Given rewards = np.array([0.0, 0.0, 1.0]) and gamma = 0.9, compute \(G_0 = r_0 + \gamma r_1 + \gamma^2 r_2\) using NumPy. ...
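One possible way to approach Example 1 is sketched below, using the rewards and gamma given in the excerpt. The names `discounts` and `G0` are illustrative choices, not taken from the original exercise, and the full page may solve it differently.

```python
import numpy as np

rewards = np.array([0.0, 0.0, 1.0])
gamma = 0.9

# Discount factors gamma^0, gamma^1, gamma^2, one per time step
# (`discounts` is an illustrative name, not from the exercise)
discounts = gamma ** np.arange(len(rewards))

# G_0 = r_0 + gamma * r_1 + gamma^2 * r_2, written as a dot product
G0 = float(np.dot(discounts, rewards))
print(G0)  # approximately 0.81, since only r_2 is non-zero
```

Building the discount vector with `np.arange` avoids a Python loop and works unchanged for reward arrays of any length.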

March 10, 2026 · 6 min · 1184 words · codefrydev

NumPy

This page covers the NumPy you need for the preliminary assessment: creating arrays, indexing, slicing, and element-wise operations.

Why this matters for RL

Environments return observations as arrays; neural networks consume batches of arrays. NumPy is the standard way to represent states, reward vectors, and batches of transitions. You need to create and reshape arrays, slice them, and know the difference between element-wise and matrix multiplication.

Learning objectives

Create and index NumPy arrays; set rows/columns; compute element-wise products and matrix-vector products; use np.dot or @ correctly. ...
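The difference between element-wise and matrix multiplication named in the objectives can be illustrated with a small sketch. The arrays `A` and `v` and their values are made up for illustration and do not come from the assessment itself.

```python
import numpy as np

# Illustrative values only; not taken from the assessment
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([10.0, 1.0])

# Element-wise product: v is broadcast across each row of A
elementwise = A * v        # [[10., 2.], [30., 4.]]

# Matrix-vector product: each row of A dotted with v
matvec = A @ v             # [12., 34.]
matvec_dot = np.dot(A, v)  # same result as A @ v for 2D @ 1D

# Setting a row and a column in place
A[0] = [5.0, 6.0]          # replace the first row
A[:, 1] = 0.0              # zero out the second column
print(elementwise, matvec, A, sep="\n")
```

Note that `*` keeps the shape (2, 2) while `@` reduces over the shared dimension and returns shape (2,); mixing the two up is a common source of silent shape bugs.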

March 10, 2026 · 4 min · 793 words · codefrydev