This page covers the NumPy you need for the preliminary assessment: creating arrays, indexing, slicing, and element-wise operations.
Why this matters for RL
Environments return observations as arrays; neural networks consume batches of arrays. NumPy is the standard way to represent states, reward vectors, and batches of transitions. You need to create and reshape arrays, slice them, and know the difference between element-wise and matrix multiplication.
Learning objectives
Create and index NumPy arrays; set rows/columns; compute element-wise products and matrix-vector products; use np.dot or @ correctly.
Core concepts
- Creation: np.zeros(shape), np.array(list). Shape is a tuple, e.g. (3, 3).
- Indexing: arr[0] is the first row; arr[i, j] is the element at row i, column j. Slicing: arr[i:j], arr[:, j].
- Element-wise: arr * arr, arr + 1, np.square(arr). Same shape in and out.
- Matrix product: A @ b or np.dot(A, b) for matrix-vector; shapes must match (number of columns of A = length of b).
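The following short sketch exercises each bullet above on a small 2×3 array. The values are arbitrary and chosen only for illustration; the printed results in the comments follow from them.

```python
import numpy as np

# Creation: shape is always a tuple
z = np.zeros((2, 3))            # 2 rows, 3 columns of 0.0
a = np.array([[1, 2, 3],
              [4, 5, 6]])       # built from a nested list
print(z.shape, a.shape)         # (2, 3) (2, 3)

# Indexing and slicing
print(a[0])        # first row: [1 2 3]
print(a[1, 2])     # row 1, column 2: 6
print(a[:, 1])     # column 1 as a 1D array: [2 5]

# Element-wise operations keep the shape
print((a * a).shape, (a + 1).shape)   # (2, 3) (2, 3)

# Matrix-vector product: (2, 3) @ (3,) -> (2,)
b = np.array([1, 0, -1])
print(a @ b)       # [-2 -2]
```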
Worked problems (with explanations)
1. 3×3 array and element-wise product (Q8)
Q: Create a 3×3 NumPy array of zeros, then set the first row to [1,2,3]. How do you compute the element-wise product of this array with itself?
Answer and explanation
import numpy as np
arr = np.zeros((3, 3))
arr[0] = [1, 2, 3]
prod = arr * arr  # or np.square(arr)
After setting the first row, arr is [[1, 2, 3], [0, 0, 0], [0, 0, 0]] (stored as floats, because np.zeros returns float64 by default). Then arr * arr squares each entry element-wise: [[1, 4, 9], [0, 0, 0], [0, 0, 0]].
Explanation
np.zeros((3,3)) allocates a 3×3 array of zeros. Assigning arr[0] = [1,2,3] fills the first row. The * operator is element-wise; for matrix multiplication we would use arr @ arr (or np.dot(arr, arr)), which would be a 3×3 matrix product. In RL we often do element-wise ops on reward masks or squared errors, and matrix products for linear layers.
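To make the difference concrete, this sketch runs both operators on the same arr. The second half is a hypothetical RL use of element-wise operations: the names td_errors and mask and their values are illustrative assumptions, not part of the assessment.

```python
import numpy as np

arr = np.zeros((3, 3))
arr[0] = [1, 2, 3]

elementwise = arr * arr      # squares each entry
matmul = arr @ arr           # row-by-column dot products

print(elementwise)
# [[1. 4. 9.]
#  [0. 0. 0.]
#  [0. 0. 0.]]
print(matmul)
# [[1. 2. 3.]
#  [0. 0. 0.]
#  [0. 0. 0.]]

# Hypothetical RL use: squared TD errors, ignoring padded steps via a 0/1 mask
td_errors = np.array([0.5, -1.0, 2.0, 0.0])
mask = np.array([1.0, 1.0, 0.0, 0.0])      # last two steps are padding
masked_sq = mask * td_errors ** 2          # element-wise: 0.25, 1.0, 0.0, 0.0
loss = masked_sq.sum() / mask.sum()        # mean over valid steps only: 0.625
```

Note that the two results differ even though both are 3×3: only the first row of arr is non-zero, so arr @ arr happens to reproduce arr, while arr * arr squares its entries.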
2. Dot product and matrix-vector product
Q: Create vectors x and y as NumPy arrays of length 3 and compute their dot product. Create a 2×3 matrix A and a 3-vector w; compute A @ w.
Answer and explanation
import numpy as np
x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])
dot = np.dot(x, y) # or x @ y -> 32.0
A = np.array([[1., 0., 1.], [0., 1., 1.]]) # 2×3
w = np.array([1., -1., 2.])
Aw = A @ w # or np.dot(A, w) -> array([3., 1.])
Explanation
For 1D arrays, np.dot(x, y) (equivalently x @ y) is the dot product, a scalar: 1·4 + 2·5 + 3·6 = 32. For the matrix-vector product, A @ w requires the number of columns of A (3) to equal the length of w (3); the result has length equal to the number of rows of A (2). Each entry of Aw is the dot product of one row of A with w. In linear value approximation, rows of A could be feature vectors and Aw the predicted values.
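Here is a minimal sketch of the linear value approximation mentioned above. It assumes a hypothetical feature matrix phi (one row of 3 features per state) and weight vector w, with arbitrary numbers. It also shows that mismatched inner dimensions raise a ValueError rather than silently broadcasting.

```python
import numpy as np

# Hypothetical setup: 5 states, each described by 3 features (one row per state)
phi = np.array([[1., 0., 0.],
                [0., 1., 0.],
                [0., 0., 1.],
                [1., 1., 0.],
                [1., 1., 1.]])        # shape (5, 3)
w = np.array([0.5, -1.0, 2.0])        # weights, shape (3,)

values = phi @ w                      # (5, 3) @ (3,) -> (5,)
print(values.shape)                   # (5,)
# values: 0.5, -1.0, 2.0, -0.5, 1.5

# Each entry is the dot product of one feature row with w
assert np.isclose(values[3], np.dot(phi[3], w))

# Mismatched inner dimensions raise an error instead of broadcasting
try:
    phi @ np.array([1.0, 2.0])        # (5, 3) @ (2,) is invalid
except ValueError as err:
    print("shape mismatch:", err)
```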
3. Slicing and shape
Q: Create a 4×5 array of consecutive integers (0 to 19). Extract the 2×2 block in the top-left corner. Extract the last column.
Answer and explanation
import numpy as np
arr = np.arange(20).reshape(4, 5)
top_left = arr[:2, :2]  # rows 0–1, cols 0–1
last_col = arr[:, -1]  # all rows, last column
Explanation
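reshape(4, 5) turns the 20 values into a 4×5 array. Slicing arr[:2, :2] takes the first two rows and first two columns, giving [[0, 1], [5, 6]]. arr[:, -1] takes all rows and the last column (index -1), giving the 1D array [4, 9, 14, 19] of shape (4,). In RL we often index batches the same way: e.g. states[batch_inds] to gather a sampled minibatch, or rewards[:, -1] for the last-step rewards when rewards has shape (batch, time).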
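The sketch below shows minibatch indexing. It assumes a hypothetical replay buffer stored as the arrays states and rewards, filled with random stand-in data. It also shows how to keep a column two-dimensional when a later operation expects shape (n, 1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical replay buffer: 100 transitions, 4-dimensional states
states = rng.normal(size=(100, 4))
rewards = rng.normal(size=(100,))

# Sample a minibatch of 8 distinct indices
batch_inds = rng.choice(len(states), size=8, replace=False)
batch_states = states[batch_inds]     # integer-array indexing: shape (8, 4), a copy
batch_rewards = rewards[batch_inds]   # shape (8,)

# Slicing a single column drops that axis; a length-1 slice keeps it
arr = np.arange(20).reshape(4, 5)
print(arr[:, -1].shape)    # (4,)
print(arr[:, -1:].shape)   # (4, 1)
```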
Code examples (with explanations)
Batch of states and mean
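This minimal sketch assumes a batch of 10 states with 4 features each. Random values stand in for real observations, and rng is just a seeded NumPy generator.

```python
import numpy as np

rng = np.random.default_rng(0)
states = rng.normal(size=(10, 4))   # 10 samples, 4 features each

mean_state = states.mean(axis=0)    # average over the 10 samples
print(mean_state.shape)             # (4,)

per_sample = states.mean(axis=1)    # average over the 4 features instead
print(per_sample.shape)             # (10,)
```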
Explanation
axis=0 means “along the first dimension,” so we average over the 10 samples and get one vector of length 4. In RL we often normalize state batches this way; mean(axis=0) and std(axis=0) are used in running mean/std normalization for observations.
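The next sketch normalizes a batch of observations with these statistics. It assumes a hypothetical batch whose features sit on very different scales, plus a small eps to avoid dividing by zero. A running version would update the mean and std incrementally across batches instead of recomputing them from a single batch; that variant is not shown here.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical batch: 10 observations whose 4 features live on very different scales
states = rng.normal(loc=[0.0, 5.0, -3.0, 100.0],
                    scale=[1.0, 0.1, 2.0, 50.0],
                    size=(10, 4))

mean = states.mean(axis=0)          # shape (4,): one mean per feature
std = states.std(axis=0)            # shape (4,): one std per feature
eps = 1e-8                          # guards against division by zero for constant features

normalized = (states - mean) / (std + eps)   # broadcasting (10, 4) with (4,)

print(normalized.mean(axis=0))      # close to 0 for every feature
print(normalized.std(axis=0))       # close to 1 for every feature
```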
Professor’s hints
- Always check shapes: inspect arr.shape after creation and after each operation. Matrix product: (m, n) @ (n,) → (m,).
- Element-wise * requires the same shape (or broadcastable shapes). For the dot product of two 1D arrays use np.dot(a, b) or a @ b.
- Use arr.reshape(-1, n) to flatten leading dimensions and keep n columns; NumPy infers the first dimension.
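This sketch illustrates the reshape(-1, n) hint and broadcasting. It assumes a hypothetical rollout array of shape (32, 8, 4) (environments × time steps × features); the scale values are arbitrary.

```python
import numpy as np

# Hypothetical rollout: 32 environments x 8 time steps x 4 observation features
obs = np.zeros((32, 8, 4))
flat = obs.reshape(-1, 4)          # NumPy infers the first dimension: 32 * 8 = 256
print(flat.shape)                  # (256, 4)

# Broadcasting: a (4,) vector is stretched across all 256 rows
scale = np.array([1.0, 0.5, 2.0, 0.1])
scaled = flat * scale
print(scaled.shape)                # (256, 4)

# Shapes that cannot broadcast raise an error
try:
    flat * np.ones(3)              # (256, 4) vs (3,): trailing sizes 4 and 3 differ
except ValueError as err:
    print("cannot broadcast:", err)
```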
Common pitfalls
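- Confusing * and @: A * B is element-wise; A @ B is the matrix product. The wrong choice gives the wrong shape, wrong numbers, or a shape error.
- Axis: axis=0 reduces over the first index (down the rows, one result per column); axis=1 reduces over the second index (across the columns, one result per row). Sum/mean over the samples in a batch is usually axis=0.
- Views vs copies: basic indexing and slicing (arr[0], arr[:, j], arr[1:3]) return views, so modifying the result changes the original array. Integer-array (fancy) indexing such as arr[[0, 2]] returns a copy, so modifying the result leaves arr unchanged. Assigning directly into arr (arr[0] = x or arr[[0, 2]] = x) always modifies arr in place.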
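Here is a short sketch of the view-versus-copy and axis pitfalls on a small array. np.shares_memory is a standard NumPy function that reports whether two arrays overlap in memory.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)      # rows [0, 1, 2] and [3, 4, 5]

row = arr[0]                          # basic indexing: a view into arr
row[0] = 100                          # writes through, so arr[0, 0] is now 100
print(np.shares_memory(row, arr))     # True

picked = arr[[0, 1]]                  # integer-array (fancy) indexing: a copy
picked[0, 0] = -1                     # arr is unchanged
print(np.shares_memory(picked, arr))  # False
print(arr[0, 0])                      # still 100

independent = arr[0].copy()           # explicit copy when the row must not alias arr
independent[1] = 7
print(arr[0, 1])                      # still 1

# Axis reminder: axis=0 collapses the rows (one result per column)
col_sums = arr.sum(axis=0)            # 103, 5, 7
row_sums = arr.sum(axis=1)            # 103, 12
```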