TensorFlow
Alternative to PyTorch for implementing DQN, policy gradients, and other deep RL algorithms. The Keras API provides layers and optimizers; `GradientTape` gives full control over custom loss functions (e.g. policy gradient, CQL).

## Why TensorFlow matters for RL

- **Keras API** — `tf.keras.Sequential`, `tf.keras.Model`, layers (`Dense`, `Conv2D`). Quick prototyping of Q-networks and policies.
- **Gradient tape** — `tf.GradientTape()` records operations so you can compute gradients of any scalar with respect to trainable variables. Essential for policy gradients and custom losses.
- **Optimizers** — `tf.keras.optimizers.Adam`, `apply_gradients`.
- **Device placement** — GPU via `tf.config` when available.

## Core concepts with examples

### Dense layers and Sequential model

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2),  # Q-values for 2 actions
])
model.build(input_shape=(None, 4))
```

### Forward pass and MSE loss

```python
states = tf.random.normal((32, 4))
q_values = model(states)
targets = tf.random.normal((32, 2))
loss = tf.reduce_mean((q_values - targets) ** 2)
```

### Training step with GradientTape

```python
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

@tf.function
def train_step(states, targets):
    with tf.GradientTape() as tape:
        q_values = model(states)
        loss = tf.reduce_mean((q_values - targets) ** 2)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

loss_val = train_step(states, targets)
```

### Subclassing for custom models

```python
class QNetwork(tf.keras.Model):
    def __init__(self, n_actions=2):
        super().__init__()
        self.d1 = tf.keras.layers.Dense(64, activation="relu")
        self.d2 = tf.keras.layers.Dense(64, activation="relu")
        self.out = tf.keras.layers.Dense(n_actions)

    def call(self, x):
        x = self.d1(x)
        x = self.d2(x)
        return self.out(x)
```

## Exercises

**Exercise 1.**
Create a `Sequential` model with one hidden layer (64 units, ReLU) and output dimension 2. Build it with `input_shape=(4,)`. Call `model(tf.random.normal((10, 4)))` and print the output shape. Then use `model.summary()` to inspect parameters. ...
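One possible solution sketch for Exercise 1, assuming a CartPole-style 4-dimensional state and 2 actions (the variable names are illustrative, not prescribed by the exercise):

```python
import tensorflow as tf

# One hidden layer (64 units, ReLU), output dimension 2.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(2),
])
model.build(input_shape=(None, 4))

# Forward pass on a batch of 10 random states.
out = model(tf.random.normal((10, 4)))
print(out.shape)  # (10, 2)

# Inspect parameters: hidden layer 4*64 + 64 = 320, output 64*2 + 2 = 130.
model.summary()
```

`model.summary()` should report 450 trainable parameters in total, which you can verify by hand from the layer shapes above.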