Use this self-check after completing Phase 0: Programming from Zero. If you can answer at least 8 of 10 correctly, you are ready to move on.
1. Predict the output

What does this print?

```python
x = 5
if x > 3:
    print("A")
elif x > 1:
    print("B")
else:
    print("C")
```
Answer: A. The first condition `x > 3` is true (5 > 3), so Python executes that branch and skips the rest.
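To see how the chain short-circuits for other inputs, here is the same logic wrapped in a helper function (the function name `branch` is an addition for illustration; the thresholds match the quiz code):

```python
def branch(x):
    # Same chain as the quiz: only the first true condition runs.
    if x > 3:
        return "A"
    elif x > 1:
        return "B"
    else:
        return "C"

print(branch(5))  # A: 5 > 3, so the later branches are skipped
print(branch(2))  # B: 2 > 3 fails, 2 > 1 succeeds
print(branch(0))  # C: neither condition holds
```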
2. Write a function

Write `clamp(x, lo, hi)` that returns `lo` if `x < lo`, `hi` if `x > hi`, otherwise `x`. (Used in PPO to clip ratios.)
Answer:

```python
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x
```
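As a usage sketch, PPO-style ratio clipping applies exactly this function with `lo = 1 - eps` and `hi = 1 + eps` (the ratio values and `eps = 0.2` below are made-up illustration numbers, not from the quiz):

```python
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

eps = 0.2  # illustrative clip range
for ratio in [0.5, 1.0, 1.5]:
    print(ratio, "->", clamp(ratio, 1 - eps, 1 + eps))
# 0.5 is clipped up to 0.8, 1.0 passes through, 1.5 is clipped down to 1.2
```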
3. Find the bug

This should print the sum of a rewards list but has a bug. Find it.

```python
rewards = [0.5, 0.3, 0.2]
total = 0
for r in rewards
    total = total + r
print(total)
```
Answer: Missing colon after `for r in rewards`. Fix: `for r in rewards:` (Python requires `:` after loop, `if`, and `def` headers). Without it you get a SyntaxError.
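With the colon added, the loop runs as intended; a sketch of the fixed version:

```python
rewards = [0.5, 0.3, 0.2]
total = 0
for r in rewards:  # colon added
    total = total + r
print(total)  # 1.0
```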
4. Predict the output

```python
q = [0.1, 0.5, 0.3]
print(q[1])
print(q[-1])
print(len(q))
```
Answer: `0.5` (index 1), `0.3` (last element, index `-1`), `3` (length).
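A few more indexing cases on the same list, for practice (these extra lines are additions, not part of the quiz):

```python
q = [0.1, 0.5, 0.3]
print(q[0])    # 0.1: first element
print(q[-2])   # 0.5: second from the end
print(q[1:])   # [0.5, 0.3]: slice from index 1 to the end
```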
5. Write a function

Write `discounted_return(rewards, gamma)` that computes G = r₀ + γr₁ + γ²r₂ + ⋯. Test with `rewards=[0, 0, 1]`, `gamma=0.9` (expected: 0.81).
Answer:

```python
def discounted_return(rewards, gamma):
    return sum(gamma ** t * r for t, r in enumerate(rewards))
```
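The same return can also be computed with the backward recursion G ← r + γG that often appears in RL code; a sketch checking that both forms agree on the quiz's test case (the second function name is an addition for illustration):

```python
def discounted_return(rewards, gamma):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def discounted_return_backward(rewards, gamma):
    # Walk the rewards in reverse, folding in one discount factor per step.
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G
    return G

print(discounted_return([0, 0, 1], 0.9))           # ~0.81
print(discounted_return_backward([0, 0, 1], 0.9))  # same value
```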
6. Find the bug

```python
def epsilon_greedy(Q, epsilon=0.1):
    import random
    if random.random() > epsilon:
        return random.randrange(len(Q))
    return Q.index(max(Q))
```
Answer: The condition is reversed: `> epsilon` makes the agent explore most of the time, which is backwards. Fix: `if random.random() < epsilon:` for exploration; otherwise return `Q.index(max(Q))` to exploit.
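A sketch of the corrected function (the Q values and the seed are made-up illustration numbers):

```python
import random

def epsilon_greedy(Q, epsilon=0.1):
    if random.random() < epsilon:        # explore with probability epsilon
        return random.randrange(len(Q))  # random action
    return Q.index(max(Q))               # exploit: best-known action

random.seed(0)
Q = [0.1, 0.9, 0.4]
actions = [epsilon_greedy(Q) for _ in range(1000)]
print(actions.count(1) / 1000)  # mostly action 1 (the greedy one) with epsilon = 0.1
```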
7. Predict the output

```python
config = {"gamma": 0.99, "epsilon": 0.1}
config["lr"] = 0.001
print(len(config))
print("alpha" in config)
```
Answer: `3` (three keys: `gamma`, `epsilon`, `lr`), `False` (`"alpha"` is not a key).
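Two related dict operations worth knowing alongside this question (the extra lines are additions to the quiz code):

```python
config = {"gamma": 0.99, "epsilon": 0.1}
config["lr"] = 0.001
print("gamma" in config)         # True: `in` tests keys, not values
print(config.get("alpha", 0.5))  # 0.5: get() returns the default for missing keys
print(list(config))              # keys in insertion order: gamma, epsilon, lr
```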
8. Write a function

Write `max_q(Q_dict, state, n_actions)` that returns the maximum Q-value for a given state. `Q_dict` maps `(state, action)` → float. Use `Q_dict.get((state, a), 0.0)` for each action.
Answer:

```python
def max_q(Q_dict, state, n_actions):
    return max(Q_dict.get((state, a), 0.0) for a in range(n_actions))
```
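A quick usage check with a hypothetical sparse Q-table (the entries are made-up illustration values):

```python
def max_q(Q_dict, state, n_actions):
    return max(Q_dict.get((state, a), 0.0) for a in range(n_actions))

Q = {(0, 0): 0.2, (0, 1): 0.5}  # (state, action) pairs; (0, 2) is missing
print(max_q(Q, 0, 3))  # 0.5: max of 0.2, 0.5, and the 0.0 default
print(max_q(Q, 1, 3))  # 0.0: unseen state, every action falls back to 0.0
```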
9. Find the bug

```python
def run_episode(rewards, gamma=0.9):
    G = 0
    for t in range(len(rewards)):
        G = G + gamma * rewards[t]  # bug
    return G
```
Answer: The discount is wrong: `gamma * rewards[t]` should be `gamma**t * rewards[t]`. As written, every reward is discounted by exactly γ regardless of its timestep. Fix: `G += gamma**t * rewards[t]`.
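A sketch of the fixed function, checked against hand-computed values (the `[1, 1, 1]` case is an extra example, not from the quiz):

```python
def run_episode(rewards, gamma=0.9):
    G = 0.0
    for t in range(len(rewards)):
        G += gamma ** t * rewards[t]  # discount grows with the timestep
    return G

print(run_episode([0, 0, 1]))  # ~0.81 = 0.9**2
print(run_episode([1, 1, 1]))  # ~2.71 = 1 + 0.9 + 0.81
```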
10. Predict the output

```python
def f(x, y=2):
    return x * y

print(f(3))
print(f(3, 3))
print(f(y=4, x=2))
```
Answer: `6` (3×2), `9` (3×3), `8` (x=2, y=4 → 2×4). The default `y=2` is used only when `y` is not supplied.
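Two more call patterns with the same function, as extra examples beyond the quiz:

```python
def f(x, y=2):
    return x * y

print(f(3, y=5))  # 15: positional x, keyword y
print(f(x=7))     # 14: keyword x, default y=2
# f(y=4) would raise TypeError: x has no default, so it must be supplied
```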
Score: 8–10: Ready for Phase 1. 6–7: Review the specific topics you missed. Below 6: Complete Phase 0 and the Python Confidence Builder before continuing.