Phase 1 Self-Check: Math for RL
Use this self-check after completing Probability, Linear algebra, and Calculus. If you can answer at least 8 correctly and feel comfortable with the concepts, you are ready for Phase 2 and the curriculum.

1. Probability

Q: In a bandit, you pull arm 2 five times and get rewards [0.5, 1.2, 0.8, 1.0, 0.9]. What is the sample mean? What is the unbiased sample variance (use \(n-1\) in the denominator)?

Answer

Step 1: Sample mean. Sum = 0.5 + 1.2 + 0.8 + 1.0 + 0.9 = 4.4; mean = 4.4/5 = 0.88. ...
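To check the arithmetic for this question, here is a minimal Python sketch that assumes the rewards are a plain list of floats. The helper names `sample_mean` and `unbiased_sample_variance` are illustrative and do not come from the course. The standard library's `statistics.variance` also divides by \(n-1\), so the sketch uses it only as a cross-check.

```python
import statistics

# Rewards observed from pulling arm 2 five times (from the question above).
rewards = [0.5, 1.2, 0.8, 1.0, 0.9]

def sample_mean(xs):
    """Arithmetic mean: sum of the observations divided by their count n."""
    return sum(xs) / len(xs)

def unbiased_sample_variance(xs):
    """Sample variance with Bessel's correction: divide by n - 1, not n."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    m = sample_mean(xs)
    # Sum of squared deviations from the sample mean.
    return sum((x - m) ** 2 for x in xs) / (n - 1)

mean = sample_mean(rewards)
var = unbiased_sample_variance(rewards)

# Cross-check against the standard library (statistics.variance uses n - 1).
assert abs(mean - statistics.mean(rewards)) < 1e-12
assert abs(var - statistics.variance(rewards)) < 1e-12

print(f"mean = {mean:.2f}")      # mean = 0.88
print(f"variance = {var:.4f}")  # variance = 0.0670
```

If you divided by \(n = 5\) instead of \(n - 1 = 4\), you would get the biased estimate. With only five pulls it is noticeably smaller, which is why the question asks for \(n-1\).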