Hopper

Compare SAC and PPO on Hopper and Walker2d; decide when to choose which.

Random-policy dataset on Hopper; Q-value overestimation with naive SAC.
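
The overestimation item can be previewed with a toy calculation before touching Hopper. The sketch below is not SAC and uses no environment: it assumes every action has a true value of 0 and that the critic's estimates carry independent Gaussian noise, then compares the value of the greedily chosen action with the average estimate. The function name `max_over_noisy_estimates` and all its parameters are hypothetical, chosen only for illustration. The link to the offline setting is that the actor seeks out actions the critic rates highly, and with a fixed dataset the critic's errors on actions absent from the data are never corrected by new experience.

```python
import numpy as np


def max_over_noisy_estimates(n_actions=10, noise_std=1.0, n_trials=10_000, seed=0):
    """Toy illustration (assumption: all true Q-values are 0, noise is Gaussian)."""
    rng = np.random.default_rng(seed)
    true_q = np.zeros(n_actions)  # every action is truly worth 0
    # One row per trial: a noisy critic estimate for each action.
    noisy_q = true_q + rng.normal(0.0, noise_std, size=(n_trials, n_actions))
    greedy_value = noisy_q.max(axis=1).mean()    # value claimed after picking the best-looking action
    average_value = noisy_q.mean(axis=1).mean()  # unbiased reference
    return greedy_value, average_value


greedy, average = max_over_noisy_estimates()
print(f"mean of max_a Q_hat(a): {greedy:.3f}")   # clearly above the true value of 0
print(f"mean of Q_hat(a):       {average:.3f}")  # close to the true value of 0
```

Raising `n_actions` or `noise_std` widens the gap, which gives a rough sense of why unconstrained maximization over a poorly covered action space is risky.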