Overestimation
Double DQN: the online network selects the next action and the target network evaluates it; compare with standard DQN.
Random-policy dataset on Hopper; observe Q-value overestimation with naive SAC.
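
The Double DQN item above differs from standard DQN only in how the bootstrap target is built, which a small example can make concrete. What follows is a minimal NumPy sketch, not a reference implementation: the function names `dqn_target` and `double_dqn_target`, the (batch, actions) array layout, the discount `gamma`, and the toy Q-values are all assumptions chosen for illustration.

```python
import numpy as np


def dqn_target(reward, done, q_target_next, gamma=0.99):
    """Standard DQN target: the target network both selects and evaluates."""
    max_next = q_target_next.max(axis=1)
    return reward + gamma * (1.0 - done) * max_next


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: online network selects, target network evaluates."""
    # Greedy next action according to the online network.
    best_action = q_online_next.argmax(axis=1)
    # Value of that action according to the target network.
    evaluated = q_target_next[np.arange(len(best_action)), best_action]
    return reward + gamma * (1.0 - done) * evaluated


# Toy batch of two transitions with two actions each (hypothetical values).
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])  # second transition is terminal
q_online_next = np.array([[1.0, 2.0], [0.5, 0.1]])
q_target_next = np.array([[3.0, 1.5], [0.2, 0.4]])

y_dqn = dqn_target(reward, done, q_target_next, gamma=0.9)
y_ddqn = double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.9)

print("DQN target:       ", y_dqn)
print("Double DQN target:", y_ddqn)
```

Because the value evaluated at the online network's chosen action can never exceed the target network's own maximum, the Double DQN target in this sketch is always less than or equal to the DQN target for the same transition. That gap is the quantity to track when comparing the two methods.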