Overestimation

Double DQN: the online network selects the greedy next action and the target network evaluates it; compare with DQN, where the target network does both.
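
To make the difference concrete, here is a minimal NumPy sketch of the two bootstrap targets. The function names, argument names, and toy Q-values are assumptions chosen for illustration, not taken from any particular library or from this course's code. Each `q_*_next` array is assumed to have shape (batch, num_actions) and to hold Q-values for the next state.

```python
import numpy as np


def dqn_target(reward, done, gamma, q_target_next):
    """Standard DQN: the target network both selects and evaluates the action."""
    return reward + gamma * (1.0 - done) * q_target_next.max(axis=1)


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN: the online network selects, the target network evaluates."""
    # Greedy next action according to the online network.
    best_action = q_online_next.argmax(axis=1)
    # Value of that action according to the target network.
    evaluated = q_target_next[np.arange(len(best_action)), best_action]
    return reward + gamma * (1.0 - done) * evaluated


# Toy batch of two transitions (hypothetical numbers); the second is terminal.
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])
gamma = 0.9
q_online_next = np.array([[1.0, 2.0, 0.5], [0.3, 0.1, 0.2]])
q_target_next = np.array([[3.0, 1.5, 0.0], [0.4, 0.9, 0.1]])

y_dqn = dqn_target(reward, done, gamma, q_target_next)
y_ddqn = double_dqn_target(reward, done, gamma, q_online_next, q_target_next)
print("DQN target:       ", y_dqn)
print("Double DQN target:", y_ddqn)
```

Because the Double DQN target evaluates one specific action under the target network, it can never exceed the max over all actions used by DQN for the same inputs, which is why it tends to reduce overestimation.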

Random-policy dataset on Hopper; Q-value overestimation with naive SAC.