A2C
A2C on CartPole, using the TD error as the advantage estimate and stepping multiple environments synchronously.
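As a rough illustration of "TD error as advantage", the sketch below computes a one-step TD error for each environment after one synchronous step. The function name `td_advantages`, the one-step form, the `gamma` value, and the toy numbers are all assumptions made for the example, not details given here.

```python
def td_advantages(rewards, values, next_values, dones, gamma=0.99):
    """One-step TD error per environment (assumed form):
    delta = r + gamma * V(s') - V(s), with no bootstrap on terminal steps.
    """
    advantages = []
    for r, v, v_next, done in zip(rewards, values, next_values, dones):
        # Do not bootstrap from V(s') when the episode ended on this step.
        bootstrap = 0.0 if done else gamma * v_next
        advantages.append(r + bootstrap - v)
    return advantages


# One synchronous step across three hypothetical environments.
adv = td_advantages(
    rewards=[1.0, 1.0, 1.0],
    values=[0.5, 2.0, 1.0],        # critic estimates V(s)
    next_values=[1.0, 2.0, 3.0],   # critic estimates V(s')
    dones=[False, False, True],    # third environment terminated
    gamma=0.5,
)
print(adv)
```

In an actor-critic update the advantage would typically weight the log-probability of the chosen action in the policy loss, while its square serves as the critic loss; that wiring is left out to keep the sketch short.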
ICM (Intrinsic Curiosity Module): a forward model whose prediction error serves as the intrinsic reward, trained with A2C on a maze.
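A minimal sketch of "prediction error as intrinsic reward" might look like the following. The linear `forward_model`, the weight matrix, the feature vectors, and the `eta / 2` scaling are hypothetical choices for illustration; a real ICM would usually learn both the feature encoder and the forward model with a neural network.

```python
def forward_model(features, action_onehot, weights):
    """Hypothetical linear forward model.

    Predicts next-state features from the concatenation [features; action].
    """
    inputs = features + action_onehot  # list concatenation
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def intrinsic_reward(predicted, actual, eta=1.0):
    """Scaled squared prediction error: (eta / 2) * ||predicted - actual||^2."""
    return 0.5 * eta * sum((p - a) ** 2 for p, a in zip(predicted, actual))


# Toy values: 2-dimensional features and 2 discrete actions (assumed).
phi_s = [1.0, 0.0]        # features of the current state
action = [0.0, 1.0]       # one-hot encoding of the chosen action
weights = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
]
phi_next = [1.0, 0.0]     # features of the state actually reached

pred = forward_model(phi_s, action, weights)
r_int = intrinsic_reward(pred, phi_next)
print(pred, r_int)
```

The intrinsic reward is large where the forward model predicts poorly, which pushes the agent toward transitions it has not yet learned to predict. It would typically be added to any extrinsic reward before the A2C update.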