A2C


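A2C (advantage actor-critic) for CartPole, using the one-step TD error as the advantage estimate; rollouts are collected from multiple environments stepped synchronously.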
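A minimal sketch of the TD-error advantage for one synchronous step across several environments. The function name `td_advantage`, the default `gamma`, and the array layout (one entry per environment) are assumptions for illustration, not taken from the original notes.

```python
import numpy as np

def td_advantage(rewards, values, next_values, dones, gamma=0.99):
    """One-step TD error used as the advantage, one entry per environment.

    delta = r + gamma * V(s') * (1 - done) - V(s)
    All inputs are 1-D arrays of length n_envs (assumed layout).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    next_values = np.asarray(next_values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)

    # Do not bootstrap from V(s') when the episode ended at this step.
    targets = rewards + gamma * next_values * (1.0 - dones)
    return targets - values

# Two environments stepped in lockstep; the second one just terminated.
adv = td_advantage(
    rewards=[1.0, 1.0],
    values=[0.5, 2.0],
    next_values=[1.0, 3.0],
    dones=[0, 1],
    gamma=0.9,
)
```

In an A2C update the actor loss would weight the log-probability of each taken action by this advantage (treated as a constant), while the critic regresses `values` toward `targets`.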

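ICM (Intrinsic Curiosity Module): a forward model predicts the next state's features, and its prediction error is used as the intrinsic reward; the agent is trained with A2C on a maze.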
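A rough sketch of the prediction-error idea, assuming a linear forward model over precomputed state features and one-hot actions. The class `LinearForwardModel`, the scale `eta`, and the learning rate are hypothetical choices for illustration; an actual ICM would typically use neural networks and learned features.

```python
import numpy as np

class LinearForwardModel:
    """Toy forward model: predicts next features from (features, action)."""

    def __init__(self, feat_dim, n_actions, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.n_actions = n_actions
        self.lr = lr
        # Weight matrix maps [features, one-hot action] -> next features.
        self.W = rng.normal(scale=0.01, size=(feat_dim + n_actions, feat_dim))

    def _inputs(self, feats, actions):
        one_hot = np.eye(self.n_actions)[np.asarray(actions)]
        return np.concatenate([np.asarray(feats, dtype=np.float64), one_hot], axis=1)

    def intrinsic_reward(self, feats, actions, next_feats, eta=1.0):
        # Reward is the scaled squared prediction error, per transition.
        pred = self._inputs(feats, actions) @ self.W
        return eta * 0.5 * np.sum((pred - np.asarray(next_feats)) ** 2, axis=1)

    def update(self, feats, actions, next_feats):
        # One gradient step on the mean squared prediction error.
        x = self._inputs(feats, actions)
        err = x @ self.W - np.asarray(next_feats)
        self.W -= self.lr * x.T @ err / len(x)

# A single repeated transition becomes less "surprising" as the model learns.
model = LinearForwardModel(feat_dim=2, n_actions=2)
feats, actions, next_feats = [[1.0, 0.0]], [0], [[0.0, 1.0]]
r_before = model.intrinsic_reward(feats, actions, next_feats)[0]
for _ in range(50):
    model.update(feats, actions, next_feats)
r_after = model.intrinsic_reward(feats, actions, next_feats)[0]
```

The intrinsic reward would be added to the environment reward before computing the A2C advantage, so rarely seen maze transitions yield a larger learning signal than familiar ones.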