Chapter 36: Advantage Actor-Critic (A2C)

Learning objectives
- Implement A2C (Advantage Actor-Critic): the actor is updated with the TD error as the advantage; the critic is updated to minimize the TD error.
- Use the TD error \(r + \gamma V(s') - V(s)\) as the advantage (optionally detaching \(V(s')\)).
- Run multiple environments synchronously to collect a batch of transitions and update on the batch, which further reduces variance.

Concept and real-world RL A2C is the synchronous version of A3C: the agent runs \(N\) environments in parallel, collects a batch of transitions, and performs one update from the batch. The advantage is the TD error (or an n-step return minus \(V(s)\)). Synchronous batching makes the updates more stable than fully asynchronous A3C. In game AI and robot control, A2C is a simple and effective baseline; it is often used with a shared feature extractor (one backbone with actor and critic heads) to save parameters and improve learning. ...
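The TD-error advantage above can be sketched in a few lines. This is a minimal NumPy illustration, not the chapter's code: the batch values, discount factor, and log-probabilities are made up, and the losses are shown as scalars rather than wired into an optimizer.

```python
import numpy as np

def td_advantages(rewards, values, next_values, dones, gamma=0.99):
    # TD error: delta = r + gamma * V(s') * (1 - done) - V(s)
    # (the (1 - done) factor zeroes the bootstrap term at episode ends)
    return rewards + gamma * next_values * (1.0 - dones) - values

# one batch of transitions from N = 4 synchronous environments (illustrative numbers)
rewards     = np.array([1.0, 0.0, 0.0, 1.0])
values      = np.array([0.5, 0.2, 0.4, 0.9])   # V(s) from the critic
next_values = np.array([0.6, 0.3, 0.0, 0.0])   # V(s'), treated as detached
dones       = np.array([0.0, 0.0, 1.0, 1.0])   # episodes 3 and 4 terminated

adv = td_advantages(rewards, values, next_values, dones)

# actor loss: -log pi(a|s) weighted by the advantage (no gradient flows through adv);
# critic loss: squared TD error, i.e. regress V(s) toward r + gamma * V(s')
log_probs   = np.log(np.array([0.6, 0.5, 0.7, 0.8]))  # pi(a|s) for the taken actions
actor_loss  = -(log_probs * adv).mean()
critic_loss = (adv ** 2).mean()
```

In a real implementation the two losses share one backward pass (often with an entropy bonus), and `adv` is detached exactly as the learning objective suggests so the actor update does not push gradients into the critic.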

March 10, 2026 · 3 min · 566 words · codefrydev

Chapter 63: Curiosity-Driven Exploration (ICM)

Learning objectives
- Implement the Intrinsic Curiosity Module (ICM): a forward model that predicts next-state features from the current state and action.
- Use the prediction error (between predicted and actual next-state features) as an intrinsic reward, and combine it with A2C.
- Explain why prediction error encourages exploration of novel or stochastic parts of the state space.
- Compare exploration behavior (e.g. coverage, time to goal) with and without ICM on a sparse-reward maze.
- Relate curiosity-driven exploration to robot navigation and game AI, where rewards are sparse.

Concept and real-world RL ...
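The intrinsic-reward idea can be sketched without the full module. Below is a hedged NumPy illustration, not the chapter's implementation: the forward model is a single random linear map `W`, the feature vectors are random stand-ins for an encoder's output, and `eta` is an assumed reward scale.

```python
import numpy as np

rng = np.random.default_rng(0)

def intrinsic_reward(phi_s, action_onehot, phi_s_next, W, eta=1.0):
    # forward model: predict phi(s') from [phi(s); one-hot action]
    pred = np.concatenate([phi_s, action_onehot]) @ W
    # intrinsic reward = scaled squared prediction error;
    # large in states the model has not learned to predict yet
    return 0.5 * eta * np.sum((pred - phi_s_next) ** 2)

feat_dim, n_actions = 8, 4
W = 0.1 * rng.normal(size=(feat_dim + n_actions, feat_dim))  # toy forward model
phi_s      = rng.normal(size=feat_dim)   # encoder features of s (stand-in)
phi_s_next = rng.normal(size=feat_dim)   # encoder features of s'
action     = np.eye(n_actions)[2]        # one-hot action

r_int = intrinsic_reward(phi_s, action, phi_s_next, W)
# the agent then optimizes r_total = r_ext + beta * r_int with A2C,
# while the forward model is trained to shrink the same prediction error
```

As the forward model improves on familiar transitions, their prediction error (and hence intrinsic reward) decays, so the agent is pushed toward transitions it cannot yet predict.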

March 10, 2026 · 3 min · 624 words · codefrydev