CartPole
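The CartPole (inverted pendulum) environment: a four-dimensional state (cart position, cart velocity, pole angle, pole angular velocity), two discrete actions (push the cart left or right), and how to solve it with value-based or policy-based methods.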
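A minimal sketch of the environment's dynamics, assuming the constants and explicit Euler integration of the classic Gym/Gymnasium CartPole implementation: gravity 9.8, cart mass 1.0, pole mass 0.1, half-pole length 0.5, push force 10, 0.02 s per step, and termination once the pole tilts past 12 degrees or the cart leaves ±2.4. The class name and method signatures are illustrative, not a library API.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class CartPole:
    """Cart-pole dynamics with classic-control constants (explicit Euler integration)."""
    gravity: float = 9.8
    masscart: float = 1.0
    masspole: float = 0.1
    length: float = 0.5  # half the pole length
    force_mag: float = 10.0
    tau: float = 0.02  # seconds per step
    theta_threshold: float = 12 * 2 * math.pi / 360  # 12 degrees, in radians
    x_threshold: float = 2.4
    # State: [cart position, cart velocity, pole angle, pole angular velocity]
    state: list = field(default_factory=lambda: [0.0, 0.0, 0.0, 0.0])

    def reset(self, rng):
        # Start near the upright, centered equilibrium with a small random perturbation.
        self.state = [rng.uniform(-0.05, 0.05) for _ in range(4)]
        return tuple(self.state)

    def step(self, action):
        """Action 1 pushes right, action 0 pushes left; returns (state, reward, done)."""
        x, x_dot, theta, theta_dot = self.state
        force = self.force_mag if action == 1 else -self.force_mag
        total_mass = self.masscart + self.masspole
        polemass_length = self.masspole * self.length
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        temp = (force + polemass_length * theta_dot ** 2 * sin_t) / total_mass
        theta_acc = (self.gravity * sin_t - cos_t * temp) / (
            self.length * (4.0 / 3.0 - self.masspole * cos_t ** 2 / total_mass)
        )
        x_acc = temp - polemass_length * theta_acc * cos_t / total_mass
        # Explicit Euler step: positions are advanced with the old velocities.
        x, x_dot = x + self.tau * x_dot, x_dot + self.tau * x_acc
        theta, theta_dot = theta + self.tau * theta_dot, theta_dot + self.tau * theta_acc
        self.state = [x, x_dot, theta, theta_dot]
        done = abs(x) > self.x_threshold or abs(theta) > self.theta_threshold
        return tuple(self.state), 1.0, done  # +1 for every step survived
```

A rollout calls `reset` once and then `step` until `done` is true, so the undiscounted episode return is simply the number of steps the pole stayed up.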
DQN for CartPole, with an experience replay buffer and a target network.
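A minimal sketch of the two stabilizing pieces, assuming a linear Q-function in place of the neural network DQN normally uses (a simplification to keep the code dependency-free). `ReplayBuffer`, `LinearQ`, `dqn_update`, and `epsilon_greedy` are hypothetical names. The TD target y = r + γ · max over a' of Q_target(s', a') (or just y = r on a terminal step) comes from a separate target network whose weights are copied from the online network only occasionally.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size, rng):
        # Uniform sampling decorrelates consecutive transitions within a minibatch.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


class LinearQ:
    """Q(s, a) = w[a] . s + b[a]: a linear stand-in for the usual neural network."""

    def __init__(self, n_features=4, n_actions=2):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.b = [0.0] * n_actions

    def q_values(self, s):
        return [sum(wi * si for wi, si in zip(wa, s)) + ba for wa, ba in zip(self.w, self.b)]

    def copy_from(self, other):
        # "Hard" target-network update: overwrite with the online weights.
        self.w = [row[:] for row in other.w]
        self.b = other.b[:]


def epsilon_greedy(q, s, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q.b))
    qs = q.q_values(s)
    return qs.index(max(qs))


def dqn_update(online, target, batch, gamma=0.99, lr=0.01):
    """One semi-gradient step on the mean squared TD error over a minibatch."""
    step_w = [[0.0] * len(row) for row in online.w]
    step_b = [0.0] * len(online.b)
    for s, a, r, s_next, done in batch:
        # Bootstrap from the frozen target network; no bootstrap on terminal steps.
        y = r if done else r + gamma * max(target.q_values(s_next))
        td_error = y - online.q_values(s)[a]
        # Descent direction: td_error * dQ(s, a)/dw, averaged over the batch.
        for i, si in enumerate(s):
            step_w[a][i] += td_error * si / len(batch)
        step_b[a] += td_error / len(batch)
    for a, row in enumerate(step_w):
        for i, g in enumerate(row):
            online.w[a][i] += lr * g
        online.b[a] += lr * step_b[a]
```

In a training loop one would push every transition, start calling `dqn_update` on `buffer.sample(batch_size, rng)` once the buffer holds enough transitions, and call `target.copy_from(online)` every fixed number of steps. The buffer capacity, batch size, and copy period are hyperparameters this sketch leaves open.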
REINFORCE for CartPole with a softmax policy; note the high variance of its Monte Carlo gradient estimates.
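A minimal sketch of REINFORCE with a linear softmax policy, one weight row per action. The parameterization and function names are assumptions; a network would normally produce the logits. Each update uses a whole episode's Monte Carlo returns G_t, which is where the variance comes from.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(theta, s):
    """Linear softmax policy: logit for action b is theta[b] . s."""
    return softmax([sum(t * x for t, x in zip(row, s)) for row in theta])


def sample_action(theta, s, rng):
    return rng.choices(range(len(theta)), weights=action_probs(theta, s))[0]


def grad_log_pi(theta, s, a):
    # d/d theta[b] of log pi(a|s) = (1[b == a] - pi(b|s)) * s
    probs = action_probs(theta, s)
    return [[((1.0 if b == a else 0.0) - probs[b]) * x for x in s] for b in range(len(theta))]


def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards from the episode's end."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def reinforce_update(theta, episode, gamma=0.99, lr=0.01, baseline=0.0):
    """episode: list of (s, a, r). Ascends sum_t (G_t - baseline) * grad log pi(a_t|s_t)."""
    states, actions, rewards = zip(*episode)
    returns = discounted_returns(list(rewards), gamma)
    step = [[0.0] * len(row) for row in theta]
    # Accumulate every term with the pre-update parameters, then apply once.
    for s, a, g in zip(states, actions, returns):
        grad = grad_log_pi(theta, s, a)
        for b in range(len(theta)):
            for j in range(len(s)):
                step[b][j] += (g - baseline) * grad[b][j]
    for b in range(len(theta)):
        for j in range(len(theta[b])):
            theta[b][j] += lr * step[b][j]
    return returns
```

Because each G_t sums many random rewards, single-episode gradient estimates are noisy. Subtracting a baseline that does not depend on the current episode's actions, such as a running average of earlier episodes' returns passed in as `baseline`, leaves the estimate unbiased and often reduces its variance.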
A2C for CartPole, using the TD error as the advantage estimate and several environments stepped synchronously.
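A minimal sketch of one synchronous A2C update, assuming a linear actor and a linear critic (the function names and parameterization are hypothetical). Every environment contributes one transition collected with the same parameters. The TD error δ = r + γV(s') - V(s), with the bootstrap term dropped at termination, serves both as the actor's advantage and as the critic's regression error, and all contributions are averaged into a single update.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def value(w, s):
    """Linear critic V(s) = w . s (a stand-in for a value network)."""
    return sum(wi * si for wi, si in zip(w, s))


def td_advantages(w, batch, gamma):
    """batch holds one (s, a, r, s_next, done) per parallel environment."""
    return [
        r + (0.0 if done else gamma * value(w, s_next)) - value(w, s)
        for s, _, r, s_next, done in batch
    ]


def a2c_update(theta, w, batch, gamma=0.99, lr_actor=0.01, lr_critic=0.05):
    """One synchronous update averaged over all environments' transitions."""
    deltas = td_advantages(w, batch, gamma)
    n = len(batch)
    actor_step = [[0.0] * len(row) for row in theta]
    critic_step = [0.0] * len(w)
    for (s, a, _, _, _), delta in zip(batch, deltas):
        probs = softmax([sum(t * x for t, x in zip(row, s)) for row in theta])
        for b in range(len(theta)):
            # Advantage (delta) times the row-b part of grad log pi(a|s).
            coeff = ((1.0 if b == a else 0.0) - probs[b]) * delta / n
            for j, x in enumerate(s):
                actor_step[b][j] += coeff * x
        for j, x in enumerate(s):
            # Semi-gradient TD(0): move V(s) toward r + gamma * V(s_next).
            critic_step[j] += delta * x / n
    for b in range(len(theta)):
        for j in range(len(theta[b])):
            theta[b][j] += lr_actor * actor_step[b][j]
    for j in range(len(w)):
        w[j] += lr_critic * critic_step[j]
    return deltas
```

Stepping the N environments themselves, and resetting those that terminate, is left out here. n-step returns and an entropy bonus are common additions that this sketch omits.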
Train a neural network to predict the next CartPole state from the current state and action; note how its prediction errors compound over multi-step rollouts.
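A minimal sketch of the compounding-error effect. To stay dependency-free it swaps the neural network for a linear least-squares model of the next state given (state, action); that substitution is an assumption, not the lesson's method. Training data comes from random-policy episodes on the classic CartPole equations of motion (standard classic-control constants assumed), and `rollout_error` feeds the model its own predictions open-loop while measuring the distance to the true trajectory at each step. All function names are hypothetical.

```python
import math
import random

TAU, G, MC, MP, L, F = 0.02, 9.8, 1.0, 0.1, 0.5, 10.0  # assumed classic constants


def true_step(s, a):
    """Classic CartPole equations of motion, one explicit Euler step."""
    x, x_dot, th, th_dot = s
    force = F if a == 1 else -F
    total, pml = MC + MP, MP * L
    c, si = math.cos(th), math.sin(th)
    temp = (force + pml * th_dot ** 2 * si) / total
    th_acc = (G * si - c * temp) / (L * (4.0 / 3.0 - MP * c ** 2 / total))
    x_acc = temp - pml * th_acc * c / total
    return (x + TAU * x_dot, x_dot + TAU * x_acc, th + TAU * th_dot, th_dot + TAU * th_acc)


def features(s, a):
    return list(s) + [1.0 if a == 1 else -1.0, 1.0]  # state, signed action, bias


def collect(n_episodes, rng):
    """(features, next_state) pairs from random-policy episodes."""
    data = []
    for _ in range(n_episodes):
        s = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
        for _ in range(200):
            a = rng.randrange(2)
            s_next = true_step(s, a)
            data.append((features(s, a), s_next))
            s = s_next
            if abs(s[0]) > 2.4 or abs(s[2]) > 12 * math.pi / 180:
                break
    return data


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_linear_model(data, ridge=1e-6):
    """Ridge-regularized least squares, one weight vector per state dimension."""
    d = len(data[0][0])
    XtX = [[sum(f[i] * f[j] for f, _ in data) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    weights = []
    for k in range(4):
        Xty = [sum(f[i] * y[k] for f, y in data) for i in range(d)]
        weights.append(solve(XtX, Xty))
    return weights


def model_step(weights, s, a):
    f = features(s, a)
    return tuple(sum(w * x for w, x in zip(wk, f)) for wk in weights)


def rollout_error(weights, horizon, rng):
    """Open-loop rollout: the model only ever sees its own previous predictions."""
    s_true = s_model = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    errors = []
    for _ in range(horizon):
        a = rng.randrange(2)
        s_true = true_step(s_true, a)
        s_model = model_step(weights, s_model, a)
        errors.append(math.sqrt(sum((p - q) ** 2 for p, q in zip(s_true, s_model))))
    return errors
```

Averaging `rollout_error` over many start states and comparing error against horizon shows why one-step accuracy is not enough: each prediction starts from an already-wrong state, and the pole's unstable upright equilibrium amplifies those deviations.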