CartPole

Learning objectives:
- Understand the CartPole environment: state (cart position, cart velocity, pole angle, pole angular velocity), actions (left/right), and reward (+1 per step until termination).
- Implement a solution using linear function approximation (e.g. tile coding or simple features) with semi-gradient SARSA or Q-learning.
- Optionally, solve it with a small neural network (DQN-style), as in later chapters.

What is CartPole? CartPole (also called the inverted pendulum) is a classic control task in OpenAI Gym / Gymnasium. A pole is attached to a cart that moves along a track. The state is continuous: cart position \(x\), cart velocity \(\dot{x}\), pole angle \(\theta\), and pole angular velocity \(\dot{\theta}\). Actions are discrete: 0 = push left, 1 = push right. The reward is +1 for every step until the episode ends. The episode ends when the pole angle leaves a range (e.g. \(\pm 12^\circ\)), the cart leaves the track (if bounded), or a maximum step count is reached (e.g. 500). The goal is therefore to keep the pole upright as long as possible (total reward = number of steps). ...
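The linear-function-approximation approach can be sketched in plain Python. This is a minimal, dependency-free illustration (no Gym): the physics step uses the classic CartPole dynamics, the features are just the raw state plus a bias term (simpler than tile coding), and all hyperparameters are illustrative.

```python
import math, random

# Classic CartPole dynamics constants (half-pole length, Euler integration).
GRAVITY, M_CART, M_POLE, LENGTH, FORCE, TAU = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02
TOTAL_M, PML = M_CART + M_POLE, M_POLE * LENGTH

def step(state, action):
    """One physics step; returns (next_state, reward, done)."""
    x, xd, th, thd = state
    force = FORCE if action == 1 else -FORCE
    cos_t, sin_t = math.cos(th), math.sin(th)
    temp = (force + PML * thd ** 2 * sin_t) / TOTAL_M
    thacc = (GRAVITY * sin_t - cos_t * temp) / (
        LENGTH * (4.0 / 3.0 - M_POLE * cos_t ** 2 / TOTAL_M))
    xacc = temp - PML * thacc * cos_t / TOTAL_M
    x, xd = x + TAU * xd, xd + TAU * xacc
    th, thd = th + TAU * thd, thd + TAU * thacc
    done = abs(x) > 2.4 or abs(th) > 12 * math.pi / 180
    return (x, xd, th, thd), 1.0, done

def features(s):
    return [1.0, *s]          # bias + raw state: the simplest feature choice

def q_value(w, s, a):
    return sum(wi * fi for wi, fi in zip(w[a], features(s)))

def train(episodes=200, alpha=0.01, gamma=0.99, eps=0.1, seed=0):
    rng = random.Random(seed)
    w = [[0.0] * 5 for _ in range(2)]    # one weight vector per action
    lengths = []
    for _ in range(episodes):
        s = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
        for t in range(500):
            a = rng.randrange(2) if rng.random() < eps else \
                max(range(2), key=lambda a_: q_value(w, s, a_))
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(q_value(w, s2, b) for b in range(2))
            delta = target - q_value(w, s, a)
            for i, fi in enumerate(features(s)):
                w[a][i] += alpha * delta * fi   # semi-gradient Q-learning update
            s = s2
            if done:
                break
        lengths.append(t + 1)
    return w, lengths

weights, lengths = train(episodes=100)
```

Tile coding would replace `features` with a binary vector over overlapping grid tilings of the 4-D state; the update rule stays the same.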

March 10, 2026 · 3 min · 451 words · codefrydev

Chapter 23: Deep Q-Networks (DQN)

Learning objectives:
- Implement full DQN: Q-network, target network, replay buffer, \(\epsilon\)-greedy action selection, and the TD loss (MSE to the target \(r + \gamma \max_{a'} Q_{\text{target}}(s', a')\)).
- Update the target network periodically (e.g. every 100 steps) by copying the online Q-network.
- Train on CartPole and plot the reward per episode.

Concept and real-world RL: DQN combines a neural network for Q-values with experience replay (store transitions and sample random minibatches to break correlations) and a target network (a separate copy of the network used in the TD target, updated periodically, to stabilize learning). The agent acts \(\epsilon\)-greedily, stores \((s, a, r, s', \text{done})\) in the buffer, and repeatedly samples a batch, computes targets with the target network, and updates the online network by minimizing the MSE. DQN was the first major deep RL success (Atari) and is still a standard baseline for discrete-action tasks. ...
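The moving parts of DQN (replay buffer, target network, \(\epsilon\)-greedy with decay, TD targets) can be sketched without any deep-learning dependency by substituting a linear Q-function for the network. The environment here is a stand-in: a trivial 5-state chain where moving right reaches the goal, chosen only to keep the example self-contained.

```python
import random
from collections import deque

def env_step(s, a):
    """Toy 5-state chain: reach state 4 for reward 1 (illustrative stand-in)."""
    s2 = min(max(s + (1 if a == 1 else -1), 0), 4)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

def q_values(w, s):
    # Linear "Q-network": per-action weight and bias on the scalar state.
    return [w[a][0] * s + w[a][1] for a in range(2)]

def dqn(steps=2000, alpha=0.01, gamma=0.9, sync_every=100, batch=16, seed=0):
    rng = random.Random(seed)
    w = [[0.0, 0.0] for _ in range(2)]      # online weights
    w_tgt = [row[:] for row in w]           # target-network copy
    buf = deque(maxlen=500)                 # replay buffer
    s = 0
    for t in range(steps):
        eps = max(0.1, 1.0 - t / 1000)      # linearly decayed epsilon
        a = rng.randrange(2) if rng.random() < eps else \
            max(range(2), key=lambda a_: q_values(w, s)[a_])
        s2, r, done = env_step(s, a)
        buf.append((s, a, r, s2, done))
        s = 0 if done else s2
        if len(buf) >= batch:
            for bs, ba, br, bs2, bd in rng.sample(list(buf), batch):
                # TD target uses the *target* network, zeroed at terminal states.
                target = br if bd else br + gamma * max(q_values(w_tgt, bs2))
                err = target - q_values(w, bs)[ba]
                w[ba][0] += alpha * err * bs    # one SGD step on the MSE loss
                w[ba][1] += alpha * err
        if t % sync_every == 0:
            w_tgt = [row[:] for row in w]       # periodic hard target update
    return w

w_final = dqn()
```

A real DQN swaps `q_values` for a small MLP (e.g. two hidden layers in PyTorch) and the manual SGD step for an optimizer, but the control flow is the same.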

March 10, 2026 · 3 min · 545 words · codefrydev

Chapter 33: The REINFORCE Algorithm

Learning objectives:
- Implement REINFORCE (Monte Carlo policy gradient): estimate \(\nabla_\theta J\) using the return \(G_t\) from full episodes.
- Use a neural-network policy with a softmax output for discrete actions (e.g. CartPole).
- Observe and explain the high variance of the gradient estimates when using raw returns \(G_t\) (no baseline).

Concept and real-world RL: REINFORCE is the simplest policy gradient algorithm: run an episode under \(\pi_\theta\), compute the return \(G_t\) at each step, and update \(\theta\) with \(\theta \leftarrow \theta + \alpha \sum_t G_t \nabla_\theta \log \pi_\theta(a_t|s_t)\). It is on-policy and Monte Carlo (it needs full episodes). The variance of \(G_t\) can be large, especially in long episodes, which makes learning slow or unstable. In game AI, REINFORCE is a baseline for more advanced methods (actor-critic, PPO); in robot control, it is rarely used alone because of its poor sample efficiency and high variance. Adding a baseline (e.g. a state-value function) reduces variance without introducing bias. ...
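The update rule can be shown on a deliberately tiny problem: a one-state, one-step "episode" where action 1 pays reward 1 and action 0 pays 0, so the return \(G_t\) is just the immediate reward. The softmax policy, step size, and episode count are all illustrative; the key line is the score-function update \(\theta \leftarrow \theta + \alpha\, G\, \nabla_\theta \log \pi_\theta(a)\).

```python
import math, random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce(episodes=2000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]                      # one logit per action
    for _ in range(episodes):
        probs = softmax(theta)
        a = 0 if rng.random() < probs[0] else 1
        G = 1.0 if a == 1 else 0.0          # return of this one-step episode
        # Softmax score function: d/d theta_k log pi(a) = 1[k == a] - pi(k)
        for k in range(2):
            grad = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += alpha * G * grad    # REINFORCE update
    return theta

theta = reinforce()
```

Note the variance issue in miniature: when `a == 0` is sampled, `G == 0` and nothing is learned at all; the update relies entirely on which actions happen to be sampled. Subtracting a baseline (e.g. the running average reward) would turn those zero-return samples into informative negative updates.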

March 10, 2026 · 3 min · 602 words · codefrydev

Chapter 36: Advantage Actor-Critic (A2C)

Learning objectives:
- Implement A2C (Advantage Actor-Critic): the actor is updated with the TD error as the advantage; the critic is updated to minimize the TD error.
- Use the TD error \(r + \gamma V(s') - V(s)\) as the advantage (optionally detaching \(V(s')\) from the graph).
- Run multiple environments synchronously to collect a batch of transitions and update on the batch (this further reduces variance).

Concept and real-world RL: A2C is the synchronous version of A3C: the agent runs \(N\) environments in parallel, collects a batch of transitions, and performs one update from the batch. The advantage is the TD error (or the n-step return minus \(V(s)\)). Synchronous batching makes the updates more stable than fully asynchronous A3C. In game AI and robot control, A2C is a simple and effective baseline; it is often used with a shared feature extractor (one backbone with actor and critic heads) to save parameters and improve learning. ...

March 10, 2026 · 3 min · 566 words · codefrydev

Chapter 52: Learning World Models

Learning objectives:
- Collect random trajectories from CartPole and train a neural network to predict the next state given (state, action).
- Evaluate prediction accuracy over 1-, 5-, and 10-step horizons; observe the compounding error as the horizon grows.
- Relate model error to the limitations of long-horizon model-based rollouts.

Concept and real-world RL: A world model (or dynamics model) predicts \(s_{t+1}\) from \(s_t, a_t\). It can be trained on collected data (e.g. with an MSE loss). Errors compound over multi-step rollouts: a small one-step error becomes large after many steps. In robot navigation, learned models are used for short-horizon planning; in game AI (e.g. Dreamer), models operate in a latent space to reduce dimensionality and keep rollouts under control. Understanding compounding error is key to designing model-based algorithms. ...
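The compounding-error experiment can be sketched in a few lines: fit a linear one-step model to a simple nonlinear system (a frictionless pendulum stands in for CartPole here), then roll both the true dynamics and the model forward and compare. The dynamics, feature choice, and hyperparameters are all illustrative.

```python
import math, random

DT = 0.05   # integration step

def true_step(s):
    """Frictionless pendulum: the (nonlinear) ground-truth dynamics."""
    th, om = s
    return (th + DT * om, om - DT * math.sin(th))

def model_step(W, s):
    """Learned *linear* one-step model: s' ~ W @ [th, om, 1]."""
    th, om = s
    return (W[0][0] * th + W[0][1] * om + W[0][2],
            W[1][0] * th + W[1][1] * om + W[1][2])

def fit_model(n=5000, alpha=0.05, seed=0):
    """SGD on the squared one-step prediction error over random states."""
    rng = random.Random(seed)
    W = [[0.0] * 3 for _ in range(2)]
    for _ in range(n):
        s = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        tgt, pred = true_step(s), model_step(W, s)
        feats = (s[0], s[1], 1.0)
        for i in range(2):
            err = tgt[i] - pred[i]
            for j in range(3):
                W[i][j] += alpha * err * feats[j]
    return W

def rollout_error(W, s0, horizon):
    """Euclidean gap between true and model trajectories after `horizon` steps."""
    s_true = s_model = s0
    for _ in range(horizon):
        s_true, s_model = true_step(s_true), model_step(W, s_model)
    return math.hypot(s_true[0] - s_model[0], s_true[1] - s_model[1])

W = fit_model()
e1 = rollout_error(W, (1.0, 0.0), 1)
e10 = rollout_error(W, (1.0, 0.0), 10)
```

Because a linear model cannot represent \(\sin\theta\) exactly, each step leaves a small residual, and feeding model predictions back into the model accumulates those residuals: `e10` comes out well above `e1`. This is exactly why long-horizon model-based rollouts are unreliable and why methods like Dreamer keep rollouts short or work in learned latent spaces.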

March 10, 2026 · 3 min · 442 words · codefrydev