Trust Region

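Trust Region Policy Optimization (TRPO) as constrained optimization: maximize a surrogate objective subject to a KL-divergence constraint on the policy update, with the natural gradient as the resulting update direction.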
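
The KL-constrained update can be illustrated numerically. What follows is a minimal sketch, not a full TRPO implementation. It assumes the usual local approximation: a linear surrogate g·s and a quadratic KL estimate (1/2) sᵀFs ≤ δ, where F is the Fisher information matrix. Under that assumption the maximizing step is the natural gradient direction F⁻¹g, scaled so the constraint is exactly met. The names `conjugate_gradient`, `natural_gradient_step`, `fvp`, and the toy 2x2 matrix `F` are hypothetical and chosen for illustration only; `g` is assumed to be nonzero.

```python
import numpy as np


def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve F x = g using only Fisher-vector products fvp(v)."""
    x = np.zeros_like(g)
    r = g.copy()   # residual g - F x (x starts at zero)
    p = r.copy()   # current search direction
    rs_old = r @ r
    for _ in range(iters):
        if rs_old < tol:
            break
        Fp = fvp(p)
        alpha = rs_old / (p @ Fp)
        x += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


def natural_gradient_step(g, fvp, delta=0.01):
    """Step maximizing g @ s subject to 0.5 * s @ F @ s <= delta."""
    x = conjugate_gradient(fvp, g)                   # x approximates F^{-1} g
    step_size = np.sqrt(2.0 * delta / (x @ fvp(x)))  # scale to the KL boundary
    return step_size * x


# Toy problem: an assumed symmetric positive-definite Fisher matrix and gradient.
F = np.array([[2.0, 0.3], [0.3, 1.0]])
g = np.array([1.0, -0.5])

step = natural_gradient_step(g, lambda v: F @ v, delta=0.01)
quadratic_kl = 0.5 * step @ F @ step  # quadratic KL estimate of the step
```

Using Fisher-vector products rather than an explicit inverse keeps the cost manageable when the parameter vector is large, which is why conjugate gradient is the common choice here. A complete TRPO update would typically follow this step with a backtracking line search that checks the actual KL divergence and surrogate improvement; that part is omitted from the sketch.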