Imitation

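Discriminator: trained to tell expert transitions from agent transitions; its output is used as the reward signal for a policy-gradient update, so the agent is rewarded for behavior the discriminator mistakes for the expert's.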
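A minimal sketch of the discriminator-as-reward idea, in plain Python. The logistic discriminator, the GAIL-style reward form -log(1 - D(x)), the toy Gaussian data, and all function names are assumptions made for illustration; the notes do not fix any of them. The resulting reward would be passed to whatever policy-gradient method is in use, which is not shown here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_discriminator(expert, agent, steps=200, lr=0.5):
    """Fit D(x) = sigmoid(w.x + b) with expert labeled 1 and agent labeled 0."""
    data = [(x, 1.0) for x in expert] + [(x, 0.0) for x in agent]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            # Gradient of binary cross-entropy with respect to the logit.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def imitation_reward(x, w, b, eps=1e-8):
    """Assumed GAIL-style reward: large when D believes x came from the expert."""
    d = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -math.log(1.0 - d + eps)


# Toy features standing in for (state, action) pairs; purely hypothetical data.
rng = random.Random(0)
expert = [[rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)] for _ in range(100)]
agent = [[rng.gauss(-1.0, 0.5), rng.gauss(-1.0, 0.5)] for _ in range(100)]

w, b = train_discriminator(expert, agent)
r_expert = sum(imitation_reward(x, w, b) for x in expert) / len(expert)
r_agent = sum(imitation_reward(x, w, b) for x in agent) / len(agent)
print(r_expert > r_agent)
```

In a full training loop the discriminator and the policy are updated in alternation, so the agent samples used here would be refreshed from the current policy at each iteration.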

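AMP (Adversarial Motion Priors) paper: the policy is trained on a combined reward, the sum of a task reward (what to do) and an adversarial style reward from the discriminator (how to do it).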
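A small sketch of how the two reward terms might be combined. The clipped quadratic style reward and the equal 0.5 weights are assumptions based on my reading of the AMP paper's least-squares discriminator setup and should be checked against the paper; the function names are hypothetical.

```python
def style_reward(d):
    """Assumed style reward from a least-squares discriminator score d.

    Equals 1 when d = 1 (the expert target) and is clipped at 0 for d <= -1.
    """
    return max(0.0, 1.0 - 0.25 * (d - 1.0) ** 2)


def combined_reward(task_reward, d, w_task=0.5, w_style=0.5):
    """Weighted sum of task reward and style reward; the weights are illustrative."""
    return w_task * task_reward + w_style * style_reward(d)


# Task reward 2.0 with a discriminator score of 0.0 (neither expert nor agent target).
print(combined_reward(2.0, 0.0))  # 0.5 * 2.0 + 0.5 * 0.75 = 1.375
```

The weights trade off task completion against motion style: a larger `w_style` pushes the policy toward the reference data even at some cost to the task.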