Imitation
Discriminator: trained to tell expert transitions from agent transitions; its output is used as the reward for policy-gradient training.
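A minimal sketch of the discriminator-as-reward idea, assuming a 1-D feature per transition and a logistic discriminator. The names `train_discriminator` and `imitation_reward` are hypothetical, and the `-log(1 - D)` reward is the GAIL-style choice, one of several in use. The policy-gradient step itself is omitted; the returned reward would simply stand in for the environment reward.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def train_discriminator(expert, agent, steps=500, lr=0.1):
    """Fit D(x) = sigmoid(w*x + b): label 1 for expert samples, 0 for agent samples."""
    w, b = 0.0, 0.0
    data = [(x, 1.0) for x in expert] + [(x, 0.0) for x in agent]
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # d(binary cross-entropy)/d(logit)
            gw += err * x
            gb += err
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b

def imitation_reward(x, w, b, eps=1e-8):
    """GAIL-style reward: grows as the discriminator rates x as more expert-like."""
    d = sigmoid(w * x + b)
    return -math.log(1.0 - d + eps)

# Toy data (assumed): expert features cluster near +1, agent features near -1.
expert = [0.8, 1.0, 1.2]
agent = [-1.2, -1.0, -0.8]
w, b = train_discriminator(expert, agent)
```

In a full loop the discriminator and the policy are updated alternately, so the reward keeps shifting as the agent's samples move toward the expert's.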
AMP paper (Adversarial Motion Priors): combined reward = weighted sum of a task reward and an adversarial style reward from the discriminator.
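A small sketch of the combined reward, assuming the least-squares discriminator form used in AMP (raw output near 1 for reference motion, near -1 for policy motion) and the style reward max(0, 1 - 0.25 (d - 1)^2). The function names and the 0.5/0.5 weights are placeholders for illustration, not prescribed values.

```python
def style_reward(disc_out):
    """Map a least-squares discriminator output to a style reward in [0, 1]."""
    return max(0.0, 1.0 - 0.25 * (disc_out - 1.0) ** 2)

def combined_reward(task_reward, disc_out, w_task=0.5, w_style=0.5):
    """Weighted sum of the task reward and the adversarial style reward."""
    return w_task * task_reward + w_style * style_reward(disc_out)

# Hypothetical transition: modest task progress, discriminator undecided (output 0).
r = combined_reward(task_reward=0.2, disc_out=0.0)
```

The clipping at zero keeps the style term bounded, so a confident discriminator cannot drive the total reward arbitrarily negative; the weights set the trade-off between completing the task and matching the reference style.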