Sentiment

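Fine-tune a small language model (e.g. GPT-2) with PPO so that it generates text with a target sentiment, using a KL penalty to keep the fine-tuned policy close to the original model.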
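
One common way to apply the KL penalty is to fold it into the reward that PPO optimizes. The sketch below is a minimal illustration under stated assumptions, not a full training loop: the function name `kl_penalized_rewards`, the coefficient `beta`, and the choice to add the sentiment score only at the final token are all assumptions made for this example. It takes per-token log-probabilities from the fine-tuned policy and from the frozen original model, plus a scalar sentiment score (for instance from a sentiment classifier), and returns per-token rewards.

```python
from typing import List


def kl_penalized_rewards(
    policy_logprobs: List[float],
    ref_logprobs: List[float],
    sentiment_score: float,
    beta: float = 0.1,  # assumed KL coefficient; tune for the task
) -> List[float]:
    """Combine a sentiment score with a per-token KL penalty.

    Each token receives -beta * (log pi(token) - log pi_ref(token)),
    a sample-based estimate of the KL term. The sentiment score is
    added to the last token only (an assumption of this sketch).
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    if not policy_logprobs:
        return []

    # Penalize tokens the policy makes more likely than the reference does.
    rewards = [
        -beta * (lp - lr) for lp, lr in zip(policy_logprobs, ref_logprobs)
    ]
    # The task reward arrives once the full response has been generated.
    rewards[-1] += sentiment_score
    return rewards


if __name__ == "__main__":
    # Hypothetical log-probs for a three-token response.
    policy = [-1.0, -2.0, -0.5]
    reference = [-1.5, -2.0, -1.0]
    print(kl_penalized_rewards(policy, reference, sentiment_score=0.9, beta=0.2))
```

A larger `beta` keeps generations closer to the original model at the cost of a weaker sentiment shift; a smaller `beta` does the opposite and makes it easier for the policy to drift into degenerate text that merely exploits the sentiment score.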