Reflect-RL: Two-Player Online RL Fine-Tuning for LMs
Runlong Zhou, Simon S. Du, Beibin Li
ACL 2024
We developed Reflect-RL, a two-player system for fine-tuning language models on interactive decision-making tasks. It combines reflection, negative example generation, single-prompt action enumeration, and curriculum learning.