Reflect-RL: Two-Player Online RL Fine-Tuning for LMs

Runlong Zhou, Simon S. Du, Beibin Li

ACL 2024 (main conference)

We developed Reflect-RL, a two-player system that aligns language models with interactive decision-making tasks through online RL fine-tuning. Its techniques include reflection, negative example generation, single-prompt action enumeration, and curriculum learning.
