Publications

*: indicates equal contribution or alphabetical ordering.

Reflect-RL: Two-Player Online RL Fine-Tuning for LMs

Runlong Zhou, Simon S. Du, Beibin Li

ACL 2024


Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning

Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon S. Du

ICLR 2024


Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization

Runlong Zhou, Zelin He, Yuandong Tian, Yi Wu, Simon S. Du

TMLR


Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Jean Tarbouriech*, Runlong Zhou*, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

NeurIPS 2021 (Spotlight, 3% acceptance rate)


Preprints

*: indicates equal contribution or alphabetical ordering.

Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques

Natalia Zhang*, Xinqi Wang*, Qiwen Cui*, Runlong Zhou, Sham M. Kakade, Simon S. Du