Publications

* indicates equal contribution or alphabetical ordering.

Reflect-RL: Two-Player Online RL Fine-Tuning for LMs

Runlong Zhou, Simon S. Du, Beibin Li

ACL 2024 (main conference)


Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning

Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon S. Du

ICLR 2024


Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization

Runlong Zhou, Zelin He, Yuandong Tian, Yi Wu, Simon S. Du

TMLR


Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Jean Tarbouriech*, Runlong Zhou*, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

NeurIPS 2021 (Spotlight, 3% acceptance rate)