Publications
* indicates equal contribution or alphabetical ordering.
Reflect-RL: Two-Player Online RL Fine-Tuning for LMs
Runlong Zhou, Simon S. Du, Beibin Li
ACL 2024 (main conference)
Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon S. Du
ICLR 2024
Runlong Zhou, Zihan Zhang, Simon S. Du
ICML 2023
Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes
Runlong Zhou, Ruosong Wang, Simon S. Du
ICML 2023
Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization
Runlong Zhou, Zelin He, Yuandong Tian, Yi Wu, Simon S. Du
TMLR
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret
Jean Tarbouriech*, Runlong Zhou*, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric
NeurIPS 2021 (Spotlight, 3% acceptance rate)