Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization

RL Theory

Runlong Zhou, Zelin He, Yuandong Tian, Yi Wu, Simon S. Du

TMLR

We formulate canonical online Combinatorial Optimization problems as Latent MDPs (LMDPs) and give a convergence guarantee for Natural Policy Gradient on LMDPs. We show the effectiveness of Curriculum Learning through the perspective of the relative condition number.

Abstract Slides Poster