Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization
Runlong Zhou, Zelin He, Yuandong Tian, Yi Wu, Simon S. Du
TMLR
We formulate canonical online Combinatorial Optimization problems as Latent MDPs (LMDPs) and give a convergence guarantee for Natural Policy Gradient (NPG) on LMDPs. We show the effectiveness of Curriculum Learning through the lens of the relative condition number.
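The summary does not define the relative condition number. The sketch below assumes a common definition from the policy-gradient theory literature. It is the largest ratio, over directions w, between the feature covariance under a target state-action distribution and the feature covariance under the distribution the policy is trained on. The paper's exact definition may differ. The function names, the one-hot features, and the "naive" and "curriculum" distributions are hypothetical toy values, not results from the paper. They only show how a training distribution closer to the target yields a smaller ratio.

```python
import numpy as np


def feature_covariance(features, weights):
    """Weighted second-moment matrix E_{(s,a)~d}[phi phi^T].

    features: (n, d) array of hypothetical state-action features phi(s, a).
    weights:  (n,) nonnegative array defining a distribution d over the n pairs.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize to a probability distribution
    return (features * w[:, None]).T @ features


def relative_condition_number(sigma_target, sigma_train):
    """max_w (w^T S_target w) / (w^T S_train w).

    Computed as the largest generalized eigenvalue of (S_target, S_train).
    Assumes S_train is symmetric positive definite.
    """
    L = np.linalg.cholesky(sigma_train)               # S_train = L L^T
    tmp = np.linalg.solve(L, sigma_target)            # L^{-1} S_target
    M = np.linalg.solve(L, tmp.T)                     # L^{-1} S_target L^{-T}
    M = (M + M.T) / 2.0                               # symmetrize against round-off
    return float(np.linalg.eigvalsh(M).max())


# Toy example (hypothetical numbers): two state-action pairs with one-hot features.
phi = np.eye(2)
target = feature_covariance(phi, [0.1, 0.9])       # distribution of the hard target task
naive = feature_covariance(phi, [0.9, 0.1])        # training far from the target
curriculum = feature_covariance(phi, [0.3, 0.7])   # training shifted toward the target

kappa_naive = relative_condition_number(target, naive)
kappa_curriculum = relative_condition_number(target, curriculum)
print(kappa_naive, kappa_curriculum)  # roughly 9.0 and 1.29
```

In this toy setting, moving the training distribution toward the target shrinks the ratio from 9 to 9/7. This matches the intuition that a smaller relative condition number is what makes a curriculum helpful.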