Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs

Shulun Chen, Runlong Zhou, Zihan Zhang, Maryam Fazel, Simon S. Du

We propose a novel analysis of gap-dependent regret for tabular MDPs that introduces a necessary quantity, the maximum conditional total variance. The proposed model-based algorithm achieves regret bounds that are tight in both their horizon and their variance dependence.
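As background only, and not the authors' algorithm: gap-dependent bounds are stated in terms of the suboptimality gaps of an episodic MDP, conventionally defined as Delta_h(s, a) = V*_h(s) - Q*_h(s, a). The minimal sketch below computes these gaps by backward induction. It assumes stationary transitions P[s][a][s'] and rewards R[s][a] given as nested lists, and the function name suboptimality_gaps is hypothetical. It does not compute the maximum conditional total variance, whose definition is not given in this summary.

```python
def suboptimality_gaps(P, R, H):
    """Hypothetical helper: return gaps[h][s][a] = V*_h(s) - Q*_h(s, a).

    Assumes P[s][a][t] = Pr(next state t | state s, action a) and R[s][a]
    are the same at every step h (a stationary-MDP simplification).
    """
    S, A = len(P), len(P[0])
    V_next = [0.0] * S  # V*_{H+1}(s) = 0 for every state
    gaps = [None] * H
    for h in reversed(range(H)):
        # Bellman optimality backup: Q*_h(s, a) = r(s, a) + E[V*_{h+1}(s')]
        Q = [[R[s][a] + sum(P[s][a][t] * V_next[t] for t in range(S))
              for a in range(A)] for s in range(S)]
        V = [max(Q[s]) for s in range(S)]  # V*_h(s) = max_a Q*_h(s, a)
        gaps[h] = [[V[s] - Q[s][a] for a in range(A)] for s in range(S)]
        V_next = V
    return gaps


# One state, two actions with rewards 1.0 and 0.5, horizon H = 2.
P = [[[1.0], [1.0]]]
R = [[1.0, 0.5]]
gaps = suboptimality_gaps(P, R, 2)
print(gaps)  # [[[0.0, 0.5]], [[0.0, 0.5]]]
```

Every optimal action has gap zero, so quantities such as the minimum positive gap (0.5 in this toy example) are what a gap-dependent bound is expressed in.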
