Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs
Shulun Chen, Runlong Zhou, Zihan Zhang, Maryam Fazel, Simon S. Du
We propose a novel analysis of gap-dependent regret that introduces a necessary term, the maximum conditional total variance. The proposed model-based algorithm achieves a tight dependence on both the horizon and the variance.