Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques

Natalia Zhang*, Xinqi Wang*, Qiwen Cui*, Runlong Zhou, Sham M. Kakade, Simon S. Du

We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), covering both theoretical foundations and empirical validation. Our proposed methods include Mean Squared Error (MSE) regularization and imitation learning.

Abstract