Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques
Natalia Zhang*, Xinqi Wang*, Qiwen Cui*, Runlong Zhou, Sham M. Kakade, Simon S. Du
We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), covering both its theoretical foundations and empirical validation. Our proposed methods include Mean Squared Error (MSE) regularization and imitation learning.