TY - GEN
T1 - Data-Driven Robust Multi-Agent Reinforcement Learning
AU - Wang, Yudan
AU - Wang, Yue
AU - Zhou, Yi
AU - Velasquez, Alvaro
AU - Zou, Shaofeng
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Multi-agent reinforcement learning (MARL) in the collaborative setting aims to find a joint policy that maximizes the accumulated reward averaged over all the agents. In this paper, we focus on MARL under model uncertainty, where the transition kernel is assumed to be in an uncertainty set, and the goal is to optimize the worst-case performance over the uncertainty set. We investigate the model-free setting, where the uncertainty set centers around an unknown Markov decision process from which a single sample trajectory can be obtained sequentially. We develop a robust multi-agent Q-learning algorithm, which is model-free and fully decentralized. We theoretically prove that the proposed algorithm converges to the minimax robust policy, and further characterize its sample complexity. Compared to vanilla multi-agent Q-learning, our algorithm offers provable robustness under model uncertainty without incurring additional computational or memory cost.
AB - Multi-agent reinforcement learning (MARL) in the collaborative setting aims to find a joint policy that maximizes the accumulated reward averaged over all the agents. In this paper, we focus on MARL under model uncertainty, where the transition kernel is assumed to be in an uncertainty set, and the goal is to optimize the worst-case performance over the uncertainty set. We investigate the model-free setting, where the uncertainty set centers around an unknown Markov decision process from which a single sample trajectory can be obtained sequentially. We develop a robust multi-agent Q-learning algorithm, which is model-free and fully decentralized. We theoretically prove that the proposed algorithm converges to the minimax robust policy, and further characterize its sample complexity. Compared to vanilla multi-agent Q-learning, our algorithm offers provable robustness under model uncertainty without incurring additional computational or memory cost.
KW - Distributionally robust
KW - finite-time analysis
KW - model-free
KW - robust MDP
KW - sample complexity
UR - https://www.scopus.com/pages/publications/85142763167
U2 - 10.1109/MLSP55214.2022.9943500
DO - 10.1109/MLSP55214.2022.9943500
M3 - Conference contribution
AN - SCOPUS:85142763167
T3 - IEEE International Workshop on Machine Learning for Signal Processing, MLSP
BT - 2022 IEEE 32nd International Workshop on Machine Learning for Signal Processing, MLSP 2022
PB - IEEE Computer Society
T2 - 32nd IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2022
Y2 - 22 August 2022 through 25 August 2022
ER -