M^3ARL: Moment-Embedded Mean-Field Multi-Agent Reinforcement Learning for Continuous Action Space

Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Mean-field theory offers a promising solution to the scalability issues encountered in multi-agent reinforcement learning (MARL) within large-scale systems. However, most existing MARL algorithms based on mean-field theory are typically constrained to discrete action space. In continuous action space, the conventional mean-field approximation of mean-field theory breaks down since the representation of mean-field action is not well-defined. In this paper, we propose \textit{Moment-Embedded Mean-Field Multi-Agent Reinforcement Learning } (M$^3$ARL) for continuous action space, embedding mean-field action into multi-order moments. Specifically, we analyze the mean-field approximation on Wasserstein space and derive the approximated form of $Q$-function. Furthermore, to derive a learn-able form of $Q$-function, we apply the cylindrical function and propose the moment-embedded representations. Based on the representations and proximal policy optimization (PPO) algorithm, we propose a novel algorithm, M$^3$APPO. Finally, we validate the efficacy of the proposed algorithm through experiments of predator-prey environment on continuous action space scenarios.

Recommended citation: Tang, Huaze, et al. "M3ARL: Moment-Embedded Mean-Field Multi-Agent Reinforcement Learning for Continuous Action Space." ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024.