EPH
EPH. EPH can be divided into two parts, as shown in the figure above. The upper part shows the neural network structure of EPH and how local partial observations are transformed into Q vectors. The lower part shows that, instead of obtaining the action directly via \( a^t_i = \text{argmax} (q^t_i) \), several inference techniques, as mentioned in the Contributions, can be used to improve action quality and avoid collisions (a minimal sketch of the baseline greedy selection follows below).
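As a point of reference for what the inference techniques improve upon, the sketch below implements the plain greedy rule \( a^t_i = \text{argmax} (q^t_i) \) applied independently per agent. The function name, array shapes, and values are illustrative assumptions, not part of EPH itself.

```python
import numpy as np

def greedy_actions(q_values: np.ndarray) -> np.ndarray:
    """Baseline greedy selection: each agent i takes argmax of its Q vector q_i^t.

    q_values: array of shape (num_agents, num_actions), row i holds q_i^t.
    Returns an array of shape (num_agents,) with the chosen action indices.
    """
    return np.argmax(q_values, axis=1)

# Example (hypothetical values): 3 agents, 5 actions each
q = np.array([[0.1, 0.9, 0.0, 0.2, 0.3],
              [0.5, 0.4, 0.8, 0.1, 0.0],
              [0.2, 0.2, 0.2, 0.7, 0.1]])
print(greedy_actions(q))  # -> [1 2 3]
```

Because this rule ignores the other agents' choices, it can produce colliding joint actions, which is exactly what the inference techniques in the lower part of the figure are designed to avoid.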
Training Method. The Q value for agent \(i\) is obtained via:
\[
Q_{s,a}^{i} = Val_s\left(e_{i}^{t}\right) + Adv\left(e_{i}^{t}\right)_a - \frac{1}{\left|\mathcal{A}\right|} \sum_{a'} Adv\left(e_{i}^{t}\right)_{a'}
\]
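The sketch below shows how this dueling decomposition combines a scalar value estimate with per-action advantages, subtracting the mean advantage so the two streams are identifiable. It is a minimal numpy illustration of the equation above; the function name and example numbers are assumptions for illustration only.

```python
import numpy as np

def dueling_q(val: float, adv: np.ndarray) -> np.ndarray:
    """Combine a scalar state value and per-action advantages into Q values.

    val: Val_s(e_i^t), the scalar value estimate from agent i's embedding.
    adv: Adv(e_i^t), a vector of length |A| with one advantage per action.
    Returns Q_{s,a}^i for every action a, with the mean advantage subtracted.
    """
    return val + adv - adv.mean()

# Example (hypothetical values) with |A| = 5 actions
adv = np.array([0.2, -0.1, 0.4, 0.0, -0.3])
q_i = dueling_q(val=1.5, adv=adv)
```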
The network is trained by minimizing:
\[
\mathcal{L}(\theta) = \text{MSE} \left( R_t^i - Q_{s_t,a_t}^i (\theta) \right)
\]
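The loss is the mean squared error between the target return \( R_t^i \) and the predicted Q value of the action actually taken. The sketch below computes it over a batch; the function name, shapes, and example values are illustrative assumptions.

```python
import numpy as np

def td_loss(returns: np.ndarray, q_taken: np.ndarray) -> float:
    """MSE between target returns R_t^i and predicted Q_{s_t,a_t}^i(theta).

    returns, q_taken: arrays of matching shape, e.g. (batch, num_agents),
    where q_taken holds the Q values of the actions selected at each step.
    """
    return float(np.mean((returns - q_taken) ** 2))

# Example (hypothetical values): batch of 2 timesteps, 3 agents
R = np.array([[1.0, 0.5, 0.0], [0.8, 0.2, 1.0]])
Q = np.array([[0.9, 0.4, 0.1], [1.0, 0.0, 0.7]])
loss = td_loss(R, Q)
```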