EPH
EPH. EPH can be divided into two parts, as shown in the figure above. The upper part shows the neural network structure of EPH and how local partial observations are transformed into Q vectors. The lower part shows that, instead of obtaining the action directly via \( a^t_i = \text{argmax} (q^t_i) \), several inference techniques, as mentioned in the Contributions, can be used to improve action quality and avoid collisions (a minimal sketch of the baseline greedy selection follows below).
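As a point of reference for what the inference techniques improve upon, the sketch below implements the plain greedy rule \( a^t_i = \text{argmax} (q^t_i) \) applied independently per agent. The function name, array shapes, and values are illustrative assumptions, not part of EPH itself.

```python
import numpy as np

def greedy_actions(q_values: np.ndarray) -> np.ndarray:
    """Baseline greedy selection: each agent i takes argmax of its Q vector q_i^t.

    q_values: array of shape (num_agents, num_actions), row i holds q_i^t.
    Returns an array of shape (num_agents,) with the chosen action indices.
    """
    return np.argmax(q_values, axis=1)

# Example (hypothetical values): 3 agents, 5 actions each
q = np.array([[0.1, 0.9, 0.0, 0.2, 0.3],
              [0.5, 0.4, 0.8, 0.1, 0.0],
              [0.2, 0.2, 0.2, 0.7, 0.1]])
print(greedy_actions(q))  # -> [1 2 3]
```

Because this rule ignores the other agents' choices, it can produce colliding joint actions, which is exactly what the inference techniques in the lower part of the figure are designed to avoid.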
Training Method. The Q value for agent \(i\) is obtained via:
\[
Q_{s,a}^{i} = Val_s\left(e_{i}^{t}\right) + Adv\left(e_{i}^{t}\right)_a - \frac{1}{\left|\mathcal{A}\right|} \sum_{a'} Adv\left(e_{i}^{t}\right)_{a'}
\]
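The sketch below shows how this dueling decomposition combines a scalar value estimate with per-action advantages, subtracting the mean advantage so the two streams are identifiable. It is a minimal numpy illustration of the equation above; the function name and example numbers are assumptions for illustration only.

```python
import numpy as np

def dueling_q(val: float, adv: np.ndarray) -> np.ndarray:
    """Combine a scalar state value and per-action advantages into Q values.

    val: Val_s(e_i^t), the scalar value estimate from agent i's embedding.
    adv: Adv(e_i^t), a vector of length |A| with one advantage per action.
    Returns Q_{s,a}^i for every action a, with the mean advantage subtracted.
    """
    return val + adv - adv.mean()

# Example (hypothetical values) with |A| = 5 actions
adv = np.array([0.2, -0.1, 0.4, 0.0, -0.3])
q_i = dueling_q(val=1.5, adv=adv)
```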
The network is trained by minimizing:
\[
\mathcal{L}(\theta) = \text{MSE} \left( R_t^i - Q_{s_t,a_t}^i (\theta) \right)
\]
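The loss is the mean squared error between the target return \( R_t^i \) and the predicted Q value of the action actually taken. The sketch below computes it over a batch; the function name, shapes, and example values are illustrative assumptions.

```python
import numpy as np

def td_loss(returns: np.ndarray, q_taken: np.ndarray) -> float:
    """MSE between target returns R_t^i and predicted Q_{s_t,a_t}^i(theta).

    returns, q_taken: arrays of matching shape, e.g. (batch, num_agents),
    where q_taken holds the Q values of the actions selected at each step.
    """
    return float(np.mean((returns - q_taken) ** 2))

# Example (hypothetical values): batch of 2 timesteps, 3 agents
R = np.array([[1.0, 0.5, 0.0], [0.8, 0.2, 1.0]])
Q = np.array([[0.9, 0.4, 0.1], [1.0, 0.0, 0.7]])
loss = td_loss(R, Q)
```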