Scalable Multi-Agent Multi-View Video World Models
Video world models have achieved remarkable success in simulating environment dynamics, yet most existing approaches are limited to single-agent scenarios, failing to capture complex interactions inherent in real-world multi-agent systems. We present MultiWorld, a unified framework for multi-agent multi-view world modeling that enables accurate control of multiple agents while maintaining multi-view consistency.
We introduce the Multi-Agent Condition Module (MACM) to achieve precise multi-agent controllability and the Global State Encoder (GSE) to ensure coherent observations across views. The number of agents and views can be scaled flexibly, and MultiWorld synthesizes the views in parallel, which keeps generation efficient as the number of views grows.
Experiments on multi-player game environments and multi-robot manipulation tasks demonstrate that MultiWorld outperforms baselines in video fidelity, action-following ability, and multi-view consistency.
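One generic way to synthesize views in parallel while conditioning each on a shared global state is to fold the view axis into the batch axis. The sketch below illustrates only that pattern and is not the MultiWorld implementation: `denoise_step`, `synthesize_views_in_parallel`, the linear map, and all tensor shapes are hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def denoise_step(latents, condition, weight):
    """Hypothetical stand-in for one per-view model step.

    A real world model would run a video backbone here; this uses a
    linear map plus additive conditioning so the example stays runnable.
    """
    return latents @ weight + condition


def synthesize_views_in_parallel(view_latents, global_state, weight):
    """Process every view in one batched call.

    view_latents: (batch, views, dim) per-view latents
    global_state: (batch, dim) shared state per sample (assumed form)
    """
    b, v, d = view_latents.shape
    # Fold views into the batch axis so the view count can vary freely.
    flat = view_latents.reshape(b * v, d)
    # Repeat each sample's global state once per view, matching the
    # row order produced by the reshape above.
    cond = np.repeat(global_state, v, axis=0)
    out = denoise_step(flat, cond, weight)
    return out.reshape(b, v, d)
```

Because the view axis is treated as extra batch entries, the same function handles two views or ten without any change to the weights, which is the property that makes this style of design scale with the number of views.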
| Method | FVD ↓ | LPIPS ↓ | SSIM ↑ | PSNR ↑ | Action ↑ | RPE ↓ |
|---|---|---|---|---|---|---|
| Standard | 245 | 0.36 | 0.50 | 17.48 | 88.4 | 0.75 |
| Concat-View | 215 | 0.36 | 0.49 | 17.54 | 89.1 | 0.74 |
| COMBO | 207 | 0.34 | 0.51 | 17.82 | 89.3 | 0.72 |
| MultiWorld | 179 | 0.35 | 0.51 | 17.72 | 89.8 | 0.67 |

| Method | FVD ↓ | LPIPS ↓ | SSIM ↑ | PSNR ↑ | Action ↑ | RPE ↓ |
|---|---|---|---|---|---|---|
| Standard | 100 | 0.07 | 0.90 | 26.39 | 88.2 | 1.60 |
| Concat-View* | 106 | 0.06 | 0.90 | 27.44 | 92.0 | 0.82 |
| COMBO | 99 | 0.08 | 0.90 | 26.49 | 88.5 | 1.54 |
| MultiWorld | 96 | 0.07 | 0.90 | 26.60 | 88.7 | 1.52 |

*Concat-View was trained on two camera views only. Bold = best, blue = second best.
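For readers unfamiliar with the PSNR column, the following is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), in decibels. It assumes 8-bit frames with a peak value of 255. The source does not state the pixel range or how scores are averaged over frames and views, so the function name `psnr` and its defaults are illustrative assumptions rather than the evaluation code behind the tables.

```python
import numpy as np


def psnr(reference, prediction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped arrays.

    max_value is the assumed peak pixel value (255 for 8-bit frames).
    """
    reference = np.asarray(reference, dtype=np.float64)
    prediction = np.asarray(prediction, dtype=np.float64)
    mse = np.mean((reference - prediction) ** 2)
    if mse == 0:
        # Identical inputs: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Higher values indicate that the predicted frame is closer to the reference, which is why the column is marked with an upward arrow.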
AI-generated multi-player gameplay videos from the It Takes Two dataset.
Multi-robot manipulation tasks from the RoboFactory dataset.
Two Robots Stack Cube - Success Trajectory #1
Two Robots Stack Cube - Success Trajectory #2
Two Robots Stack Cube - Success Trajectory #3
Three Robots Stack Cube - Failure Trajectory #1
Three Robots Stack Cube - Failure Trajectory #2