Multi-agent RL · JAX / JaxMARL

Adaptive Opponent Modeling
for Adversarial Co-Training

Infer the opponent's hidden strategy with calibrated uncertainty. Plan against the belief. Validated end to end in a controlled predator–prey game.

Paper (PDF) Code Full write-up
The task

The strategy is hidden inside the opponent

Figure: blind vs. belief predators
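One minimal way to keep a calibrated belief over a hidden strategy is exact Bayesian filtering over a small, discrete set of strategy types. The sketch below assumes two types named after the circle and corners strategies discussed later, with made-up, state-independent action likelihoods. `LIKELIHOODS`, `infer`, and `entropy` are hypothetical helpers, not the project's code. The posterior concentrates as evidence accumulates, and its entropy measures how uncertain it still is.

```python
import math

# Hypothetical likelihoods p(action | strategy). A real opponent model would also
# condition on the game state; these numbers are illustrative, not from the project.
LIKELIHOODS = {
    "circle": {"left": 0.6, "right": 0.1, "up": 0.15, "down": 0.15},
    "corners": {"left": 0.2, "right": 0.4, "up": 0.2, "down": 0.2},
}


def update_belief(belief, action):
    """One Bayes step: posterior(k) is proportional to prior(k) * p(action | k)."""
    unnormalized = {k: belief[k] * LIKELIHOODS[k][action] for k in belief}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}


def infer(actions, prior=None):
    """Fold a sequence of observed opponent actions into a belief over strategies."""
    belief = prior or {k: 1.0 / len(LIKELIHOODS) for k in LIKELIHOODS}
    for action in actions:
        belief = update_belief(belief, action)
    return belief


def entropy(belief):
    """Shannon entropy in nats; larger means the belief is less certain."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0)


if __name__ == "__main__":
    for observed in ([], ["left"], ["left", "left", "up"]):
        belief = infer(observed)
        rounded = {k: round(p, 3) for k, p in belief.items()}
        print(observed, rounded, f"H={entropy(belief):.3f}")
```

A policy that consumes the whole posterior can hedge while the entropy is still high; collapsing it to its argmax discards that information, which is roughly the contrast between the uncertainty-aware and one-shot hard-inferred rows in the results table.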
Headline result

Each row isolates one effect

Predator                             Captures / ep   vs. blind
confident but wrong guess                     1.42        −47%
opponent-blind                                2.68    baseline
hard-inferred intent (one-shot)               2.56         −5%
reactive belief (uncertainty-aware)           2.82         +5%
planner, flat belief (ablation)               3.07        +15%
oracle, true strategy (reactive)              4.05        +51%
planner + inferred belief                     4.31        +61%
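The "vs. blind" column is the relative change in captures per episode against the opponent-blind row, (c − c_blind) / c_blind. A short check that recomputes it from the two-decimal captures shown in the table (the dictionary simply copies the table; `relative_change` is a hypothetical helper):

```python
# Captures per episode, copied from the results table.
CAPTURES = {
    "confident but wrong guess": 1.42,
    "opponent-blind": 2.68,
    "hard-inferred intent (one-shot)": 2.56,
    "reactive belief (uncertainty-aware)": 2.82,
    "planner, flat belief (ablation)": 3.07,
    "oracle, true strategy (reactive)": 4.05,
    "planner + inferred belief": 4.31,
}
BASELINE = CAPTURES["opponent-blind"]


def relative_change(captures, baseline=BASELINE):
    """Percent change in captures per episode versus the opponent-blind baseline."""
    return round(100.0 * (captures - baseline) / baseline, 1)


if __name__ == "__main__":
    for name, captures in CAPTURES.items():
        print(f"{name:<37} {captures:5.2f} {relative_change(captures):+6.1f}%")
```

Rounded to whole percent, every row matches the printed column. The one-shot row lands at −4.5%, right on the rounding boundary, so its printed −5% is within rounding of the captures shown.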
Planning

Planning on the inferred belief beats the reactive oracle

Figure: belief planner result
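A toy sketch of what "planning on the belief" can mean: exhaustively score every open-loop predator plan over a short horizon by its discounted capture reward, averaged over opponent strategies with the belief as weights. Everything here is a hypothetical stand-in (a ring of eight cells, two deterministic prey strategies named clockwise and counterclockwise, horizon 3, discount 0.9). It is not the project's planner or environment.

```python
import itertools

N_CELLS = 8            # hypothetical ring of cells; positions wrap around
ACTIONS = (-1, 0, 1)   # predator moves: step back, stay, step forward
GAMMA = 0.9            # discount so that earlier captures score higher

# Hypothetical deterministic prey strategies: current position -> next position.
STRATEGIES = {
    "clockwise": lambda q: (q + 1) % N_CELLS,
    "counterclockwise": lambda q: (q - 1) % N_CELLS,
}


def rollout_value(pred, prey, plan, strategy):
    """Discounted capture reward of an open-loop plan if the prey uses `strategy`."""
    step_prey = STRATEGIES[strategy]
    for t, action in enumerate(plan):
        pred = (pred + action) % N_CELLS
        if pred == prey:  # predator steps onto the prey
            return GAMMA ** t
        prey = step_prey(prey)
        if pred == prey:  # prey steps onto the predator
            return GAMMA ** t
    return 0.0


def plan_against_belief(pred, prey, belief, horizon=3):
    """Return (first action, value) of the plan with the best belief-weighted value."""
    best_plan, best_value = None, -1.0
    for plan in itertools.product(ACTIONS, repeat=horizon):
        value = sum(
            weight * rollout_value(pred, prey, plan, strategy)
            for strategy, weight in belief.items()
        )
        if value > best_value:
            best_plan, best_value = plan, value
    return best_plan[0], best_value


if __name__ == "__main__":
    # Same positions, opposite beliefs: the chosen first move flips.
    print(plan_against_belief(0, 2, {"clockwise": 0.9, "counterclockwise": 0.1})[0])  # -1
    print(plan_against_belief(0, 2, {"clockwise": 0.1, "counterclockwise": 0.9})[0])  # 1
```

This sketch is open-loop and never updates the belief mid-plan; a closed-loop belief planner would also account for what each move reveals about the opponent.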
Representation

Predict the future, don't reconstruct the past

Figure: JEPA vs. VAE latents
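For reference, a generic form of the two objectives the heading contrasts. The notation is an assumption, not the project's: x_{≤t} is the opponent's observed trajectory up to step t, f_θ a context encoder, \bar f a target encoder (commonly an exponential-moving-average copy of f_θ, used to avoid representation collapse), g_ψ a predictor, sg a stop-gradient, k a prediction horizon, and β the KL weight.

```latex
% Reconstruct the past (VAE): encode the observed trajectory, then decode it back.
\mathcal{L}_{\mathrm{VAE}}
  = \mathbb{E}_{q_\phi(z \mid x_{\le t})}\!\left[-\log p_\theta(x_{\le t} \mid z)\right]
  + \beta\,\mathrm{KL}\!\left(q_\phi(z \mid x_{\le t}) \,\middle\|\, p(z)\right)

% Predict the future (JEPA): regress the embedding of the next k steps, not the raw steps.
\mathcal{L}_{\mathrm{JEPA}}
  = \left\| g_\psi\!\left(f_\theta(x_{\le t})\right)
  - \operatorname{sg}\!\left[\bar f(x_{t+1:t+k})\right] \right\|_2^2
```

The VAE is scored on reproducing every detail of the past, while the JEPA loss only rewards latent features that help predict what the opponent does next, which is one reading of "predict the future, don't reconstruct the past".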
Honest negative

The same idea fails as a world model

Figure: JEPA world model (negative result)
Second strategy axis

Circle vs corners: signal yes, clusters no

Figure: circle vs. corners latents
Behaviour cloning

π(a|s, z) beats π(a|s) once z carries the strategy

Figure: latent-conditioned BC overtakes vanilla BC and reaches the oracle ceiling
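A tabular sketch of why conditioning helps: when two strategies take different actions in the same state, π(a|s) can only fit their mixture, while π(a|s, z) fits each strategy exactly. The demonstrations, the state and action names, and the helpers `fit_bc` and `avg_log_likelihood` are hypothetical, and z is handed to the learner directly rather than coming from a learned latent.

```python
import math
from collections import Counter, defaultdict

# Hypothetical demonstrations: (state, strategy latent z, action).
# Both strategies visit state "s0" but act differently there.
DEMOS = [
    ("s0", "circle", "left"),
    ("s0", "circle", "left"),
    ("s0", "corners", "right"),
    ("s0", "corners", "right"),
]


def context(state, z, use_z):
    """Conditioning context: (s, z) for pi(a|s, z), s alone for pi(a|s)."""
    return (state, z) if use_z else state


def fit_bc(demos, use_z):
    """Tabular behaviour cloning: empirical action frequencies per context."""
    counts = defaultdict(Counter)
    for state, z, action in demos:
        counts[context(state, z, use_z)][action] += 1
    return {
        ctx: {a: n / sum(c.values()) for a, n in c.items()}
        for ctx, c in counts.items()
    }


def avg_log_likelihood(policy, demos, use_z):
    """Mean log-probability the policy assigns to the demonstrated actions."""
    total = sum(math.log(policy[context(s, z, use_z)][a]) for s, z, a in demos)
    return total / len(demos)


if __name__ == "__main__":
    for use_z in (False, True):
        policy = fit_bc(DEMOS, use_z)
        label = "pi(a|s, z)" if use_z else "pi(a|s)"
        print(label, round(avg_log_likelihood(policy, DEMOS, use_z), 3))
```

On these four demonstrations the vanilla policy's average log-likelihood is ln 0.5 ≈ −0.693, while the conditioned policy's is 0. If z took the same value for every demonstration, π(a|s, z) would collapse back to π(a|s), which is one way to read "once z carries the strategy".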
Next

Three concrete steps

Paper Code Deep dive