Multi-agent RL · JAX / JaxMARL

Adaptive Opponent Modeling for Adversarial Co-Training

Infer the opponent's hidden strategy with calibrated uncertainty. Plan against the belief. Validated end to end in a controlled predator–prey game.

The task

The strategy is hidden inside the opponent

Headline result

Predator variant	Captures / episode	Change vs. opponent-blind
confident but wrong guess	1.42	−47%
opponent-blind	2.68	—
hard-inferred intent (one-shot)	2.56	−5%
reactive belief (uncertainty-aware)	2.82	+5%
planner, flat belief (ablation)	3.07	+15%
oracle, true strategy (reactive)	4.05	+51%
planner + inferred belief	4.31	+61%
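
The table separates predators that commit to a single guess from predators that act on a belief. A minimal sketch of that difference follows, assuming a small discrete set of opponent strategies and known per-strategy action likelihoods. The `LIKELIHOOD` table, the `q_values` layout, and all function names are hypothetical illustrations, not the method behind the numbers above.

```python
import jax
import jax.numpy as jnp

# Hypothetical setup: 3 opponent strategies, 4 discrete opponent actions.
# LIKELIHOOD[k, a] = assumed probability that strategy k emits action a.
LIKELIHOOD = jnp.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.70, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
])

def update_belief(log_belief, action):
    """One Bayes step in log space: posterior is proportional to prior * likelihood."""
    log_post = log_belief + jnp.log(LIKELIHOOD[:, action])
    return log_post - jax.nn.logsumexp(log_post)

def infer_belief(actions):
    """Fold a sequence of observed opponent actions into a posterior."""
    k = LIKELIHOOD.shape[0]
    flat_prior = jnp.full((k,), -jnp.log(k))

    def step(log_b, a):
        return update_belief(log_b, a), None

    log_b, _ = jax.lax.scan(step, flat_prior, actions)
    return jnp.exp(log_b)

def belief_weighted_value(belief, q_values):
    """Expected value of each predator action under the belief.

    q_values[k, u] = assumed value of predator action u against strategy k.
    """
    return belief @ q_values

observed = jnp.array([0, 0, 2, 0])       # hypothetical opponent actions
belief = infer_belief(observed)          # keeps the uncertainty
hard_guess = int(jnp.argmax(belief))     # one-shot commitment discards it
```

Acting on `belief_weighted_value` hedges across strategies in proportion to the evidence, while acting on `hard_guess` pays full price whenever the guess is wrong.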

Planning

Representation

Same data, same 2-D latent, no labels
VAE probe 0.53 vs. JEPA 0.89 (3 seeds), above the supervised 0.85
Reads the strategy from about half the observation steps; the label-free belief planner reaches 4.08 captures / ep
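
Probe scores like those above are commonly obtained by freezing the encoder and fitting a small classifier on its latents. The sketch below assumes a linear, binary probe scored by accuracy on a 2-D latent. The source does not state the probe type or metric, so every name and the synthetic data here are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

LEARNING_RATE = 0.5  # hypothetical step size

def probe_loss(params, z, y):
    """Binary cross-entropy of a linear read-out on frozen latents z."""
    w, b = params
    logits = z @ w + b
    # softplus(logits) - y * logits is the numerically stable BCE form.
    return jnp.mean(jnp.logaddexp(0.0, logits) - y * logits)

@jax.jit
def sgd_step(params, z, y):
    grads = jax.grad(probe_loss)(params, z, y)
    return jax.tree_util.tree_map(
        lambda p, g: p - LEARNING_RATE * g, params, grads
    )

def fit_probe(z, y, steps=200):
    params = (jnp.zeros(z.shape[1]), jnp.array(0.0))
    for _ in range(steps):
        params = sgd_step(params, z, y)
    return params

def probe_accuracy(params, z, y):
    w, b = params
    return jnp.mean(((z @ w + b) > 0) == (y > 0.5))

# Synthetic stand-in for 2-D latents of two strategies (not real data).
k0, k1 = jax.random.split(jax.random.PRNGKey(0))
z = jnp.concatenate([
    jax.random.normal(k0, (200, 2)) - 2.0,
    jax.random.normal(k1, (200, 2)) + 2.0,
])
y = jnp.concatenate([jnp.zeros(200), jnp.ones(200)])

params = fit_probe(z, y)
acc = float(probe_accuracy(params, z, y))
```

In practice the accuracy would be reported on held-out episodes rather than on the data used to fit the probe.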

Honest negative

Second strategy axis

Behaviour cloning

True placement: 0.800 vs naive 0.776 — effect is real
More observation → VAE latent overtakes vanilla and hits the oracle ceiling (0.795 vs 0.792); the gain tracks the probe
BC is a faithful clone: recovers 86% of the MAPPO expert's capture-edge

Shape the latent — contrastive / equivariant / return-relevant MI, so the strategy clusters
Model-based baselines — belief re-infers in ~12 steps; world models must re-fit
Adaptive opponent — switching intent mid-episode; track, don't just classify
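
The last item, an opponent that switches intent mid-episode, is the setting a forward filter is built for: predict with a switching model, then correct with the newest observation. A minimal two-strategy sketch follows. The `LIKELIHOOD` table and `P_SWITCH` are hypothetical values chosen for illustration and are not taken from the experiments.

```python
import jax
import jax.numpy as jnp

# LIKELIHOOD[k, a] = assumed probability that strategy k emits action a.
LIKELIHOOD = jnp.array([[0.8, 0.2],
                        [0.2, 0.8]])
P_SWITCH = 0.05  # hypothetical per-step probability of changing strategy
# TRANSITION[i, j] = P(next strategy = j | current strategy = i).
TRANSITION = jnp.array([[1 - P_SWITCH, P_SWITCH],
                        [P_SWITCH, 1 - P_SWITCH]])

def filter_step(belief, action):
    """One forward-filter step: predict a possible switch, then correct."""
    predicted = belief @ TRANSITION
    posterior = predicted * LIKELIHOOD[:, action]
    posterior = posterior / posterior.sum()
    return posterior, posterior

def track(actions):
    """Return the belief after every observed action, shape [T, 2]."""
    prior = jnp.full((2,), 0.5)
    _, beliefs = jax.lax.scan(filter_step, prior, actions)
    return beliefs

# Opponent plays strategy-0-like actions, then switches at step 15.
actions = jnp.array([0] * 15 + [1] * 15)
beliefs = track(actions)
```

Because the transition model never lets the belief saturate at 0 or 1, evidence after the switch can move it back, which is what separates tracking from one-off classification.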
