CTF Arena — Phil Sanjaya

ai-for-games · simulation

Does centralised coordination beat individually-greedy agents at capture the flag?

Win rate, coordinated (decisive)

0.985

experiments/results/summary.md — 134/136, Wilson 95% CI 0.957–1.000

Matches in committed study

600

study_raw.csv row count; design_document.md study spec

Respawns with influence map

31 → 8 per match

summary.md secondary study — Mann-Whitney p = 4.05e-50

Python · pyglet · GOAP · A* · FSM · steering · 2026

Problem

The COS30002 brief asks for four families of game AI — architecture, graph search, steering, and goal planning — combined into one working system. CTF Arena makes them compete: two fully autonomous teams of three agents play capture the flag with identical AI stacks, and the only experimental variable is how a team decides. The research question, from the project PRD: to what extent does centralised team coordination improve competitive performance over independent, individually-greedy agents? The hypothesis predicted a higher win rate and faster captures, with the largest gains on chokepoint-heavy maps.

Approach

A clean ablation. Both conditions run the same FSM, planner, pathfinding, and steering; the Coordinated team adds a coordinator that assigns roles, while the Independent controller is a drop-in sibling that writes an empty role table, so every agent falls back to a three-branch greedy policy. The committed study is 600 matches: 50 per condition per map across three symmetric maps of varying chokepoint density, all reproducible from one master seed through a headless harness that runs at roughly 90× real time. The codebase carries 134 pytest tests, including same-seed replay tests that pin determinism, and 53 functional requirements tracked through 18 Linear issues.
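The drop-in ablation described above might look roughly like the following. This is a minimal sketch, not the project's code: the names `Blackboard`, `Coordinator`, `IndependentController`, `choose_goal`, and the role strings are all hypothetical, and the three greedy branches are an assumed guess at what a "three-branch greedy policy" could contain.

```python
from dataclasses import dataclass, field


@dataclass
class Blackboard:
    # Shared team state: agent id -> role name (all names here are hypothetical).
    roles: dict = field(default_factory=dict)


def dist2(a, b):
    # Squared Euclidean distance; enough for nearest-agent comparisons.
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


class Coordinator:
    """Hands each role in a template to the nearest still-unassigned agent."""

    def __init__(self, template):
        # template: list of (role, anchor_point) pairs, in priority order.
        self.template = template

    def update(self, board, positions):
        board.roles.clear()
        free = dict(positions)
        for role, anchor in self.template:
            if not free:
                break
            nearest = min(free, key=lambda agent: dist2(free[agent], anchor))
            board.roles[nearest] = role
            del free[nearest]


class IndependentController:
    """Same interface as Coordinator, but always leaves the role table empty."""

    def update(self, board, positions):
        board.roles.clear()


def choose_goal(agent_id, board, carrying_flag, own_flag_taken):
    role = board.roles.get(agent_id)
    if role is not None:
        return role  # coordinated: follow the assigned role
    # ASSUMPTION: an illustrative three-branch greedy fallback.
    if carrying_flag:
        return "return_flag_home"
    if own_flag_taken:
        return "chase_carrier"
    return "grab_enemy_flag"
```

Because both controllers expose the same `update` method, swapping one for the other changes only the contents of the role table, which is what keeps the comparison a single-variable ablation.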

Architecture

Each tick runs a fixed pipeline: sense → coordinate → plan → path → steer → resolve. A six-state FSM executes how an agent acts; a STRIPS-style GOAP planner decides what it pursues; A* over an influence map plans where, with path cost rising in dangerous cells; and a blended steering layer (seek, arrive, flee, pursuit, evade, separation, obstacle avoidance, wander, and a carrier blend) produces the actual motion. The coordinator evaluates team state, picks a posture from Attack, Balanced, Defend, or Retrieve, expands it to a role template, and assigns roles to the nearest agents through a shared blackboard. The wander behaviour drifting behind this site’s home page is the same maths, ported to TypeScript.
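To make the "path cost rises in dangerous cells" idea concrete, here is a small sketch of grid A* with an influence term added to the step cost. It assumes a 4-connected grid, a non-negative influence value per cell, and a hypothetical `danger_weight` parameter; the project's actual map representation and cost formula are not given on this page.

```python
import heapq


def astar(grid_w, grid_h, start, goal, influence, danger_weight=5.0, walls=frozenset()):
    """4-connected grid A*; entering a cell costs 1 plus its weighted danger."""

    def h(cell):
        # Manhattan distance stays admissible because every step costs at least 1
        # (this relies on influence values being non-negative).
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0.0, start)]
    best_cost = {start: 0.0}
    came_from = {}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry, a cheaper route was already found
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < grid_w and 0 <= nxt[1] < grid_h) or nxt in walls:
                continue
            new_cost = cost + 1.0 + danger_weight * influence.get(nxt, 0.0)
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = cell
                heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None  # goal unreachable


# A dangerous cell in the middle of the direct route forces a detour.
safe_path = astar(5, 3, (0, 1), (4, 1), influence={(2, 1): 1.0})
direct_path = astar(5, 3, (0, 1), (4, 1), influence={})
```

With an empty influence map the route is the straight four-step line; with danger on `(2, 1)` the six-step detour is cheaper (cost 6 versus 9), so the planner walks around the hot cell instead of through it.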

Results

Across 150 head-to-head matches the coordinated team won 134, lost 2, and drew 14: a 0.985 win rate over decisive matches (Wilson 95% CI 0.957–1.000, binomial p = 1.07e-37) and 0.893 over all matches. The headline hypothesis held, but the map prediction was refuted: the advantage was smallest on the chokepoint map (0.760) and largest on open ground (0.960), the opposite of what was predicted, and it is reported as found. The mechanism was the second surprise: in symmetric play the coordinated team scores fewer captures (0.7 vs 1.6 per match); it wins by defence, not offence. The strongest single result belongs to the influence map, which cut agent respawns from 31.07 to 8.03 per match (p = 4.05e-50).
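The two win rates and the binomial p-value can be re-derived from the win, loss, and draw counts alone. The sketch below assumes the reported p-value is a one-sided exact binomial tail against a fair coin over the 136 decisive matches; the page does not say which test variant was used, so treat that choice as an assumption.

```python
from math import comb

wins, losses, draws = 134, 2, 14
decisive = wins + losses   # 136 matches with a winner
total = decisive + draws   # 150 head-to-head matches

win_rate_decisive = wins / decisive
win_rate_all = wins / total

# ASSUMPTION: one-sided exact binomial tail, H0: each decisive match is a coin flip.
# P(X >= 134) for X ~ Binomial(136, 0.5).
tail_count = sum(comb(decisive, k) for k in range(wins, decisive + 1))
p_value = tail_count / 2**decisive

print(round(win_rate_decisive, 3))  # 0.985
print(round(win_rate_all, 3))       # 0.893
print(f"{p_value:.2e}")             # 1.07e-37
```

Under the null only 9,317 of the 2^136 equally likely outcomes give 134 or more wins, which is where the very small p-value comes from.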

Reflection

The first coordinator made the team worse: it lost the pilot roughly 12–2 to the greedy baseline because its postures were too defensive, and a role-template rebalance — persistent defender, two attackers — reversed the result. Determinism nearly broke the study from the other side: with fixed spawns, different-seed matches played out almost identically, so trial variance had to be reintroduced through seeded random spawn positions. The discipline that made the project work — verify before commit, one issue at a time, numbers only with receipts — is the same workflow that built the site you are reading. Fog of war is scaffolded but unfinished; partial observability is the natural next experiment.
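One way to reintroduce trial variance while keeping same-seed replays exact is to derive a per-match random generator from the master seed and draw spawn points from it. The sketch below is an assumption about how that could be done: `match_rng`, `spawn_positions`, the rectangular spawn zone, and the seed value are hypothetical and not taken from the project.

```python
import random


def match_rng(master_seed, match_index):
    # String seeds are hashed deterministically by random.Random,
    # so the same (master_seed, match_index) pair always replays identically.
    return random.Random(f"{master_seed}:{match_index}")


def spawn_positions(rng, zone, n_agents=3):
    # zone is (x_min, y_min, x_max, y_max); a hypothetical rectangular spawn area.
    x0, y0, x1, y1 = zone
    return [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in range(n_agents)]


zone = (0.0, 0.0, 4.0, 4.0)
first = spawn_positions(match_rng(1234, 0), zone)
replay = spawn_positions(match_rng(1234, 0), zone)
other = spawn_positions(match_rng(1234, 1), zone)
```

Here `first` and `replay` are identical, which is the property a same-seed replay test would pin, while `other` starts the agents somewhere else and so gives the match a different trajectory.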