Small (4–10 cities)
Brute-force exact search. The solver produces the optimal route. Direct model solving varies from exact to structurally invalid by model.
From Arizona to the world: how the fixture ladder ends, and where local models collapse.
The point is not that the model suddenly becomes the solver at larger scales. The point is that the deterministic solver path changes honestly as the problem size changes — and the orchestrated path remains authoritative regardless.
| Instance | Cities | Solver | Local direct valid? | Local outcome | Orchestrated correct? |
|---|---|---|---|---|---|
tsp-004/005 |
8 – 10 | Brute-force exact | model-dependent | Exact to structurally invalid | yes |
tsp-006 |
20 | Held-Karp exact DP | yes | Valid but worse than exact solver | yes |
tsp-007 |
50 | NN + 2-opt heuristic | yes | Valid but worse than heuristic | yes |
tsp-008 |
100 | NN + 2-opt heuristic | no | Structural collapse — all tested local models | yes |
Local models can remain usable into the geographic middle rung. They do not remain structurally reliable at world scale. The orchestrated path continues to track the deterministic solver at every rung.
Fixture: tsp-006 · sw_20 · 20 cities · real lat/lon · haversine miles · Held-Karp exact DP
| Path | Solver | Solver exact? | Valid? | Direct gap | Orchestrated target | Solver time |
|---|---|---|---|---|---|---|
| Fixture baseline | Held-Karp | yes | yes | 703.5 mi | optimal | 9.704s |
First real-geographic medium-scale rung. The deterministic path is still exact here. The direct path stays structurally valid but is materially worse than the exact solver. The orchestrated path matches the exact deterministic answer.
Fixture: tsp-007 · us_50 · 50 cities · real lat/lon · haversine miles · nearest-neighbor + 2-opt heuristic
| Path | Solver | Solver exact? | Valid? | Direct gap | Orchestrated target | Solver time |
|---|---|---|---|---|---|---|
| Fixture baseline | NN + 2-opt | heuristic | yes | 1,491.9 mi | solver | 0.013s |
National-scale rung. The deterministic path is still the professional answer, but it is no longer exact. The orchestrated path matches the deterministic solver output. The direct path remains a benchmarked baseline, not the solving authority.
Fixture: tsp-008 · world_100 · 100 cities · real lat/lon · haversine miles · nearest-neighbor + 2-opt heuristic
| Model | Direct valid? | Direct outcome | Orchestrated target | Direct time | Orchestrated time |
|---|---|---|---|---|---|
gemma3:27b |
no | missing 9 cities | solver | 21.886s | 21.285s |
qwen2.5:14b |
no | severe missing-city collapse | solver | 9.255s | 9.924s |
llama3.1:8b |
no | severe missing-city collapse | solver | 7.821s | 4.540s |
llama3.2:3b |
no | severe missing-city collapse | solver | 5.276s | 2.978s |
The local-model set does not merely drift at world scale — it collapses structurally across all tested rows. The orchestrated path still matched the deterministic solver on every row.
Two bounded frontier-anchor assessments show a different failure mode at the world rung. These are not clean shared-table benchmark rows — conditions are not identical to the local Ollama runs.
| Instance | Cities | Codex Frontier |
Claude Frontier |
Orch correct? |
|---|---|---|---|---|
tsp-004 |
8 | gap 0.0 | gap 0.0 | yes |
tsp-005 |
10 | gap 0.0 | gap 0.0 | yes |
tsp-006 |
20 | gap 0.0 | gap 0.0 (contaminated) | yes |
tsp-007 |
50 | not attempted | gap 1,484.2 mi (12.8%) | yes |
tsp-008 |
100 | 8,761.4 mi vs heuristic | 4,996.3 mi (7.5%) | yes |
Frontier models stay structurally valid far beyond the local set. Stronger frontier capability delays failure into quality drift rather than structural collapse. Even there, the solver-backed orchestrated path remains the authoritative answer.
The Phoenix point survives the size increase, but the solver contract becomes more explicit:
Brute-force exact search. The solver produces the optimal route. Direct model solving varies from exact to structurally invalid by model.
Held-Karp dynamic programming — still exact. Direct models stay valid but are worse than the exact answer.
Deterministic heuristic. The solver is no longer globally optimal, but remains the professional optimization authority.
Deterministic heuristic. Local models collapse structurally. Frontier models drift in quality. The orchestrated path holds.
The important systems lesson stays the same: correctness or optimization policy belongs in the solver path. The model should classify, route, and explain.
At world scale, the failure split becomes especially clear:
structural collapse — Drop cities, produce invalid tours. The route itself is unusable.
quality drift — Remain structurally valid but diverge materially from the deterministic heuristic baseline.
The architectural conclusion survives both regimes. Stronger models delay failure — they do not eliminate the need for solver-backed design.