Geographic Scale Ladder

The point is not that the model becomes the solver at larger scales. The point is that the deterministic solver path changes openly as the problem grows, from exact search to exact dynamic programming to a heuristic, and the orchestrated path remains authoritative at every size.

Compact Ladder View — Local Models

Instance	Cities	Solver	Local direct valid?	Local outcome	Orchestrated correct?
`tsp-004/005`	8 – 10	Brute-force exact	model-dependent	Exact to structurally invalid	yes
`tsp-006`	20	Held-Karp exact DP	yes	Valid but worse than exact solver	yes
`tsp-007`	50	NN + 2-opt heuristic	yes	Valid but worse than heuristic	yes
`tsp-008`	100	NN + 2-opt heuristic	no	Structural collapse — all tested local models	yes

Local models remain structurally valid through the 20-city and 50-city geographic rungs, although their tours are worse than the solver's. They do not remain structurally reliable at world scale (100 cities). The orchestrated path continues to track the deterministic solver at every rung.
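The ladder above amounts to a size-based solver choice. A minimal sketch follows, assuming the cutoffs sit at the tested instance sizes (10 and 20 cities). The function name, the returned labels, and the exact thresholds are illustrative assumptions, not the project's actual code.

```python
def pick_solver(n_cities: int) -> str:
    """Pick a deterministic solver by instance size.

    The cutoffs are assumptions inferred from the tested rungs
    (8-10, 20, 50, and 100 cities); the real thresholds may differ.
    """
    if n_cities <= 10:
        return "brute_force_exact"   # exact: enumerate all tours
    if n_cities <= 20:
        return "held_karp_exact"     # exact: dynamic programming over subsets
    return "nn_2opt_heuristic"       # heuristic: no optimality guarantee


def is_exact(solver_name: str) -> bool:
    """Report whether the chosen solver guarantees an optimal tour."""
    return solver_name.endswith("_exact")
```

Keeping the exactness flag next to the solver name lets a report state plainly whether the target is "optimal" or only "solver output".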

20-City Southwest Geographic Comparison

Fixture: tsp-006 · sw_20 · 20 cities · real lat/lon · haversine miles · Held-Karp exact DP

Path	Solver	Solver exact?	Valid?	Direct gap	Orchestrated target	Solver time
Fixture baseline	Held-Karp	yes	yes	703.5 mi	optimal	9.704s

First real-geographic medium-scale rung. The deterministic path is still exact here. The direct path stays structurally valid but is materially worse than the exact solver. The orchestrated path matches the exact deterministic answer.
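For readers unfamiliar with the two ingredients named in the fixture line, here is a minimal sketch of haversine distance in miles and a Held-Karp dynamic program. The Earth radius constant, the function names, and the choice of city 0 as the start are assumptions made for illustration; the fixture's own implementation is not shown in this document.

```python
import math
from itertools import combinations

EARTH_RADIUS_MILES = 3958.8  # assumed mean radius; the fixture's constant is not stated


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def held_karp(dist):
    """Return (optimal_length, tour) for a closed tour that starts at city 0.

    dist is a square matrix of pairwise distances. City j is bit j of a
    bitmask; city 0 is never stored in the mask.
    """
    n = len(dist)
    if n == 1:
        return 0.0, [0]
    # best[(mask, j)] = (cost, previous city) of the cheapest path that
    # leaves city 0, visits exactly the cities in mask, and ends at j.
    best = {(1 << j, j): (dist[0][j], 0) for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev_mask = mask & ~(1 << j)
                best[(mask, j)] = min(
                    (best[(prev_mask, k)][0] + dist[k][j], k)
                    for k in subset if k != j
                )
    full = (1 << n) - 2  # every city except city 0
    cost, last = min((best[(full, j)][0] + dist[j][0], j) for j in range(1, n))
    # Walk the stored predecessors backwards to recover the tour.
    tour, mask = [], full
    while last != 0:
        tour.append(last)
        mask, last = mask & ~(1 << last), best[(mask, last)][1]
    return cost, [0] + tour[::-1]
```

This dictionary-based version is written for readability and suits small instances. Its time and memory grow as O(n² · 2ⁿ), so a 20-city run would need a more compact table layout to be practical.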

50-City U.S. Geographic Comparison

Fixture: tsp-007 · us_50 · 50 cities · real lat/lon · haversine miles · nearest-neighbor + 2-opt heuristic

Path	Solver	Solver exact?	Valid?	Direct gap	Orchestrated target	Solver time
Fixture baseline	NN + 2-opt	heuristic	yes	1,491.9 mi	solver	0.013s

National-scale rung. The deterministic path is still the professional answer, but it is no longer exact. The orchestrated path matches the deterministic solver output. The direct path remains a benchmarked baseline, not the solving authority.
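A nearest-neighbor construction followed by 2-opt improvement can be sketched as below. This is an illustrative version that assumes a symmetric distance matrix, a fixed start city, and index-based tie-breaking so that the result is deterministic; the fixture's actual start city and stopping rule are not given here.

```python
def tour_length(tour, dist):
    """Length of the closed tour, including the edge back to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def nearest_neighbor(dist, start=0):
    """Build a tour by always moving to the closest unvisited city."""
    tour, unvisited = [start], set(range(len(dist))) - {start}
    while unvisited:
        # Break distance ties by city index so repeated runs agree.
        nxt = min(unvisited, key=lambda j: (dist[tour[-1]][j], j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def two_opt(tour, dist):
    """Reverse segments while doing so shortens the tour (symmetric dist assumed)."""
    tour = tour[:]
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share tour[0]; nothing to swap
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Replace edges (a,b) and (c,d) with (a,c) and (b,d).
                delta = dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]
                if delta < -1e-9:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour
```

The result is a local optimum with respect to 2-opt moves, which is why the table marks this rung as "heuristic" rather than exact.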

100-City World Comparison

Fixture: tsp-008 · world_100 · 100 cities · real lat/lon · haversine miles · nearest-neighbor + 2-opt heuristic

Local Model Slice

Model	Direct valid?	Direct outcome	Orchestrated target	Direct time	Orchestrated time
`gemma3:27b`	no	missing 9 cities	solver	21.886s	21.285s
`qwen2.5:14b`	no	severe missing-city collapse	solver	9.255s	9.924s
`llama3.1:8b`	no	severe missing-city collapse	solver	7.821s	4.540s
`llama3.2:3b`	no	severe missing-city collapse	solver	5.276s	2.978s

The local-model set does not merely drift at world scale — it collapses structurally across all tested rows. The orchestrated path still matched the deterministic solver on every row.
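"Structural validity" here means the returned route visits every city exactly once. A minimal check along those lines is sketched below, assuming cities are identified by integer indices 0..n-1; the function name and the report fields are hypothetical.

```python
from collections import Counter


def check_tour(tour, n_cities):
    """Check that a tour visits each of n_cities exactly once.

    Returns a small report so that a failure such as "missing 9 cities"
    can be described rather than only flagged.
    """
    expected = set(range(n_cities))
    seen = set(tour)
    missing = sorted(expected - seen)       # cities the route dropped
    unknown = sorted(seen - expected)       # ids that are not in the fixture
    duplicates = sorted(c for c, k in Counter(tour).items() if k > 1)
    return {
        "valid": not (missing or unknown or duplicates),
        "missing": missing,
        "unknown": unknown,
        "duplicates": duplicates,
    }
```

A route that fails this check has no meaningful length to compare, which is why the collapsed rows report an outcome rather than a gap.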

Frontier Nuance

Two bounded frontier-anchor assessments show a different failure mode at the world rung. These are not clean shared-table benchmark rows — conditions are not identical to the local Ollama runs.

Instance	Cities	`Codex Frontier`	`Claude Frontier`	Orch correct?
`tsp-004`	8	gap 0.0	gap 0.0	yes
`tsp-005`	10	gap 0.0	gap 0.0	yes
`tsp-006`	20	gap 0.0	gap 0.0 (contaminated)	yes
`tsp-007`	50	not attempted	gap 1,484.2 mi (12.8%)	yes
`tsp-008`	100	8,761.4 mi vs heuristic	4,996.3 mi (7.5%)	yes

Frontier models stay structurally valid at sizes where the local set has already collapsed. Stronger capability delays failure and changes its form from structural collapse to quality drift. Even there, the solver-backed orchestrated path remains the authoritative answer.
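The gap figures in the table pair an absolute distance with a percentage. One plausible way to compute both is sketched below, assuming the percentage is taken relative to the solver's tour length; the document does not state the denominator, so treat that choice and the function name as assumptions.

```python
def gap(direct_miles, solver_miles):
    """Return (absolute_gap_miles, percent_gap) of a direct tour vs the solver's.

    Assumes the percentage is measured against the solver's tour length.
    """
    absolute = direct_miles - solver_miles
    return absolute, 100.0 * absolute / solver_miles


# Hypothetical numbers, not taken from the tables above.
absolute, percent = gap(110.0, 100.0)
print(f"gap {absolute:.1f} mi ({percent:.1f}%)")
```

A gap of 0.0 means the direct tour has the same length as the solver's tour. It is only meaningful once the direct tour has been confirmed structurally valid.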

What Changed At Scale

The Phoenix point survives the size increase, but the solver contract becomes more explicit:

Small (4–10 cities)

Brute-force exact search. The solver produces the optimal route. Direct model solving varies from exact to structurally invalid by model.

Medium (20 cities)

Held-Karp dynamic programming — still exact. Direct models stay valid but are worse than the exact answer.

Large (50 cities)

Deterministic heuristic. The solver is no longer globally optimal, but remains the professional optimization authority.

World (100 cities)

Deterministic heuristic. Local models collapse structurally. Frontier models drift in quality. The orchestrated path holds.

The important systems lesson stays the same: correctness or optimization policy belongs in the solver path. The model should classify, route, and explain.
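The division of labor described above can be sketched as a small orchestration function. Everything named here (`orchestrate`, the `classify` and `explain` callables, the `solvers` registry, the request shape) is a hypothetical illustration of the pattern, not the project's interface.

```python
def orchestrate(request, classify, solvers, explain):
    """Route a request through a deterministic solver.

    classify and explain stand in for model calls; solvers maps a problem
    kind to a deterministic function. The model never produces the answer.
    """
    kind = classify(request["text"])         # model: decide what kind of problem this is
    solver = solvers[kind]                   # routing: pick the deterministic path
    answer = solver(request["payload"])      # solver: the authoritative result
    return {
        "kind": kind,
        "answer": answer,
        "explanation": explain(kind, answer),  # model: describe, not decide
    }


# Stub usage with stand-ins for the model and the solver.
result = orchestrate(
    {"text": "shortest route through these cities", "payload": [3, 1, 2]},
    classify=lambda text: "tsp",
    solvers={"tsp": sorted},  # placeholder "solver" for the demo only
    explain=lambda kind, answer: f"{kind} solved deterministically: {answer}",
)
```

Because the answer comes only from the solver, a weaker or stronger model changes the classification and explanation quality but not the route itself.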

Failure Mode Shift

At world scale, the failure split becomes especially clear:

Local models

Structural collapse: they drop cities and produce invalid tours. The route itself is unusable.

Frontier models

Quality drift: they remain structurally valid but produce tours materially longer than the deterministic heuristic baseline.

The architectural conclusion survives both regimes. Stronger models delay failure — they do not eliminate the need for solver-backed design.
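The two failure regimes can be told apart mechanically: check structure first, then compare lengths. A minimal sketch follows, assuming integer city indices and a hypothetical `drift_tolerance` for what counts as a material divergence; the document does not define such a threshold.

```python
def failure_mode(tour, n_cities, direct_miles, solver_miles, drift_tolerance=0.01):
    """Label a direct answer relative to the solver baseline.

    drift_tolerance is an assumed relative threshold, not a value from
    the benchmark.
    """
    # Structure comes first: an invalid tour has no comparable length.
    if sorted(tour) != list(range(n_cities)):
        return "structural collapse"
    if direct_miles > solver_miles * (1 + drift_tolerance):
        return "quality drift"
    return "matches solver"
```

Ordering the checks this way mirrors the split above: a dropped city is a different class of failure from a longer but complete route.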
