Geographic Scale Ladder


From Arizona to the world: how the fixture ladder ends, and where local models collapse.

The point is not that the model suddenly becomes the solver at larger scales. The point is that the deterministic solver path changes transparently as the problem grows, from exact methods to a heuristic, and the orchestrated path remains authoritative regardless.

Compact Ladder View — Local Models

| Instance | Cities | Solver | Local direct valid? | Local outcome | Orchestrated correct? |
|---|---|---|---|---|---|
| tsp-004/005 | 8–10 | Brute-force exact | model-dependent | Exact to structurally invalid | yes |
| tsp-006 | 20 | Held-Karp exact DP | yes | Valid but worse than exact solver | yes |
| tsp-007 | 50 | NN + 2-opt heuristic | yes | Valid but worse than heuristic | yes |
| tsp-008 | 100 | NN + 2-opt heuristic | no | Structural collapse (all tested local models) | yes |

Local models can stay structurally valid through the 20- and 50-city geographic rungs, though their tours are worse than the solver's. They do not remain structurally reliable at world scale. The orchestrated path continues to track the deterministic solver at every rung.
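For the smallest rungs (tsp-004/005, 8–10 cities) the ladder relies on brute-force exact search. A minimal sketch of that idea, assuming the fixture has already been turned into a symmetric distance matrix; `tour_length` and `brute_force_tsp` are illustrative names, not the benchmark's code:

```python
import itertools
import math


def tour_length(tour: list[int], dist: list[list[float]]) -> float:
    """Length of a closed tour that returns to its first city."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def brute_force_tsp(dist: list[list[float]]) -> tuple[float, list[int]]:
    """Exact TSP by checking every tour that starts at city 0.

    Fixing the start removes rotations of the same cycle, leaving (n-1)!
    orders to check: 362,880 at 10 cities but about 1.2e17 at 20.
    """
    n = len(dist)
    best_len, best_tour = math.inf, [0]
    for rest in itertools.permutations(range(1, n)):
        tour = [0, *rest]
        length = tour_length(tour, dist)
        if length < best_len:  # strict <, so the first optimal order found is kept
            best_len, best_tour = length, tour
    return best_len, best_tour


# Usage: four corners of a unit square; the optimal tour is the perimeter.
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
length, tour = brute_force_tsp(dist)
print(length, tour)
```

The factorial count is why exhaustive search covers only the smallest rungs and the ladder switches methods above 10 cities.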

20-City Southwest Geographic Comparison

Fixture: tsp-006 · sw_20 · 20 cities · real lat/lon · haversine miles · Held-Karp exact DP

| Path | Solver | Solver exact? | Valid? | Direct gap | Orchestrated target | Solver time |
|---|---|---|---|---|---|---|
| Fixture baseline | Held-Karp | yes | yes | 703.5 mi | optimal | 9.704s |

First real-geographic medium-scale rung. The deterministic path is still exact here. The direct path stays structurally valid but is materially worse than the exact solver. The orchestrated path matches the exact deterministic answer.
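The 20-city rung is small enough for Held-Karp dynamic programming over subsets. The sketch below builds haversine distances from (lat, lon) pairs and solves the tour exactly; the Earth radius, the function names, and the sample coordinates are assumptions for illustration, and a pure-Python loop like this will likely take much longer at 20 cities than the solver time in the table.

```python
import math

EARTH_RADIUS_MI = 3958.8  # assumed mean Earth radius; the fixtures do not state theirs


def haversine_mi(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(h))


def held_karp(dist: list[list[float]]) -> tuple[float, list[int]]:
    """Exact TSP by Held-Karp dynamic programming over subsets of cities.

    dp[mask][j] is the shortest path that starts at city 0, visits exactly the
    cities in `mask` (bit j-1 stands for city j), and ends at city j.
    Time O(n^2 * 2^n), memory O(n * 2^n).
    """
    n = len(dist)
    if n == 1:
        return 0.0, [0]
    full = (1 << (n - 1)) - 1  # mask containing every city except 0
    dp = [[math.inf] * n for _ in range(full + 1)]
    parent = [[-1] * n for _ in range(full + 1)]
    for j in range(1, n):
        dp[1 << (j - 1)][j] = dist[0][j]
    # Masks only grow, so increasing numeric order handles subsets before supersets.
    for mask in range(1, full + 1):
        for j in range(1, n):
            if not mask & (1 << (j - 1)) or dp[mask][j] == math.inf:
                continue
            for k in range(1, n):
                bit_k = 1 << (k - 1)
                if mask & bit_k:
                    continue
                cand = dp[mask][j] + dist[j][k]
                if cand < dp[mask | bit_k][k]:
                    dp[mask | bit_k][k] = cand
                    parent[mask | bit_k][k] = j
    # Close the cycle back to city 0 from the best final city.
    last = min(range(1, n), key=lambda j: dp[full][j] + dist[j][0])
    best = dp[full][last] + dist[last][0]
    # Follow parent pointers backwards to recover the visiting order.
    order, mask, j = [], full, last
    while j != -1:
        order.append(j)
        mask, j = mask & ~(1 << (j - 1)), parent[mask][j]
    return best, [0] + order[::-1]


# Usage: four points one degree apart near the equator (hypothetical coordinates).
coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
dist = [[haversine_mi(p, q) for q in coords] for p in coords]
best, tour = held_karp(dist)
print(round(best, 1), tour)
```

Memory grows as n·2^n, which is one reason exact DP stops being practical well before the 50-city rung.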

50-City U.S. Geographic Comparison

Fixture: tsp-007 · us_50 · 50 cities · real lat/lon · haversine miles · nearest-neighbor + 2-opt heuristic

| Path | Solver | Solver exact? | Valid? | Direct gap | Orchestrated target | Solver time |
|---|---|---|---|---|---|---|
| Fixture baseline | NN + 2-opt | heuristic | yes | 1,491.9 mi | solver | 0.013s |

National-scale rung. The deterministic path is still the professional answer, but it is no longer exact. The orchestrated path matches the deterministic solver output. The direct path remains a benchmarked baseline, not the solving authority.
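From 50 cities up the ladder uses nearest-neighbor construction followed by 2-opt improvement. This is a textbook version of that pairing, not the benchmark's implementation; the start city, tie-breaking rule, and improvement loop are assumptions:

```python
import math


def tour_length(tour: list[int], dist: list[list[float]]) -> float:
    """Length of a closed tour that returns to its first city."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def nearest_neighbor(dist: list[list[float]], start: int = 0) -> list[int]:
    """Greedy construction: always move to the closest unvisited city."""
    tour = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        last = tour[-1]
        # Break distance ties by city index so the result is deterministic.
        nxt = min(unvisited, key=lambda c: (dist[last][c], c))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def two_opt(tour: list[int], dist: list[list[float]], eps: float = 1e-9) -> list[int]:
    """Local search: reverse a segment whenever that shortens the closed tour."""
    tour = tour[:]
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            # With i == 0, stop before j == n - 1, whose edge touches tour[0].
            for j in range(i + 2, n - 1 if i == 0 else n):
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Replace edges (a,b) and (c,d) with (a,c) and (b,d) if shorter.
                delta = dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]
                if delta < -eps:
                    tour[i + 1 : j + 1] = reversed(tour[i + 1 : j + 1])
                    improved = True
    return tour


def nn_two_opt(dist: list[list[float]]) -> tuple[float, list[int]]:
    tour = two_opt(nearest_neighbor(dist), dist)
    return tour_length(tour, dist), tour


# Usage: a tour that crosses itself on a unit square gets untangled.
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
fixed = two_opt([0, 2, 1, 3], dist)
print(fixed, tour_length(fixed, dist))
```

2-opt stops at a local optimum: the output is deterministic and cheap to compute, but it carries no optimality guarantee, which is what the "heuristic" label in the table means.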

100-City World Comparison

Fixture: tsp-008 · world_100 · 100 cities · real lat/lon · haversine miles · nearest-neighbor + 2-opt heuristic

Local Model Slice

| Model | Direct valid? | Direct outcome | Orchestrated target | Direct time | Orchestrated time |
|---|---|---|---|---|---|
| gemma3:27b | no | missing 9 cities | solver | 21.886s | 21.285s |
| qwen2.5:14b | no | severe missing-city collapse | solver | 9.255s | 9.924s |
| llama3.1:8b | no | severe missing-city collapse | solver | 7.821s | 4.540s |
| llama3.2:3b | no | severe missing-city collapse | solver | 5.276s | 2.978s |

At world scale the local models do not merely drift; every tested model collapses structurally. The orchestrated path still matched the deterministic solver on every row.
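Structural validity is a set check: a tour is usable only if it names every fixture city exactly once. A minimal sketch, assuming direct answers have already been parsed into lists of city names (the parsing step and the `check_tour` name are hypothetical):

```python
from collections import Counter


def check_tour(tour: list[str], cities: list[str]) -> dict[str, object]:
    """Structural check: a valid tour names every fixture city exactly once."""
    expected = set(cities)
    counts = Counter(tour)
    missing = sorted(expected - set(counts))           # cities the answer dropped
    duplicated = sorted(c for c, k in counts.items() if k > 1)
    unknown = sorted(set(counts) - expected)           # names not in the fixture
    return {
        "valid": not (missing or duplicated or unknown),
        "missing": missing,
        "duplicated": duplicated,
        "unknown": unknown,
    }


# Usage: an answer that drops two cities and repeats one.
report = check_tour(["Tokyo", "Paris", "Paris"], ["Tokyo", "Paris", "Lima", "Cairo"])
print(report)
```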


Frontier Nuance

Two bounded frontier-anchor assessments show a different failure mode at the world rung. They are not clean shared-table benchmark rows: run conditions are not identical to the local Ollama runs.

| Instance | Cities | Codex frontier | Claude frontier | Orchestrated correct? |
|---|---|---|---|---|
| tsp-004 | 8 | gap 0.0 | gap 0.0 | yes |
| tsp-005 | 10 | gap 0.0 | gap 0.0 | yes |
| tsp-006 | 20 | gap 0.0 | gap 0.0 (contaminated) | yes |
| tsp-007 | 50 | not attempted | gap 1,484.2 mi (12.8%) | yes |
| tsp-008 | 100 | 8,761.4 mi vs heuristic | 4,996.3 mi (7.5%) | yes |

Frontier models stay structurally valid at the 100-city rung, where every tested local model collapses. Stronger capability delays failure and changes its form: quality drift instead of structural collapse. Even there, the solver-backed orchestrated path remains the authoritative answer.
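The frontier gaps are excess miles over the deterministic tour. Assuming the percentage is that excess divided by the solver's tour length (the table does not state this explicitly), each reported pair also lets you back out the approximate solver tour length. The function names below are illustrative:

```python
def gap(direct_mi: float, solver_mi: float) -> tuple[float, float]:
    """Excess miles of a direct tour over the solver tour, and that excess in percent."""
    excess = direct_mi - solver_mi
    return excess, 100.0 * excess / solver_mi


def implied_solver_mi(gap_mi: float, gap_pct: float) -> float:
    """Solver tour length implied by a reported gap, assuming pct = gap / solver length."""
    return gap_mi / (gap_pct / 100.0)


print(round(implied_solver_mi(1484.2, 12.8)))  # tsp-007 Claude row
print(round(implied_solver_mi(4996.3, 7.5)))   # tsp-008 Claude row
```

Under that assumption the two Claude rows imply solver tours of roughly 11,600 mi and 66,600 mi, within the rounding of the one-decimal percentages.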

What Changed At Scale

The Phoenix point survives the size increase, but the solver contract becomes more explicit:

Small (4–10 cities)

Brute-force exact search. The solver produces the optimal route. Direct model solving ranges from exact to structurally invalid, depending on the model.

Medium (20 cities)

Held-Karp dynamic programming — still exact. Direct models stay valid but are worse than the exact answer.

Large (50 cities)

Deterministic heuristic. The solver is no longer globally optimal, but remains the professional optimization authority.

World (100 cities)

Deterministic heuristic. Local models collapse structurally. Frontier models drift in quality. The orchestrated path holds.

The important systems lesson stays the same: correctness or optimization policy belongs in the solver path. The model should classify, route, and explain.
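That division of labor can be sketched as a size-based dispatcher: the model's job ends at classifying the request, and a deterministic rule picks the solver. The thresholds below are read off this ladder, and `task_kind` stands in for the model's classification; both are hypothetical, not documented cutoffs:

```python
from dataclasses import dataclass

# Hypothetical cutoffs mirroring the fixture ladder: exhaustive search up to
# 10 cities, Held-Karp at 20, heuristic from 50 up.
BRUTE_FORCE_MAX = 10
HELD_KARP_MAX = 20


@dataclass(frozen=True)
class SolverChoice:
    name: str
    exact: bool


def choose_solver(n_cities: int) -> SolverChoice:
    """Pick the deterministic solver by instance size; the model never produces the route."""
    if n_cities < 2:
        raise ValueError("a tour needs at least two cities")
    if n_cities <= BRUTE_FORCE_MAX:
        return SolverChoice("brute_force", exact=True)
    if n_cities <= HELD_KARP_MAX:
        return SolverChoice("held_karp", exact=True)
    return SolverChoice("nn_2opt", exact=False)


def orchestrate(task_kind: str, n_cities: int) -> SolverChoice:
    """Route a classified request to a solver; unknown task kinds are refused."""
    if task_kind != "tsp":
        raise NotImplementedError(f"no deterministic solver registered for {task_kind!r}")
    return choose_solver(n_cities)


for n in (8, 20, 50, 100):
    print(n, orchestrate("tsp", n))
```

Keeping the size policy in plain code means the exact-versus-heuristic switch stays explicit and testable, whichever model does the classification.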

Failure Mode Shift

At world scale, the failure split becomes especially clear:

Local models

Structural collapse: they drop cities and produce invalid tours. The route itself is unusable.

Frontier models

Quality drift: they remain structurally valid but diverge materially from the deterministic heuristic baseline.

The architectural conclusion holds in both regimes. Stronger models delay failure; they do not eliminate the need for solver-backed design.