LocalLLMTSP | Local Model Results

Compact Arizona Summary

Instance	Cities	gemma4:26b	gemma4:e4b	gemma3:27b	qwen2.5:14b	llama3.1:8b	llama3.2:3b	Orch correct?
tsp-003	6	exact tie	exact tie	gap 4.19	—	—	missing city	yes
tsp-004	8	exact tie	gap 2.11	exact tie	duplicate city	missing city	duplicate city	yes
tsp-005	10	exact tie	gap 3.82	gap 5.43	gap 18.42	gap 20.85	duplicate Phoenix	yes

gemma4:26b is the first local model to achieve exact ties across the full Arizona ladder. gemma4:e4b cracked the greedy trap (tsp-003) that gemma3:27b could not. The orchestrated path remained correct across every model and every instance.

Gemma 4 Results

Two gemma4 MoE models were added to the ladder — both run on an RTX 3090 via Ollama with default quantization.

gemma4:e4b — Arizona Ladder

11 GB loaded · ~4B active parameters

Instance	Cities	Direct valid?	Direct gap	Orch exact?	Direct time	Orch time
tsp-001	4	yes	exact tie	yes	15.210s	10.038s
tsp-002	5	yes	gap 1.769079	yes	28.597s	9.829s
tsp-003	6	yes	exact tie	yes	29.426s	10.002s
tsp-004	8	yes	gap 2.10799	yes	54.150s	6.356s
tsp-005	10	yes	gap 3.819021	yes	32.654s	10.292s

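Zero structural failures. Cracked the greedy trap (tsp-003) exactly, where gemma3:27b had a gap of 4.185. Mixed quality elsewhere vs gemma3:27b: worse on tsp-002 (gap 1.769 vs 0.724) and tsp-004 (gap 2.108 vs exact tie), better on tsp-005 (gap 3.819 vs 5.433). Collapsed at world-100 with 9 missing cities, same count as gemma3:27b.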

gemma4:26b — Arizona Ladder

17 GB loaded · ~26B total / 4B active (larger gate) · first local model to achieve 5/5 exact ties

Instance	Cities	Direct valid?	Direct gap	Orch exact?	Direct time	Orch time
tsp-001	4	yes	exact tie	yes	7.059s	10.172s
tsp-002	5	yes	exact tie	yes	100.852s	10.384s
tsp-003	6	yes	exact tie	yes	123.816s	11.315s
tsp-004	8	yes	exact tie	yes	181.311s	10.575s
tsp-005	10	yes	exact tie	yes	292.824s	12.430s

Perfect direct quality across all five Arizona fixtures, the first local model to achieve this. The cost: direct solve time grows from 7s (4 cities) to 293s (10 cities). At world-100, it dropped only 2 cities (Hanoi, Busan), the closest any local model has come to structural validity at that scale. The orchestrated path stays flat at 10–16s regardless of instance size (10.2s to 12.4s on the five Arizona fixtures).
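The scaling contrast can be checked from the gemma4:26b table above. This is a minimal arithmetic sketch: the timings are copied from that table, and the `growth` helper is a hypothetical name introduced only for this illustration.

```python
# Timings (seconds) copied from the gemma4:26b Arizona ladder table,
# keyed by number of cities.
direct = {4: 7.059, 5: 100.852, 6: 123.816, 8: 181.311, 10: 292.824}
orch = {4: 10.172, 5: 10.384, 6: 11.315, 8: 10.575, 10: 12.430}


def growth(times):
    """Ratio of the time at the largest instance to the time at the smallest."""
    sizes = sorted(times)
    return times[sizes[-1]] / times[sizes[0]]


print(f"direct growth, 4 -> 10 cities: {growth(direct):.1f}x")
print(f"orchestrated growth, 4 -> 10 cities: {growth(orch):.2f}x")
```

On these numbers the direct path slows down by a factor of roughly 41 between 4 and 10 cities, while the orchestrated path grows by roughly a factor of 1.2.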

8-City Arizona Comparison

Fixture: tsp-004 · az_large · 8 cities · brute-force exact solver

Model	Direct valid?	Direct outcome	Orch exact?	Direct time	Orch time
gemma3:27b	yes	exact tie	yes	20.394s	14.926s
qwen2.5:14b	no	duplicate-city route	yes	28.359s	5.852s
llama3.1:8b	no	missing-city route	yes	8.998s	3.474s
llama3.2:3b	no	duplicate-city route	yes	5.918s	1.843s

At 8 cities, direct local-model solving already varies from exact to structurally invalid. The orchestrated path stayed exact across all tested models.
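The "duplicate-city" and "missing-city" outcomes above are structural checks that do not require solving anything. A minimal sketch of such a check follows, assuming a route is a list of city names without the start repeated at the end. The function name `classify_route`, the priority order (duplicates reported before missing cities), and the city list are illustrative assumptions, not the project's actual validator or fixture.

```python
from collections import Counter


def classify_route(route, cities):
    """Classify a proposed tour as valid, duplicate-city, or missing-city.

    Assumption: duplicates are reported first, because a route that repeats
    a city usually also omits another one.
    """
    counts = Counter(route)
    duplicates = sorted(c for c, n in counts.items() if n > 1)
    missing = sorted(set(cities) - set(route))
    unknown = sorted(set(route) - set(cities))
    if duplicates:
        return "duplicate-city route", duplicates
    if missing:
        return "missing-city route", missing
    if unknown:
        return "unknown-city route", unknown
    return "valid", []


# Illustrative city list only; not the real tsp-004 fixture.
cities = ["Phoenix", "Tucson", "Flagstaff", "Yuma"]
print(classify_route(["Phoenix", "Tucson", "Flagstaff", "Yuma"], cities))
print(classify_route(["Phoenix", "Tucson", "Phoenix", "Yuma"], cities))
print(classify_route(["Phoenix", "Tucson", "Yuma"], cities))
```

A check like this separates structural validity from solution quality: a route can pass it and still be far from optimal.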

10-City Arizona Comparison

Fixture: tsp-005 · az_large · 10 cities · brute-force exact solver

Model	Direct valid?	Direct outcome	Orch exact?	Direct time	Orch time
gemma3:27b	yes	gap 5.432581	yes	25.065s	12.990s
qwen2.5:14b	yes	gap 18.416634	yes	14.665s	8.215s
llama3.1:8b	yes	gap 20.846102	yes	9.132s	4.806s
llama3.2:3b	no	duplicate Phoenix	yes	5.636s	3.244s

At 10 cities, every direct model in this comparison is suboptimal, and the weakest becomes structurally invalid. The orchestrated path stayed exact across all tested models.

gemma3:27b — Full Arizona Fixture Ladder

The stronger direct model stayed structurally valid on every Arizona case, but still drifted from optimal on three of the five instances (tsp-002, tsp-003, tsp-005).

Arizona case	Instance	Cities	Direct valid?	Direct outcome	Orch exact?
Easy perimeter	tsp-001	4	yes	exact tie	yes
Clustered	tsp-002	5	yes	gap 0.724103	yes
Greedy trap	tsp-003	6	yes	gap 4.185364	yes
Larger fixture	tsp-004	8	yes	exact tie	yes
Larger fixture	tsp-005	10	yes	gap 5.432581	yes

The direct path can stay structurally valid while still drifting away from optimal. The orchestrated path stayed exact across the whole fixture ladder.
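To make "drifting away from optimal" concrete, here is a minimal sketch of a brute-force exact solver and a gap calculation. It assumes Euclidean distances on toy coordinates and assumes the gap is the absolute difference in tour length. The source does not state the distance metric or the gap's unit, so both are assumptions, and all function names are hypothetical.

```python
import itertools
import math


def tour_length(order, coords):
    """Length of the closed tour that visits `order` and returns to the start."""
    order = list(order)
    return sum(
        math.dist(coords[a], coords[b])
        for a, b in zip(order, order[1:] + order[:1])
    )


def brute_force_optimum(coords):
    """Try every ordering with the first city fixed; feasible for ~10 cities."""
    cities = list(coords)
    start, rest = cities[0], cities[1:]
    best = min(
        ([start, *perm] for perm in itertools.permutations(rest)),
        key=lambda order: tour_length(order, coords),
    )
    return best, tour_length(best, coords)


def gap(direct_route, coords):
    """Assumed definition: direct tour length minus the optimal tour length."""
    _, optimum = brute_force_optimum(coords)
    return tour_length(direct_route, coords) - optimum


# Toy unit square, not an Arizona fixture.
square = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
print(brute_force_optimum(square))
print(gap(["A", "C", "B", "D"], square))  # crossing route: valid but longer
```

The crossing route visits every city exactly once, so it is structurally valid, yet it has a positive gap. That is the same pattern as a "yes" under Direct valid? paired with a nonzero gap in the tables.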

llama3.2:3b — Full Arizona Fixture Ladder

The smallest tested model shows the opposite failure mode: direct routes become structurally invalid as soon as the fixture grows past 4 cities.

Arizona case	Instance	Cities	Direct valid?	Direct outcome	Orch exact?
Easy perimeter	tsp-001	4	yes	exact tie	yes
Clustered	tsp-002	5	no	missing-city route	yes
Greedy trap	tsp-003	6	no	missing-city route	yes
Larger fixture	tsp-004	8	no	duplicate Phoenix	yes
Larger fixture	tsp-005	10	no	duplicate Phoenix	yes

The smallest model can still succeed on the very smallest case. The orchestrated path stayed exact across the whole fixture ladder regardless.

Model-by-Model Reading

gemma3:27b

Strongest direct model among the four models in the 8-city and 10-city comparisons (gemma4:26b has since gone 5/5 exact on the Arizona ladder). Still not a reliable reason to make the model the solver: even a best-case result carries workload sensitivity.

qwen2.5:14b

Can stay structurally valid while still drifting far from the optimal route. Structural validity and solution quality are separate concerns.

llama3.1:8b

Stayed structurally valid on the 10-city slice, but with the largest gap of any valid direct route there (20.846). On the 8-city slice it dropped a city outright.

llama3.2:3b

Fast, but becomes structurally invalid on both the 8-city and 10-city comparison slices. Speed advantage does not offset correctness failure.

Phoenix Point

When the model is asked to directly solve the optimization problem, quality varies materially by model and workload.

When the model is used to interpret, route, and explain around a deterministic solver, correctness stays stable.

One-sentence version: the LLM is useful — just not as the solver.
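The division of labor described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `interpret` and `explain` are hypothetical callables standing in for model calls (no real Ollama API is invoked), and `solve_exact` is a toy brute-force solver over Euclidean coordinates.

```python
import itertools
import math
from typing import Callable, Dict, List, Tuple

Coords = Dict[str, Tuple[float, float]]


def solve_exact(coords: Coords) -> List[str]:
    """Deterministic brute-force solver with the first city fixed as the start."""
    cities = list(coords)
    start, rest = cities[0], cities[1:]

    def length(order: List[str]) -> float:
        return sum(
            math.dist(coords[a], coords[b])
            for a, b in zip(order, order[1:] + order[:1])
        )

    return min(([start, *p] for p in itertools.permutations(rest)), key=length)


def orchestrate(
    request: str,
    interpret: Callable[[str], Coords],
    explain: Callable[[List[str]], str],
) -> Tuple[List[str], str]:
    coords = interpret(request)   # model role: turn the request into solver input
    route = solve_exact(coords)   # solver role: the only place a route is chosen
    return route, explain(route)  # model role: describe the solver's answer


# Stand-ins for the model calls, used only to exercise the sketch.
def fake_interpret(request: str) -> Coords:
    return {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}


def fake_explain(route: List[str]) -> str:
    return "Visit " + " -> ".join(route) + " and return to " + route[0] + "."


route, summary = orchestrate("shortest loop through A, B, C, D", fake_interpret, fake_explain)
print(route)
print(summary)
```

In this shape the route's correctness depends only on `solve_exact`. A weaker model can change how well the request is interpreted or explained, but it never picks the tour.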