Central conclusion: correctness should live in the solver, not in the model. Stronger models delay failure — they do not eliminate the need for solver-backed architecture.
LocalLLMTSP was built to answer a systems question, not a benchmark question: in a route-optimization workflow, should the LLM be the optimizer, or should it operate around a deterministic solver as the orchestration and explanation layer?
The traveling salesman problem is useful here because it makes the role split visible quickly. Direct model solving improves as models get stronger, but that does not settle where correctness should live. Across a fixed ladder of Arizona, Southwest, U.S., and world-scale fixtures, Project Phoenix keeps arriving at the same design answer: let the solver own the optimization, and let the model interpret, route, and explain.
The central result is straightforward. On bounded local-model slices, direct solving ranges from exact to structurally invalid at small scales and collapses entirely by the 100-city world rung. Frontier models push that boundary much farther out and can remain structurally valid at world scale, but they still drift materially from the deterministic solver. The orchestrated path remains stable across the entire ladder.
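Two of the terms above can be made concrete. "Structurally valid" means the proposed route visits every city exactly once, and "drift" means the relative excess length of a candidate route over the deterministic solver's baseline. The following is a minimal sketch of those checks; the function names and the 20% threshold in the usage note are illustrative, not the project's actual harness.

```python
# Sketch: checking structural validity and measuring drift for a
# proposed tour. Illustrative names, not the project's actual code.
import math

def is_valid_tour(tour, n_cities):
    """A tour is structurally valid if it visits every city exactly once."""
    return sorted(tour) == list(range(n_cities))

def tour_length(tour, coords):
    """Total Euclidean length of a closed tour over (x, y) coordinates."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def drift(candidate, baseline, coords):
    """Relative excess length of a candidate tour over the solver baseline."""
    base = tour_length(baseline, coords)
    return (tour_length(candidate, coords) - base) / base
```

A model output can then fail in two distinct ways: `is_valid_tour` returning `False` (structural collapse) or `drift` exceeding some tolerance (valid but materially worse than the solver).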
Many discussions of LLM capability ask whether a model can solve a problem directly. That framing is often too narrow for practical systems work. The more useful engineering question is where the model should sit inside the system.
This domain compares two roles:

- **Direct solving:** the LLM is asked to produce the route itself. Correctness depends entirely on generation quality, which varies by model and workload size.
- **Orchestrated solving:** the LLM interprets the task, routes to the correct solver, and explains the result. Correctness lives in the deterministic solver.
Project Phoenix generally favors the second pattern. LocalLLMTSP was built to test that preference on a clean, legible problem.
The comparison uses fixed TSP fixtures and holds the workload constant while changing the model or the scale rung. The fixture ladder is geographic:
| Scale rung | Instance | Cities | Solver | Solver exact? |
|---|---|---|---|---|
| Arizona | tsp-001 – tsp-005 | 4 – 10 | Brute-force | yes |
| Southwest | tsp-006 | 20 | Held-Karp DP | yes |
| United States | tsp-007 | 50 | Nearest-neighbor + 2-opt | no (heuristic) |
| World | tsp-008 | 100 | Nearest-neighbor + 2-opt | no (heuristic) |
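For the two heuristic rungs, the solver named in the table is the standard nearest-neighbor construction followed by 2-opt improvement. Below is a minimal textbook sketch of that pair; it is a generic version under the usual formulation, not the project's exact implementation.

```python
# Generic nearest-neighbor + 2-opt sketch, as used conceptually at the
# 50- and 100-city rungs. Not the project's exact implementation.
import math

def nearest_neighbor(coords, start=0):
    """Greedy construction: always walk to the closest unvisited city."""
    unvisited = set(range(len(coords))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda c: math.dist(coords[last], coords[c]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def two_opt(tour, coords):
    """Local improvement: reverse segments while that shortens the tour."""
    def d(a, b):
        return math.dist(coords[a], coords[b])
    improved = True
    while improved:
        improved = False
        n = len(tour)
        for i in range(n - 1):
            # Skip the wrap-around pair when the two edges would be adjacent.
            for j in range(i + 2, n - (i == 0)):
                a, b = tour[i], tour[(i + 1) % n]
                c, e = tour[j], tour[(j + 1) % n]
                if d(a, c) + d(b, e) < d(a, b) + d(c, e) - 1e-12:
                    tour[i + 1 : j + 1] = reversed(tour[i + 1 : j + 1])
                    improved = True
    return tour
```

The design trade-off is the one the table makes explicit: construction plus local search runs in polynomial time at 50 and 100 cities, where Held-Karp's exponential state space is no longer affordable, at the cost of giving up the exactness guarantee.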
The solver contract changes honestly with scale. The claim is not that the solver is always exact; the claim is that it remains the authoritative optimization path at each rung. The deterministic algorithm, not the model, carries optimization policy.
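That policy can be stated as code: the solver is chosen deterministically from the instance size, never improvised by the model. A sketch of the dispatch rule implied by the fixture ladder follows; the function names and the exact size cutoffs are illustrative.

```python
# Sketch of deterministic solver dispatch implied by the fixture ladder.
# Function names and cutoffs are illustrative, not the project's API.
from itertools import permutations
import math

def solve_brute_force(coords):
    """Exact solver for small rungs: try every tour starting at city 0."""
    def length(tour):
        return sum(math.dist(coords[a], coords[b])
                   for a, b in zip(tour, tour[1:] + tour[:1]))
    cities = range(1, len(coords))
    return min(([0] + list(p) for p in permutations(cities)), key=length)

def select_solver(n_cities):
    """Optimization policy lives here, not in the model."""
    if n_cities <= 10:
        return "brute-force"   # Arizona rung: exact
    if n_cities <= 20:
        return "held-karp"     # Southwest rung: exact DP
    return "nn+2opt"           # U.S. and world rungs: heuristic
```

Because the mapping from workload size to algorithm is a pure function, the same request always takes the same optimization path regardless of which model sits in front of it.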
The orchestrated path splits into three stages:

- **Interpret:** classify the request, select the appropriate algorithm, and frame the parameters.
- **Solve:** deterministic dispatch, not improvised generation, produces the route.
- **Explain:** readable output, context, and caveats come from the LLM layer after the solver returns.
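The three stages above can be sketched as a single flow. The request shape and the `solve` and `explain` callables here are illustrative placeholders, not the project's actual interfaces; the point is only the ordering, with the model on either side of a deterministic core.

```python
# Minimal sketch of the three-stage orchestrated path.
# Request shape, solve(), and explain() are illustrative placeholders.

def orchestrate(request, solve, explain):
    # Interpret: classify the request and frame solver parameters.
    coords = request["cities"]
    # Solve: deterministic dispatch, not improvised generation,
    # produces the route.
    tour = solve(coords)
    # Explain: the LLM layer adds readable output, context, and
    # caveats only after the solver returns.
    return {"tour": tour, "summary": explain(tour, coords)}
```

Note that `explain` receives the solver's output as input; the model can misdescribe a route, but it cannot change it.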
The write-up covers:

- Arizona ladder results: 8-city and 10-city comparisons across four local models, with direct vs. orchestrated outcome tables.
- Geographic expansion: the 20-city Southwest, 50-city U.S., and 100-city World rungs, including local-model collapse at world scale.
- A bounded Codex Frontier vs. Claude Frontier case study: a different failure mode, the same architectural conclusion.
- OpenClaw as the monitored shell, Project Phoenix as the deterministic authority, and three operator-facing commands.
This site does not claim:
It does claim: