Central conclusion: correctness should live in the solver, not in the model. Stronger models delay failure — they do not eliminate the need for solver-backed architecture.
LocalLLMTSP was built to answer a systems question, not a benchmark question: in a route-optimization workflow, should the LLM be the optimizer, or should it operate around a deterministic solver as the orchestration and explanation layer?
The traveling salesman problem is useful here because it makes the role split visible quickly. Direct model solving improves as models get stronger, but that does not settle where correctness should live. Across a fixed ladder of Arizona, Southwest, U.S., and world-scale fixtures, Project Phoenix keeps arriving at the same design answer: let the solver own the optimization, and let the model interpret, route, and explain.
The central result is straightforward. On bounded local-model slices, direct solving ranges from exact to structurally invalid at small scales and collapses entirely by the 100-city world rung. Frontier models push that boundary much farther out and can remain structurally valid at world scale, but they still drift materially from the deterministic solver. The orchestrated path remains stable across the entire ladder.
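Two of the terms above can be made concrete. "Structurally valid" means the proposed route visits every city exactly once, and "drift" means the relative excess length of a candidate route over the deterministic solver's baseline. The following is a minimal sketch of those checks; the function names and the 20% threshold in the usage note are illustrative, not the project's actual harness.

```python
# Sketch: checking structural validity and measuring drift for a
# proposed tour. Illustrative names, not the project's actual code.
import math

def is_valid_tour(tour, n_cities):
    """A tour is structurally valid if it visits every city exactly once."""
    return sorted(tour) == list(range(n_cities))

def tour_length(tour, coords):
    """Total Euclidean length of a closed tour over (x, y) coordinates."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def drift(candidate, baseline, coords):
    """Relative excess length of a candidate tour over the solver baseline."""
    base = tour_length(baseline, coords)
    return (tour_length(candidate, coords) - base) / base
```

A model output can then fail in two distinct ways: `is_valid_tour` returning `False` (structural collapse) or `drift` exceeding some tolerance (valid but materially worse than the solver).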
Many discussions of LLM capability ask whether a model can solve a problem directly. That framing is often too narrow for practical systems work. The more useful engineering question is where the model should sit inside the system.
This domain compares two roles:

- **Direct solving:** the LLM is asked to produce the route itself. Correctness depends entirely on generation quality, which varies by model and workload size.
- **Orchestrated solving:** the LLM interprets the task, routes to the correct solver, and explains the result. Correctness lives in the deterministic solver.
Project Phoenix generally favors the second pattern. LocalLLMTSP was built to test that preference on a clean, legible problem.
The comparison uses fixed TSP fixtures and holds the workload constant while changing the model or the scale rung. The fixture ladder is geographic:
| Scale rung | Instance | Cities | Solver | Solver exact? |
|---|---|---|---|---|
| Arizona | tsp-001 – tsp-005 | 4 – 10 | Brute-force | yes |
| Southwest | tsp-006 | 20 | Held-Karp DP | yes |
| United States | tsp-007 | 50 | Nearest-neighbor + 2-opt | no (heuristic) |
| World | tsp-008 | 100 | Nearest-neighbor + 2-opt | no (heuristic) |
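For the two heuristic rungs, the solver named in the table is the standard nearest-neighbor construction followed by 2-opt improvement. Below is a minimal textbook sketch of that pair; it is a generic version under the usual formulation, not the project's exact implementation.

```python
# Generic nearest-neighbor + 2-opt sketch, as used conceptually at the
# 50- and 100-city rungs. Not the project's exact implementation.
import math

def nearest_neighbor(coords, start=0):
    """Greedy construction: always walk to the closest unvisited city."""
    unvisited = set(range(len(coords))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda c: math.dist(coords[last], coords[c]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def two_opt(tour, coords):
    """Local improvement: reverse segments while that shortens the tour."""
    def d(a, b):
        return math.dist(coords[a], coords[b])
    improved = True
    while improved:
        improved = False
        n = len(tour)
        for i in range(n - 1):
            # Skip the wrap-around pair when the two edges would be adjacent.
            for j in range(i + 2, n - (i == 0)):
                a, b = tour[i], tour[(i + 1) % n]
                c, e = tour[j], tour[(j + 1) % n]
                if d(a, c) + d(b, e) < d(a, b) + d(c, e) - 1e-12:
                    tour[i + 1 : j + 1] = reversed(tour[i + 1 : j + 1])
                    improved = True
    return tour
```

The design trade-off is the one the table makes explicit: construction plus local search runs in polynomial time at 50 and 100 cities, where Held-Karp's exponential state space is no longer affordable, at the cost of giving up the exactness guarantee.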
The solver contract changes honestly with scale. The claim is not that the solver is always exact; the claim is that it remains the authoritative optimization path at each rung. The deterministic algorithm, not the model, carries optimization policy.
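That policy can be stated as code: the solver is chosen deterministically from the instance size, never improvised by the model. A sketch of the dispatch rule implied by the fixture ladder follows; the function names and the exact size cutoffs are illustrative.

```python
# Sketch of deterministic solver dispatch implied by the fixture ladder.
# Function names and cutoffs are illustrative, not the project's API.
from itertools import permutations
import math

def solve_brute_force(coords):
    """Exact solver for small rungs: try every tour starting at city 0."""
    def length(tour):
        return sum(math.dist(coords[a], coords[b])
                   for a, b in zip(tour, tour[1:] + tour[:1]))
    cities = range(1, len(coords))
    return min(([0] + list(p) for p in permutations(cities)), key=length)

def select_solver(n_cities):
    """Optimization policy lives here, not in the model."""
    if n_cities <= 10:
        return "brute-force"   # Arizona rung: exact
    if n_cities <= 20:
        return "held-karp"     # Southwest rung: exact DP
    return "nn+2opt"           # U.S. and world rungs: heuristic
```

Because the mapping from workload size to algorithm is a pure function, the same request always takes the same optimization path regardless of which model sits in front of it.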
The orchestrated path splits into three stages:

- **Interpret:** classify the request, select the appropriate algorithm, and frame the parameters.
- **Solve:** deterministic dispatch, not improvised generation, produces the route.
- **Explain:** readable output, context, and caveats come from the LLM layer after the solver returns.
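The three stages above can be sketched as a single flow. The request shape and the `solve` and `explain` callables here are illustrative placeholders, not the project's actual interfaces; the point is only the ordering, with the model on either side of a deterministic core.

```python
# Minimal sketch of the three-stage orchestrated path.
# Request shape, solve(), and explain() are illustrative placeholders.

def orchestrate(request, solve, explain):
    # Interpret: classify the request and frame solver parameters.
    coords = request["cities"]
    # Solve: deterministic dispatch, not improvised generation,
    # produces the route.
    tour = solve(coords)
    # Explain: the LLM layer adds readable output, context, and
    # caveats only after the solver returns.
    return {"tour": tour, "summary": explain(tour, coords)}
```

Note that `explain` receives the solver's output as input; the model can misdescribe a route, but it cannot change it.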
The write-up covers:

- Arizona ladder results: 8-city and 10-city comparisons across four local models, with direct vs. orchestrated outcome tables.
- Geographic expansion: the 20-city Southwest, 50-city U.S., and 100-city World rungs, including local-model collapse at world scale.
- A bounded Codex Frontier vs. Claude Frontier case study: a different failure mode, the same architectural conclusion.
- OpenClaw as the monitored shell, Project Phoenix as the deterministic authority, and three operator-facing commands.
This site does not claim:
It does claim: