Project Phoenix · Arizona TSP

LocalLLMTSP

Should the LLM be the optimizer, or the orchestration layer around a deterministic solver?

8 fixture instances
4 local models tested
4 geographic scale rungs
100% orchestrated path accuracy

Central conclusion: correctness should live in the solver, not in the model. Stronger models delay failure — they do not eliminate the need for solver-backed architecture.

Abstract

LocalLLMTSP was built to answer a systems question, not a benchmark question: in a route-optimization workflow, should the LLM be the optimizer, or should it operate around a deterministic solver as the orchestration and explanation layer?

The traveling salesman problem is useful here because it makes the role split visible quickly. Direct model solving improves as models get stronger, but that does not settle where correctness should live. Across a fixed ladder of Arizona, Southwest, U.S., and world-scale fixtures, Project Phoenix keeps arriving at the same design answer: let the solver own the optimization, and let the model interpret, route, and explain.

The central result is straightforward. On bounded local-model slices, direct solving ranges from exact to structurally invalid at small scales and collapses entirely by the 100-city world rung. Frontier models push that boundary much farther out and can remain structurally valid at world scale, but they still drift materially from the deterministic solver. The orchestrated path remains stable across the entire ladder.
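The two failure modes named above, structural invalidity and drift from the solver, can both be checked mechanically. The following is a minimal sketch, not the project's evaluation code: the function names, the 4-city unit-square instance, and the use of plain Euclidean distance are all assumptions made for illustration.

```python
import math

def tour_length(coords, tour):
    """Length of the closed tour that visits coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))

def is_structurally_valid(tour, n):
    """True if the tour visits each of the n cities exactly once."""
    return len(tour) == n and sorted(tour) == list(range(n))

def gap_percent(coords, candidate, reference):
    """Percent by which the candidate tour is longer than the reference tour."""
    ref = tour_length(coords, reference)
    return 100.0 * (tour_length(coords, candidate) - ref) / ref

# Hypothetical instance: four cities on the corners of a unit square.
coords = [(0, 0), (1, 0), (1, 1), (0, 1)]
solver_tour = [0, 1, 2, 3]   # walks the perimeter
model_tour = [0, 2, 1, 3]    # valid permutation, but crosses both diagonals
broken_tour = [0, 1, 1, 3]   # repeats a city and skips another

print(is_structurally_valid(model_tour, 4))    # structurally valid
print(is_structurally_valid(broken_tour, 4))   # structurally invalid
print(round(gap_percent(coords, model_tour, solver_tour), 2))  # drift from the solver
```

A tour can pass the structural check and still be materially longer than the solver's route, which is the distinction the frontier-model result turns on.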

The Central Question

Many discussions of LLM capability ask whether a model can solve a problem directly. That framing is often too narrow for practical systems work. The more useful engineering question is where the model should sit inside the system.

This domain compares two roles:

Direct model solving

Ask the LLM to produce the route. Correctness depends entirely on generation quality, which varies by model and workload size.

Model-guided orchestration

The LLM interprets the task, routes to the correct solver, and explains the result. Correctness lives in the deterministic solver.

Project Phoenix generally favors the second pattern. LocalLLMTSP was built to test that preference on a clean, legible problem.

Method

The comparison uses fixed TSP fixtures and holds the workload constant while changing the model or the scale rung. The fixture ladder is geographic:

Scale rung     | Instance          | Cities | Solver                   | Solver exact?
Arizona        | tsp-001 – tsp-005 | 4 – 10 | Brute-force              | yes
Southwest      | tsp-006           | 20     | Held-Karp DP             | yes
United States  | tsp-007           | 50     | Nearest-neighbor + 2-opt | heuristic
World          | tsp-008           | 100    | Nearest-neighbor + 2-opt | heuristic

The solver contract changes with scale, and the ladder states that openly: exact through the 20-city rung, heuristic at 50 and 100 cities. The claim is not that the solver is always exact; the claim is that it remains the authoritative optimization path at each rung.
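One way to picture the ladder is a single dispatch function that picks the algorithm from the instance size and reports whether the result is exact. This is a sketch under stated assumptions, not the project's implementation: the size thresholds between rungs, the function names, and the Euclidean distance matrix are all hypothetical.

```python
import itertools
import math

def dist_matrix(coords):
    return [[math.dist(a, b) for b in coords] for a in coords]

def length(d, tour):
    """Length of the closed tour under distance matrix d."""
    n = len(tour)
    return sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))

def brute_force(d):
    """Exact: try every ordering of cities 1..n-1 with city 0 fixed as the start."""
    best = min(itertools.permutations(range(1, len(d))),
               key=lambda p: length(d, (0,) + p))
    return [0, *best]

def held_karp(d):
    """Exact dynamic program over (visited subset, last city); needs n >= 2."""
    n = len(d)
    # best[(mask, j)] = (cost, previous city) for the cheapest path that
    # starts at city 0, visits exactly the cities in mask, and ends at j.
    best = {(1 << j, j): (d[0][j], 0) for j in range(1, n)}
    for size in range(2, n):
        for subset in itertools.combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev_mask = mask ^ (1 << j)
                best[(mask, j)] = min(
                    (best[(prev_mask, k)][0] + d[k][j], k) for k in subset if k != j)
    full = sum(1 << j for j in range(1, n))
    _, last = min((best[(full, j)][0] + d[j][0], j) for j in range(1, n))
    tour, mask = [], full
    while last != 0:  # walk the stored predecessors back to city 0
        tour.append(last)
        mask, last = mask ^ (1 << last), best[(mask, last)][1]
    return [0, *reversed(tour)]

def nearest_neighbor_two_opt(d):
    """Heuristic: greedy construction, then 2-opt until no swap improves."""
    n = len(d)
    tour, unvisited = [0], set(range(1, n))
    while unvisited:
        nxt = min(unvisited, key=lambda j: d[tour[-1]][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    improved = True
    while improved:
        improved = False
        for i in range(1, n - 1):
            for k in range(i + 1, n):
                a, b = tour[i - 1], tour[i]
                c, e = tour[k], tour[(k + 1) % n]
                # Reversing tour[i..k] swaps edges (a,b),(c,e) for (a,c),(b,e).
                if d[a][c] + d[b][e] < d[a][b] + d[c][e] - 1e-12:
                    tour[i:k + 1] = reversed(tour[i:k + 1])
                    improved = True
    return tour

def solve(coords):
    """Return (tour, algorithm name, exact?). Thresholds are assumed, not sourced."""
    d = dist_matrix(coords)
    if len(coords) <= 10:
        return brute_force(d), "brute-force", True
    if len(coords) <= 20:
        return held_karp(d), "held-karp", True
    return nearest_neighbor_two_opt(d), "nn+2opt", False
```

Returning the exactness flag alongside the tour keeps the contract visible to whatever consumes the result, so a heuristic answer is never presented as an optimum.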

Phoenix Architecture Principles

Solver owns correctness

The deterministic algorithm carries optimization policy. Not the model.

Model interprets the task

Classify the request, select the appropriate algorithm, frame the parameters.

Model routes to the solver

Deterministic dispatch — not improvised generation — produces the route.

Model explains the result

Readable output, context, and caveats come from the LLM layer after the solver returns.
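The four principles above can be sketched as a small pipeline. Everything here is hypothetical: `interpret` and `explain` are plain functions standing in for model calls, the `SOLVERS` registry and `SolverResult` type are invented for illustration, and only a brute-force solver is registered to keep the example short.

```python
import itertools
import math
from dataclasses import dataclass

@dataclass
class SolverResult:
    tour: list
    length: float
    algorithm: str
    exact: bool

def brute_force(coords):
    """Deterministic exact solver for small instances (city 0 fixed as start)."""
    n = len(coords)

    def closed_length(order):
        return sum(math.dist(coords[order[i]], coords[order[(i + 1) % n]])
                   for i in range(n))

    best = min(((0,) + p for p in itertools.permutations(range(1, n))),
               key=closed_length)
    return SolverResult(list(best), closed_length(best), "brute_force", True)

# Hypothetical registry: the only code paths allowed to produce a route.
SOLVERS = {"brute_force": brute_force}

def interpret(request):
    """Stand-in for the model: turn a request into a structured task."""
    return {"algorithm": "brute_force", "coords": request["coords"]}

def route(task):
    """Deterministic dispatch: an unknown algorithm is an error, never a guess."""
    solver = SOLVERS.get(task["algorithm"])
    if solver is None:
        raise ValueError(f"no solver registered for {task['algorithm']!r}")
    return solver(task["coords"])

def explain(result):
    """Stand-in for the model: describe the solver's answer without altering it."""
    kind = "exact" if result.exact else "heuristic"
    return (f"Route {result.tour} has length {result.length:.3f} "
            f"({kind}, via {result.algorithm}).")

request = {"text": "shortest loop through these stops",
           "coords": [(0, 0), (1, 0), (1, 1), (0, 1)]}
result = route(interpret(request))
print(explain(result))
```

The design choice the sketch illustrates is that the model layer never writes a tour: it can only name a registered solver and describe what that solver returned.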

What This Site Documents

Local Models →

Arizona ladder results. 8-city and 10-city comparisons across 4 local models with direct vs. orchestrated outcome tables.

Scale Ladder →

Geographic expansion: 20-city Southwest, 50-city U.S., 100-city World. Local model collapse at world scale.

Frontier →

Codex Frontier vs. Claude Frontier bounded case study. Different failure mode, same architectural conclusion.

OpenClaw Demo →

OpenClaw as the monitored shell, Project Phoenix as the deterministic authority. Three operator-facing commands.

Scope Boundary

This site does not claim:

It does claim: