OpenClaw
Outer operator shell, proxy surface, and monitoring layer. OpenClaw handles ingress and presentation, acting as the monitored control shell, while Project Phoenix provides deterministic authority underneath.

Core idea: OpenClaw makes Project Phoenix accessible; Project Phoenix makes OpenClaw outputs trustworthy.
Three layers, top to bottom:

- OpenClaw: outer operator shell, proxy surface, and monitoring layer. It handles ingress and presentation.
- Phoenix routing: internal routing and presentation layer. It remains the benchmarkable domain-compression surface.
- Phoenix scripts and domains: the authoritative layer underneath. Correctness lives here rather than in raw model generation.
The local ollama-local OpenClaw profile points at ollama/gemma3:27b, but that model does not support OpenClaw’s default tool-enabled local-agent path. Rather than forcing that boundary, this demo uses the Project Phoenix pattern that already works elsewhere:
- Small shell wrappers expose deterministic Phoenix backends directly.
- Each command returns either JSON or a concise human-readable operator summary.
- OpenClaw can sit in front as the monitored shell without becoming the source of truth.
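The wrapper pattern above can be sketched roughly as follows. `phoenix_wrap` and the stand-in `echo` backend are illustrative names only, not the project's actual scripts; the real backends are the deterministic Phoenix script invocations.

```shell
#!/usr/bin/env bash
# Sketch of a small shell wrapper around a deterministic backend.
# The backend command is a stand-in; no model sits in the result path.
set -euo pipefail

phoenix_wrap() {
  # Usage: phoenix_wrap json|human <backend-command...>
  local mode="$1"; shift
  local out
  out="$("$@")"                              # direct backend invocation
  if [ "$mode" = json ]; then
    printf '%s\n' "$out"                     # machine-facing JSON passthrough
  else
    printf 'operator summary: %s\n' "$out"   # concise human-readable line
  fi
}

# Demo with a stand-in backend:
phoenix_wrap json  echo '{"backend":"tsp","status":"ok"}'
phoenix_wrap human echo '{"backend":"tsp","status":"ok"}'
```

The same wrapper serves both output modes from one deterministic source, which is what lets OpenClaw present results without becoming the source of truth.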
Four deterministic backends, each a direct script invocation with no model in the result path:
| Backend | Endpoint | Why it matters |
|---|---|---|
| TSP summary | `/phoenix-tsp` | Best lead demo. Cleanest external hook and clearest Phoenix conclusion. |
| Gemma protocol verdict | `/phoenix-gemma` | Shows that operational trust and raw capability are different axes. |
| Run-trace summary | `/phoenix-run-trace` | Shows the internal debugging and failure-compression layer. |
| Benchmark results | `/phoenix-benchmark-summary` | PlayerAgent V3 100-task results: generation quality from domain structure, not raw model capability. |
Three aggregate operator surfaces sit on top:

- Summary fan-out: fans out to all four backends in parallel and degrades gracefully if any backend fails; healthy backends continue to serve.
- Dashboard: incident mode at the top (ATTENTION NEEDED or STATUS: healthy), a per-backend health table (healthy / degraded / down), aggregate stats, and a recent invocation log.
- History: rolling record of summary calls with latency, backend health counts, and ok/fail state per call. Each summary invocation writes one snapshot row.
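A minimal sketch of the fan-out and incident-mode behavior, assuming each backend is a command. The `backend_*` functions here are stand-ins; the real backends are the script invocations listed in the table above.

```shell
#!/usr/bin/env bash
# Fan-out sketch: run backends in parallel, degrade failures to "down",
# and derive the incident-mode headline from per-backend health.

fan_out() {
  local dir; dir="$(mktemp -d)"
  local name
  for name in "$@"; do
    # Each backend runs in the background; a failure marks only that
    # backend as down instead of failing the whole summary.
    ( "backend_$name" > "$dir/$name" 2>/dev/null \
        || echo down > "$dir/$name" ) &
  done
  wait
  # Incident mode at the top: any down backend flips the headline.
  if grep -qx down "$dir"/*; then
    echo "ATTENTION NEEDED"
  else
    echo "STATUS: healthy"
  fi
  for name in "$@"; do
    printf '%s: %s\n' "$name" "$(cat "$dir/$name")"
  done
  rm -rf "$dir"
}

# Stand-in backends: tsp answers, gemma fails.
backend_tsp()   { echo healthy; }
backend_gemma() { return 1; }

fan_out tsp gemma
```

Writing one temp file per backend keeps each result isolated, so a crashed backend cannot corrupt the others' output.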
Important nuance: gemma4:26b is clearly the strongest local direct TSP model in the current slice. It clears the Arizona ladder and degrades much less severely at 100 cities than the rest of the local set. That strengthens the Phoenix point rather than weakening it: stronger models help, but the solver-backed path still remains the authoritative answer.
```
Project Phoenix TSP summary

Arizona (10 cities, tsp-005):
- gemma4:26b: exact tie
- gemma3:27b: gap 5.432581
- orchestrated path: yes

World (100 cities, tsp-008):
- gemma4:26b: missing 2 cities (Hanoi, Busan)
- gemma3:27b: missing 9 cities
- orchestrated target: solver

gemma4:26b takeaway: strongest local direct TSP model in the current slice;
exact across Arizona and materially less degraded at 100 cities, but still
not authoritative enough to replace the solver-backed path.

Bottom line: every tested local model failed direct route validity at 100
cities, while the orchestrated path still matched the deterministic solver
output.

Phoenix conclusion: correctness should live in the solver, not in the model.
Secondary line: stronger models delay failure; they do not eliminate the
need for solver-backed architecture.
```
```
Project Phoenix Gemma protocol summary

Question: is Gemma 4 operationally safer than Gemma 3 on strict
machine-facing protocol tasks?

Gemma 3:
- model: gemma3:27b
- desktop: 3/6, 5/6, 5/6
- z13: 2/6, 5/6, 5/6
- reading: more operationally trustworthy on the protocol lane

Gemma 4:
- model: gemma4:26b
- desktop first pass: 0/6, 0/6, 0/6
- note: all six protocol probes landed as non_json
- current read: partially prompt-fixable for simple schemas, still
  unreliable for complex nested or array-shaped machine-facing outputs

Deployment verdict: Gemma 4 is not a strict-protocol model in the current
release/configuration.

Phoenix conclusion: raw capability and operational trust are different axes.
```
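The json / non_json split reported above can be illustrated with a rough classifier sketch. The check here is an assumption for illustration only; a real protocol probe would parse the output and validate it against the expected schema rather than inspect its shape.

```shell
#!/usr/bin/env bash
# Rough probe-classification sketch: label a model response as "json" or
# "non_json". This structural check is a stand-in for real schema validation.

classify_probe() {
  local out="$1"
  case "$out" in
    \{*\}) echo json ;;      # starts with { and ends with }: plausibly strict
    *)     echo non_json ;;  # anything else: chatty preamble, prose, etc.
  esac
}

classify_probe '{"verdict":"pass"}'              # prints "json"
classify_probe 'Sure! Here is the JSON: {...}'   # prints "non_json"
```

Chatty preambles are exactly the failure mode that turns a capable model into a non_json result on a strict machine-facing lane.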
```
Project Phoenix run-trace summary

Question: what is the current role of the Phoenix run-trace layer?

Current trace lines:
- ShowcaseAgent routing traces exist and have already grouped recurring
  routing misses into replayable failure families.
- TourAgent protocol traces exist across strict protocol, pipeline, and
  handoff lanes.
- The run-trace layer is currently narrow and useful: it compresses
  failures into clearer replay or regression targets.

Short answer: it is a narrow debugging and failure-compression layer, not a
general telemetry platform.

Phoenix conclusion: use run traces when they make the next action clearer;
keep them narrow otherwise.
```
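Grouping recurring routing misses into failure families might look roughly like this. The `miss:<signature>` trace token format is an assumption made for illustration; the real trace format is not shown in this document.

```shell
#!/usr/bin/env bash
# Failure-compression sketch: collapse repeated trace misses into counted
# families, most frequent first, so each family becomes one replay target.

group_failures() {
  grep -o 'miss:[a-z_]*' | sort | uniq -c | sort -rn
}

# Demo with assumed trace lines:
printf 'route ok\nmiss:tool_choice\nmiss:tool_choice\nmiss:schema\n' \
  | group_failures
```

Sorting by count surfaces the family worth replaying first, which matches the "make the next action clearer" goal above.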
Suggested demo flow:

1. Lead with the clearest external hook and the strongest simple architectural lesson.
2. Show that Project Phoenix evaluates operational trust, not just reasoning flair.
3. Close with the internal systems layer: replay, failure compression, and disciplined debugging.