OpenClaw x Project Phoenix

OpenClaw Demo

OpenClaw acts as the monitored control shell. Project Phoenix provides deterministic authority underneath.

Core idea: OpenClaw makes Project Phoenix accessible. Project Phoenix makes OpenClaw outputs trustworthy.

Architecture

OpenClaw: operator shell, proxy / monitoring, ingress layer
ShowcaseAgent: internal routing, presentation layer, benchmark surface
Phoenix Backends: deterministic scripts, domains / tools, authority layer

OpenClaw

Outer operator shell, proxy surface, and monitoring layer. It handles ingress and presentation.

ShowcaseAgent

Internal Phoenix routing and presentation layer. It remains the benchmarkable domain-compression surface.

Deterministic Phoenix Backends

Authoritative scripts and domains underneath. Correctness lives here rather than in raw model generation.

Why This Path

OpenClaw’s ollama-local profile points at ollama/gemma3:27b, but that model does not support OpenClaw’s default tool-enabled local-agent path. Rather than force the model across that boundary, this demo uses the Project Phoenix pattern that already works elsewhere:

Deterministic wrapper

Small shell wrappers expose deterministic Phoenix backends directly.

Inspectable output

Each command returns either JSON or a concise human-readable operator summary.

Monitoring-friendly

OpenClaw can sit in front as the monitored shell without becoming the source of truth.
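
As an illustration of the wrapper pattern described above, here is a minimal Python sketch. It is not the demo's actual wrapper code: the function name `run_backend`, the `as_json` flag, and the returned dictionary shape are all assumptions made for this example. The point it shows is that the result is whatever the backend script prints, with no model in the path.

```python
import json
import subprocess
import sys


def run_backend(argv, as_json=True, timeout=30):
    """Run one deterministic backend script and return its output.

    Hypothetical helper: the result comes straight from the script's stdout.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        # Surface the failure instead of guessing at a result.
        return {"ok": False, "error": proc.stderr.strip() or f"exit {proc.returncode}"}
    if as_json:
        return {"ok": True, "result": json.loads(proc.stdout)}
    # Otherwise pass through the human-readable operator summary.
    return {"ok": True, "result": proc.stdout.strip()}


if __name__ == "__main__":
    # Stand-in backend: a tiny inline script, not a real Phoenix script.
    demo = [sys.executable, "-c", "import json; print(json.dumps({'status': 'healthy'}))"]
    print(run_backend(demo))
```

Because the wrapper only relays script output, a monitoring shell in front of it can log every call without ever becoming the source of truth.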

Phoenix Ops Shell

Four deterministic backends, each a direct script invocation with no model in the result path:

Backend | Endpoint | Why it matters
TSP summary | /phoenix-tsp | Best lead demo. Cleanest external hook and clearest Phoenix conclusion.
Gemma protocol verdict | /phoenix-gemma | Shows that operational trust and raw capability are different axes.
Run-trace summary | /phoenix-run-trace | Shows the internal debugging and failure-compression layer.
Benchmark results | /phoenix-benchmark-summary | PlayerAgent V3 100-task results: generation quality from domain structure, not raw model capability.

Three aggregate operator surfaces sit on top:

/phoenix-ops-summary

Fans out to all four backends in parallel. If any backend fails, the summary degrades gracefully: the healthy backends continue to serve.
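
The fan-out behaviour can be sketched roughly as follows. This is an assumed implementation, not the demo's code: `fan_out`, the callables standing in for backends, and the result shape are hypothetical. It shows one way a failing backend can be recorded without blocking the others.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(backends):
    """Call every backend in parallel; one failure never hides the other results."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(backends) or 1) as pool:
        futures = {name: pool.submit(fn) for name, fn in backends.items()}
        for name, future in futures.items():
            try:
                results[name] = {"ok": True, "result": future.result()}
            except Exception as exc:
                # Degrade: record the failure and keep collecting the rest.
                results[name] = {"ok": False, "error": str(exc)}
    return results


def broken_backend():
    # Stand-in for a backend script that fails.
    raise RuntimeError("backend down")


summary = fan_out({
    "tsp": lambda: "tsp summary",
    "gemma": lambda: "gemma verdict",
    "run-trace": broken_backend,
    "benchmark": lambda: "benchmark summary",
})
print(summary)
```

In this sketch the three healthy stand-ins still return their results, and the failing one appears as an error entry rather than aborting the whole summary.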

/phoenix-ops-status

Shows an incident-mode banner at the top (ATTENTION NEEDED or STATUS: healthy), followed by a per-backend health table (healthy / degraded / down), aggregate stats, and a recent invocation log.
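
A small sketch of how such a status view might be rendered. The rule used here, that any backend not marked healthy switches the banner to ATTENTION NEEDED, is an assumption for illustration; the source does not state the exact trigger. `render_status` is a hypothetical name.

```python
def render_status(health):
    """health maps backend name -> 'healthy' | 'degraded' | 'down'."""
    # Assumed rule: any non-healthy backend puts the view into incident mode.
    needs_attention = any(state != "healthy" for state in health.values())
    lines = ["ATTENTION NEEDED" if needs_attention else "STATUS: healthy", ""]
    width = max((len(name) for name in health), default=0)
    for name, state in health.items():
        lines.append(f"{name.ljust(width)}  {state}")
    return "\n".join(lines)


print(render_status({"tsp": "healthy", "gemma": "degraded", "run-trace": "down"}))
```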

/phoenix-ops-trends

Keeps a rolling history of summary calls: latency, backend health counts, and ok/fail state per call. Each summary invocation writes one snapshot row.
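
One way to keep such a rolling history is a bounded queue that receives one row per summary call. The sketch below assumes an in-memory store and hypothetical names (`TrendLog`, `record`, the row fields); the demo's real storage format is not described in the source.

```python
import time
from collections import deque


class TrendLog:
    """Rolling history of summary calls, one snapshot row per call."""

    def __init__(self, max_rows=50):
        # deque with maxlen drops the oldest row once the limit is reached.
        self.rows = deque(maxlen=max_rows)

    def record(self, results, latency_ms):
        """results maps backend name -> {'ok': bool, ...} for one summary call."""
        healthy = sum(1 for r in results.values() if r["ok"])
        self.rows.append({
            "ts": time.time(),
            "latency_ms": latency_ms,
            "healthy": healthy,
            "failed": len(results) - healthy,
            "ok": healthy == len(results),
        })


log = TrendLog(max_rows=3)
log.record({"tsp": {"ok": True}, "gemma": {"ok": False}}, latency_ms=120.0)
print(list(log.rows))
```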

TSP Lead Demo

Important nuance: gemma4:26b is clearly the strongest local direct TSP model in the current slice. It clears the Arizona ladder and degrades much less severely at 100 cities than the rest of the local set. That strengthens the Phoenix point rather than weakening it: stronger models help, but the solver-backed path remains the authoritative answer.

Project Phoenix TSP summary

Arizona (10 cities, tsp-005):
- gemma4:26b: exact tie
- gemma3:27b: gap 5.432581
- orchestrated path: yes

World (100 cities, tsp-008):
- gemma4:26b: missing 2 cities (Hanoi, Busan)
- gemma3:27b: missing 9 cities
- orchestrated target: solver

gemma4:26b takeaway: gemma4:26b is the strongest local direct TSP model in the current slice: exact across Arizona and materially less degraded at 100 cities, but still not authoritative enough to replace the solver-backed path.
Bottom line: Every tested local model failed direct route validity at 100 cities, while the orchestrated path still matched the deterministic solver output.
Phoenix conclusion: correctness should live in the solver, not in the model
Secondary line: stronger models delay failure; they do not eliminate the need for solver-backed architecture
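
The two checks behind the summary above, route validity (are any cities missing?) and the gap against the solver, can be sketched as follows. This is an illustrative sketch only: the function names, the Euclidean distance, and the definition of the gap as model tour length minus solver tour length are assumptions, since the source does not specify the benchmark's distance metric or gap formula.

```python
import math


def validate_route(route, cities):
    """Return (missing, duplicated) city names for a proposed tour."""
    seen = set()
    duplicated = []
    for city in route:
        if city in seen:
            duplicated.append(city)
        seen.add(city)
    missing = sorted(set(cities) - seen)
    return missing, duplicated


def tour_length(route, coords):
    """Length of the closed tour, using Euclidean distance (an assumption)."""
    return sum(
        math.dist(coords[route[i]], coords[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )


# Toy data: four cities on a unit square, not the Arizona or World task sets.
coords = {"A": (0, 0), "B": (0, 1), "C": (1, 1), "D": (1, 0)}
solver_route = ["A", "B", "C", "D"]
model_route = ["A", "C", "B", "D"]

print(validate_route(["A", "B", "B"], coords))
print(tour_length(model_route, coords) - tour_length(solver_route, coords))
```

A route with any missing or duplicated city fails validity outright, so no gap is computed for it; the gap is only meaningful between two valid tours.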

Gemma Protocol Follow-Up

Project Phoenix Gemma protocol summary

Question:
- Is Gemma 4 operationally safer than Gemma 3 on strict machine-facing protocol tasks?

Gemma 3:
- model: gemma3:27b
- desktop: 3/6, 5/6, 5/6
- z13: 2/6, 5/6, 5/6
- reading: more operationally trustworthy on the protocol lane

Gemma 4:
- model: gemma4:26b
- desktop first pass: 0/6, 0/6, 0/6
- note: all six protocol probes landed as non_json
- current read: partially prompt-fixable for simple schemas, still unreliable for complex nested or array-shaped machine-facing outputs

Deployment verdict: Gemma 4 is not a strict-protocol model in the current release/configuration.
Phoenix conclusion: raw capability and operational trust are different axes
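
A strict machine-facing probe of this kind can be sketched as below. Only the `non_json` label comes from the summary above; the `schema_mismatch` label, the function names, and the required-keys check are hypothetical stand-ins for whatever the actual probes verify.

```python
import json


def classify_reply(raw, required_keys):
    """Strict check: the whole reply must be one JSON object with the required keys."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Any prose or wrapper around the JSON makes the reply unusable downstream.
        return "non_json"
    if not isinstance(parsed, dict) or not set(required_keys).issubset(parsed):
        return "schema_mismatch"  # hypothetical label, not from the source
    return "ok"


def score(replies, required_keys):
    """Summarise a probe run as 'passed/total'."""
    passed = sum(classify_reply(r, required_keys) == "ok" for r in replies)
    return f"{passed}/{len(replies)}"


replies = [
    '{"action": "go"}',
    'Sure! Here is the JSON: {"action": "go"}',
    '{"verb": "go"}',
]
print([classify_reply(r, ["action"]) for r in replies])
print(score(replies, ["action"]))
```

The second reply contains valid JSON but wraps it in prose, which is enough to fail a strict parser. That is the sense in which a capable model can still be operationally untrustworthy on a protocol lane.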

Run-Trace Follow-Up

Project Phoenix run-trace summary

Question: What is the current role of the Phoenix run-trace layer?

Current trace lines:
- ShowcaseAgent routing traces exist and have already grouped recurring routing misses into replayable failure families.
- TourAgent protocol traces exist across strict protocol, pipeline, and handoff lanes.
- The run-trace layer is currently narrow and useful: it compresses failures into clearer replay or regression targets.

Short answer: It is a narrow debugging and failure-compression layer, not a general telemetry platform.
Phoenix conclusion: use run traces when they make the next action clearer; keep them narrow otherwise
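
As a rough illustration of failure compression, the sketch below groups routing misses into families keyed by the expected and actual route. The trace fields (`prompt`, `expected`, `actual`), the route names, and the grouping key are all assumptions; the source does not describe the real trace format.

```python
from collections import defaultdict


def failure_families(traces):
    """Group routing misses by (expected, actual) so each family can be replayed."""
    families = defaultdict(list)
    for trace in traces:
        if trace["expected"] != trace["actual"]:
            families[(trace["expected"], trace["actual"])].append(trace["prompt"])
    # Largest families first: recurring misses make the clearest regression targets.
    return sorted(families.items(), key=lambda item: len(item[1]), reverse=True)


# Made-up traces for illustration only.
traces = [
    {"prompt": "p1", "expected": "tsp", "actual": "chat"},
    {"prompt": "p2", "expected": "tsp", "actual": "tsp"},
    {"prompt": "p3", "expected": "tsp", "actual": "chat"},
    {"prompt": "p4", "expected": "trace", "actual": "tsp"},
]
for key, prompts in failure_families(traces):
    print(key, prompts)
```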

Presentation Order

1. TSP

Lead with the clearest external hook and the strongest simple architectural lesson.

2. Gemma Protocol

Show that Project Phoenix evaluates operational trust, not just reasoning flair.

3. Run Traces

Close with the internal systems layer: replay, failure compression, and disciplined debugging.