Project Phoenix · Research Portfolio

Project Phoenix Papers

Nineteen primary papers on grounded domain systems, orchestration architecture, agentic operating discipline, ML evaluation benchmarks, operator infrastructure, measurement integrity, and applied production evidence

19 Primary Papers · 12 Live Sites · 9 Research Clusters
2026 · Active

The finding: For grounded domain tasks — well-defined task classes with deterministic substrates — harness configuration is the binding constraint. Model identity is not. These papers prove this claim under stress-test conditions: local models, which cannot compensate for a weak harness, converge with frontier models on semantic usefulness when the harness is sufficient. This scope is deliberate. Outside it, model capability matters in ways the framework does not cover.

Local Model Addendum: the Project Phoenix Local Model Details page covers the three supporting papers that feed the orchestration synthesis: TourAgent (1.13), ShowcaseAgent (1.12), and Local Model Role Suitability (1.11).

Boundary Results: the Project Phoenix Boundary Results page covers three papers that map where the organized stack hits its limits: Grounded Agent Failure Is Structurally Determined (1.10), True Ski Chalet Boundary Result (1.14), and When The Organized Stack Loses (1.15).

RVH / ML Evaluation: the Rough Volatility as ML Benchmark page covers Papers 1.8 and 1.9 — why domain expertise, not ML capability, is the binding constraint in rough volatility forecasting, and the cross-domain benchmark principle it reveals.

Measurement Integrity, Operator Layer & Applied Evidence: Papers 1.16–1.19 extend the framework outward. Paper 1.16 shows that evaluation infrastructure can fail at the capture boundary — a VT100 terminal artifact was corrupting protocol scores for thinking-mode models. Paper 1.17 documents the operator shell pattern: how OpenClaw wraps Project Phoenix as an access layer without becoming the authority. Paper 1.18 is the framework's first numbered production case — PPR Agent, 92M regulated cardiac device implants across 18 years, behind a deterministic SQLite substrate. Paper 1.19 is a short companion to 1.16 on the other side of the apparatus: when stronger models override literal substrate inspection, capability itself becomes a source of non-neutrality.

Where to Start

Each paper stands alone. Use the cluster that matches your interest:

New to Project Phoenix

Start with Paper 1.1 for the framework framing, then try the TourAgent live demo — ten tennis questions with repeatable answers — to see the deterministic approach in action.

Local Inference & Offline Systems

Papers 1.2, 1.3, 1.5 form a cluster: offline grounded agent → ski chalet hardware boundary → TSP solver-backed orchestration. The common argument: harness level, not model size, drives usefulness.
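To make "solver-backed orchestration" concrete, here is a minimal sketch under stated assumptions: the function names and the structured-task shape are illustrative, not Paper 1.5's implementation. The model's only job is to turn a request into a list of stops; a deterministic nearest-neighbor heuristic owns the route, so the answer is reproducible whether a local or a frontier model produced the task.

    import math

    # Illustrative sketch of solver-backed orchestration (not Paper 1.5's code).
    # The model only produces the structured task (the `stops` list below);
    # the deterministic solver owns correctness of the route.

    def tour_length(stops, order):
        """Total round-trip distance for visiting `stops` in `order`."""
        total = 0.0
        for i in range(len(order)):
            a = stops[order[i]]
            b = stops[order[(i + 1) % len(order)]]
            total += math.hypot(a[0] - b[0], a[1] - b[1])
        return total

    def nearest_neighbour_route(stops):
        """Deterministic heuristic: start at stop 0 and always visit the
        closest unvisited stop next (ties broken by index)."""
        unvisited = set(range(1, len(stops)))
        route = [0]
        while unvisited:
            here = stops[route[-1]]
            nxt = min(unvisited,
                      key=lambda i: (math.hypot(stops[i][0] - here[0],
                                                stops[i][1] - here[1]), i))
            route.append(nxt)
            unvisited.remove(nxt)
        return route

    # A hypothetical structured task, as a model might extract it from a request.
    stops = [(0.0, 0.0), (2.0, 1.0), (1.0, 3.0), (4.0, 2.5)]
    route = nearest_neighbour_route(stops)
    print(route, round(tour_length(stops, route), 2))

The point is the division of labor, not the heuristic: once route correctness lives in the solver, answer quality tracks the harness rather than the model, which is what this cluster means by harness level.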

Orchestration & Role Assignment

Papers 1.5, 1.6, 1.11, 1.12, 1.13 address where correctness should live and how grounding, routing, and repair beat raw power in identifiable regimes.

Failure Modes & Boundary Conditions

Papers 1.7, 1.10, 1.14, 1.15 cover the failure taxonomy, empirical failure prediction, the true local ceiling, and the five conditions under which the organized stack's advantage collapses.

ML Evaluation & Cross-Domain Benchmarks

Papers 1.8 and 1.9 establish why realized volatility forecasting is high-signal benchmark territory — and what the same structural argument implies across semiconductor defectivity and other rough-process domains.

Measurement Integrity & Operator Layer

Papers 1.16, 1.17, and 1.19 address the infrastructure surrounding the Phoenix system. 1.16: capture pipeline failures produce false evaluation verdicts. 1.17: an operator shell can expose the deterministic stack without replacing it as the authority. 1.19: when stronger models override literal substrate inspection, the model itself becomes part of the non-neutrality.
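As a hedged sketch of the general shape of the 1.16 failure mode (the pattern and function name below are assumptions, not the paper's capture pipeline): a capture boundary has to normalize the transcript before any scorer sees it, otherwise terminal control sequences can masquerade as protocol violations.

    import re

    # Illustrative only: strip VT100/ANSI control sequences (CSI and OSC)
    # and carriage returns from a captured transcript before scoring.
    # The regex and `clean_transcript` are assumptions, not Paper 1.16's code.
    VT100_RE = re.compile(
        r"\x1b\[[0-9;?]*[ -/]*[@-~]"            # CSI: cursor moves, erase, colors
        r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"   # OSC: window titles, hyperlinks
    )

    def clean_transcript(raw: str) -> str:
        """Return the transcript with terminal control sequences removed."""
        return VT100_RE.sub("", raw).replace("\r", "")

    raw = "FINAL ANSWER: 42\x1b[2K\r\x1b[0m"
    print(repr(clean_transcript(raw)))  # 'FINAL ANSWER: 42'

The point is not the regex but the dependency: whatever the scorer reads is downstream of the capture step, so the capture step is part of the measurement apparatus.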

Applied / Production Evidence

Paper 1.18 is the first numbered production case — PPR Agent running against 18 years of government-mandated cardiac device data. This is field validation, not lane validation — the framework operating against regulated disclosures from three manufacturers.
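For readers who want the substrate shape in miniature, the following is a hypothetical sketch (invented schema, manufacturers, and counts; not the PPR Agent database): the answer to a question is a SQL result over stored regulated records, so it is reproducible and auditable regardless of which model phrases the question.

    import sqlite3

    # Hypothetical schema and rows, for illustration only; not the PPR Agent data.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE implants ("
        " id INTEGER PRIMARY KEY,"
        " manufacturer TEXT NOT NULL,"
        " device_class TEXT NOT NULL,"
        " implant_year INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO implants (manufacturer, device_class, implant_year)"
        " VALUES (?, ?, ?)",
        [
            ("Acme Cardiac", "pacemaker", 2019),
            ("Acme Cardiac", "icd", 2021),
            ("Borealis Devices", "pacemaker", 2021),
        ],
    )

    def implants_in_year(year: int) -> int:
        """Deterministic answer: count of recorded implants for one year."""
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM implants WHERE implant_year = ?", (year,)
        ).fetchone()
        return count

    print(implants_in_year(2021))  # 2 -- same answer on every run, any model

Determinism here is a property of the substrate, not of the model sitting in front of it.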

I  ·  Grounding, Local Systems & Hardware

What makes a local or offline system actually useful — and what the evidence honestly supports.

II  ·  Orchestration & Role Assignment

Where correctness should live in an AI system — and what happens when it lives in the wrong place.

III  ·  Framework & Operating Discipline

The standards, supervision structures, and failure taxonomy that make agentic work trustworthy.

IV  ·  Local Model Addendum

Three empirical papers feeding the orchestration synthesis — grounded reliability, routing, and role suitability at portfolio scale.

V  ·  Boundary Conditions & Failure Prediction

Where the organized stack's advantage collapses — and why failure family is predictable from configuration, not query content.

VI  ·  RVH / ML Evaluation

Realized volatility forecasting as high-signal ML benchmark territory — and the cross-domain principle it reveals.

VII  ·  Measurement Integrity

When the evaluation infrastructure itself fails — or when the model's own disposition toward the substrate becomes part of the apparatus.

VIII  ·  Operator Layer

Building an operator-facing outer layer over the deterministic stack — and keeping it outside the authority boundary.

IX  ·  Applied / Production Evidence

Field validation, not lane validation — the framework operating against regulated data in a real domain.

Full Inventory

# · Title · Track · Site
Primary Papers — 1.1 through 1.7
1.1 · Project Phoenix — Open-Core Standards Framework · project-phoenix/
1.2 · Offline Grounded Domain Agent · Grounding · offline-agent/
1.3 · Ski Chalet Harness Boundary · Grounding · ski-chalet/
1.4 · Fab Simulation & RVH · Grounding · fab-rvh/
1.5 · LocalLLMTSP — Solver-Backed Orchestration · Orchestration · local-llm-tsp/
1.6 · Where Orchestration Beats Raw Model Power · Orchestration · orchestration/
1.7 · Agentic Coding Failure Patterns · Operations · agentic-coding/
RVH — 1.8 and 1.9
1.8 · Rough Volatility — Cross-Domain Benchmark Principle · RVH / ML Eval · rough-volatility/
1.9 · Rough Volatility — ML Evaluation Domain · RVH / ML Eval · rough-volatility/
Boundary & Details — 1.10 through 1.15
1.10 · Grounded Agent Failure Is Structurally Determined · Boundary · failure-details/
1.11 · Local Model Role Suitability · Local Model · local-model-role-suitability/
1.12 · ShowcaseAgent Routing And Compression · Local Model · details/
1.13 · TourAgent Local Model Screen · Local Model · details/
1.14 · True Ski Chalet Boundary Result · Boundary · failure-details/
1.15 · When The Organized Stack Loses · Boundary · failure-details/
Measurement Integrity — 1.16 and 1.19
1.16 · The Model Did Not Fail the Protocol. The Terminal Did. · Measurement · capture-integrity/
1.19 · Literal Substrate Inspection — When Stronger Models Override the Evidence · Measurement · capture-integrity/
Operator Layer — 1.17
1.17 · The Operator Shell Pattern · Operator Layer · operator-shell/
Applied / Production Evidence — 1.18
1.18 · PPR Agent — A Deterministic Substrate for Auditable Medical-Device Intelligence · Applied · ppr-agent/

All sites live at proto.efehnconsulting.com. Papers 1.8 and 1.9 share the rough-volatility site; 1.12 and 1.13 share the details site; 1.10, 1.14, and 1.15 share the failure-details site; 1.16 and 1.19 share the capture-integrity site. Papers 1.17 and 1.18 have dedicated sites at operator-shell/ and ppr-agent/ respectively.