The Four Harness Layers
The working harness is a layered local system. Each layer carries a distinct function. All four are needed for the prepared-chalet scenario to hold.
Deterministic Substrate
The ground-truth layer. Structured datasets, databases, and reference artifacts that the model is grounded against. This is what makes answers checkable. Without a deterministic substrate, the model is just generating — there is no local authority to anchor its outputs against.
At the chalet: the portable bundle carries the domain databases, question banks, and reference answer sheets that constitute the substrate.
Grounding Layer
The connection between model and substrate. The grounding layer is what moves the model from free generation to substrate-anchored answering. This is the most important layer for the ski-chalet result: grounded Gemma performs materially better than raw Gemma on fixed domain questions.
Grounding changes the answer surface before it improves exactness: reliability rises first. That is the key lesson from TourAgent, and it applies directly to the chalet scenario.
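The move from free generation to substrate-anchored answering can be sketched as prompt assembly: retrieve substrate text, then constrain the model to it. The keyword-overlap retrieval and the sample substrate entries below are deliberate simplifications; a real bundle would use the prebuilt embeddings and indexes from the grounding artifacts.

```python
# Illustrative substrate fragments; keys and text are assumptions.
SUBSTRATE = {
    "timeout": "The Phoenix ingest service times out after 30 seconds.",
    "retry": "Failed ingest jobs are retried at most 3 times.",
}

def retrieve(question: str) -> list[str]:
    """Naive keyword retrieval; stands in for the bundled index lookup."""
    q = question.lower()
    return [text for key, text in SUBSTRATE.items() if key in q]

def build_prompt(question: str) -> str:
    context = retrieve(question)
    if not context:
        # No anchor found: instruct the model to decline rather than generate freely.
        return (
            "Answer only if certain, otherwise say 'not in substrate'.\n"
            f"Q: {question}"
        )
    joined = "\n".join(context)
    return (
        "Answer using ONLY the context below.\n"
        f"Context:\n{joined}\nQ: {question}"
    )
```

The branch for the empty-context case is where the reliability gain comes from: an ungrounded question produces a visible refusal path instead of a plausible fabrication.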
Control Layer
The structured interaction surface that keeps the local model on the right tool path. The V4 evidence showed that the local layer kept user intent but drifted off the Phoenix tool surface on the first pass. One structured redirect was enough to recover. That redirect is the control layer working.
Without the control layer, tool-surface drift is the dominant failure mode — not total failure, but persistent inefficiency that degrades the offline session over time.
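A structured redirect of the kind the V4 evidence describes can be sketched as a single validation step between the model's proposed tool call and execution. The tool names here are illustrative assumptions; the shape of the mechanism is what matters: one corrective message, not a hard failure.

```python
# Hypothetical Phoenix tool surface; names are illustrative.
ALLOWED_TOOLS = {"phoenix_query", "phoenix_lookup", "phoenix_verify"}

def control(proposed_tool: str) -> tuple[bool, str]:
    """Check a proposed tool call against the allowed surface.

    Returns (on_surface, message). Off-surface calls get one structured
    redirect that names the valid options, mirroring the single redirect
    that recovered the V4 session.
    """
    if proposed_tool in ALLOWED_TOOLS:
        return True, f"proceed with {proposed_tool}"
    redirect = (
        f"Tool '{proposed_tool}' is off the Phoenix surface. "
        f"Choose one of: {sorted(ALLOWED_TOOLS)}"
    )
    return False, redirect
```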
Audit Layer
The ability to inspect, trace, and verify what the model produced. In an offline scenario, the audit layer is especially important because there is no external check. The user must be able to see whether the model stayed on the substrate, whether it drifted, and whether a given answer is trustworthy.
The audit layer is not about punishing failure — it is about making the offline session observable enough to stay useful.
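One plausible minimal form of the audit layer is an append-only trace: every answer is recorded together with the substrate rows it was anchored to, so the session can be inspected afterward. The field names below are illustrative assumptions, not a logging spec.

```python
import time

# Append-only session trace; entries are never mutated after recording.
AUDIT_LOG: list[dict] = []

def record(question: str, answer: str, anchors: list[str]) -> dict:
    """Log one exchange with its substrate anchors for later inspection."""
    entry = {
        "ts": time.time(),
        "question": question,
        "answer": answer,
        "anchors": anchors,          # substrate rows used; empty means ungrounded
        "grounded": bool(anchors),   # quick flag: did this answer stay on the substrate?
    }
    AUDIT_LOG.append(entry)
    return entry
```

Scanning the log for `grounded: False` entries is exactly the "did the model drift?" question the offline user needs to be able to answer without an external check.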
The Portable Bundle
The portable bundle is a USB kit that carries the harness into the offline environment. It is what turns a pre-installed local model runtime into a working domain system.
A plausible bundle contains:
Repo Snapshot
A versioned copy of the Project Phoenix domain repo at a known stable point.
Local venv
A portable Python environment with all dependencies already resolved. No internet needed for pip.
Deterministic Datasets
The domain data in structured local form — the substrate the model will be grounded against.
Databases
SQLite or equivalent local databases carrying the domain's structured reference content.
Question Banks
Fixed question sets that let you verify the system is working as expected after setup.
Grounding Artifacts
The processed domain artifacts that the grounding layer uses — embeddings, indexes, answer sheets.
Answer Sheets
Reference answers for the verification questions. Lets you detect drift and confirm the harness is grounded correctly.
Scripts
Setup and run scripts that let you initialize the local environment without manual configuration steps.
Docs
Offline documentation and usage notes. You may not remember the exact invocation syntax at the chalet.
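The question banks and answer sheets above exist so that setup can end with a verification pass. A sketch of that pass, with a canned stand-in for the local model call (in practice this would go through the Ollama API) and illustrative questions:

```python
# Bundled verification set; questions and expected answers are illustrative.
QUESTION_BANK = [
    {"q": "What is the ingest timeout?", "expected": "30 seconds"},
    {"q": "How many retries?", "expected": "3"},
]

def answer(question: str) -> str:
    """Stand-in for the grounded local model call (e.g. via Ollama)."""
    canned = {
        "What is the ingest timeout?": "30 seconds",
        "How many retries?": "3",
    }
    return canned.get(question, "")

def verify() -> tuple[int, int]:
    """Replay the question bank against the answer sheets: (passed, total)."""
    passed = sum(
        1 for item in QUESTION_BANK if answer(item["q"]) == item["expected"]
    )
    return passed, len(QUESTION_BANK)
```

Anything short of a full pass at setup time means the harness is not grounded correctly, and the user finds out before relying on it, not during the session.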
Three Configuration Tiers
Not every chalet setup is equal. The paper identifies three tiers of offline configuration, each corresponding to a different level of offline capability. The current evidence most clearly supports Tier 2.
Tier 1 — Minimal Grounded Chalet
Local model + substrate only. Enough for grounded domain answering on the core question set. Tool-surface drift is likely without the control layer. Verification is limited without the audit layer. The lower bound of useful offline operation.
Tier 2 — Prepared Chalet Config ← Current evidence supports this
All four harness layers present. Full portable bundle. Structured redirects available. Answers verifiable against the substrate. The scenario this paper is primarily about. Odd, but not impossible.
Tier 3 — Full Implementation-Agent Chalet
Tier 2 plus a full implementation-agent workflow offline. Code generation, multi-step domain construction, write-then-verify cycles — all without cloud. This tier is not yet well-supported by the current evidence. A useful future boundary to define.
These tiers are the "next step" from the paper: defining configuration levels precisely so the claim can be stated exactly, rather than as a vague gesture at offline usefulness.
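As a first step toward stating the configuration levels precisely, the tiers could be formalized as a function of which harness layers are present. The classification rules below are one plausible formalization, not the paper's definition.

```python
# The four harness layers from this paper.
ALL_LAYERS = {"substrate", "grounding", "control", "audit"}

def tier(layers: set[str], has_impl_agent: bool = False) -> int:
    """Classify a chalet setup by its harness layers (illustrative rules).

    0 = below the lower bound of useful offline operation
    1 = minimal grounded chalet (substrate + grounding, at least)
    2 = prepared chalet (all four layers, full bundle)
    3 = tier 2 plus an offline implementation-agent workflow
    """
    if not {"substrate", "grounding"} <= layers:
        return 0
    if layers < ALL_LAYERS:
        return 1
    return 3 if has_impl_agent else 2
```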
The Host Machine Assumptions
The prepared-chalet scenario assumes the host machine provides:
- Working GPU drivers
- A 3090-class GPU
- Ollama already installed
- A local model such as gemma3:27b already downloaded
- Enough disk and runtime support to execute the portable bundle
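These assumptions can be partially checked with a preflight script run before leaving for the chalet. The sketch below covers the portable checks only; the model name and the disk threshold are illustrative assumptions, and GPU-driver probing (e.g. via nvidia-smi) is platform-specific and left as a comment.

```python
import shutil

REQUIRED_MODEL = "gemma3:27b"  # the local model assumed by this scenario
MIN_FREE_GB = 40               # illustrative: bundle plus runtime scratch

def preflight() -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: list[str] = []
    if shutil.which("ollama") is None:
        problems.append("ollama not found on PATH")
    free_gb = shutil.disk_usage(".").free / 1e9
    if free_gb < MIN_FREE_GB:
        problems.append(f"only {free_gb:.0f} GB free, need {MIN_FREE_GB}")
    # A fuller check would also run `ollama list` to confirm REQUIRED_MODEL
    # is downloaded, and probe the GPU for a 3090-class card.
    return problems
```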
This is not everyday-normal. But it is plausible for the specific class of local-LLM hobbyist that the thought experiment assumes. A person who owns a 3090 and runs local models is, by definition, already closer to this setup than the median user.
That is the realism calibration. The scenario is niche. The claim is not that it is common — only that it is achievable by the people most likely to attempt it.