The Four Harness Layers
The working harness is a layered local system. Each layer carries a distinct function. All four are needed for the prepared-chalet scenario to hold.
Deterministic Substrate
The ground-truth layer. Structured datasets, databases, and reference artifacts that the model is grounded against. This is what makes answers checkable. Without a deterministic substrate, the model is just generating — there is no local authority to anchor its outputs against.
At the chalet: the portable bundle carries the domain databases, question banks, and reference answer sheets that constitute the substrate.
Grounding Layer
The connection between model and substrate. The grounding layer is what moves the model from free generation to substrate-anchored answering. This is the most important layer for the ski-chalet result: grounded Gemma performs materially better than raw Gemma on fixed domain questions.
Grounding changes the answer surface before it improves exactness: reliability rises first. That is the key lesson from TourAgent, and it applies directly to the chalet scenario.
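The move from free generation to substrate-anchored answering can be sketched as prompt assembly: retrieve substrate text, then constrain the model to it. The keyword-overlap retrieval and the sample substrate entries below are deliberate simplifications; a real bundle would use the prebuilt embeddings and indexes from the grounding artifacts.

```python
# Illustrative substrate fragments; keys and text are assumptions.
SUBSTRATE = {
    "timeout": "The Phoenix ingest service times out after 30 seconds.",
    "retry": "Failed ingest jobs are retried at most 3 times.",
}

def retrieve(question: str) -> list[str]:
    """Naive keyword retrieval; stands in for the bundled index lookup."""
    q = question.lower()
    return [text for key, text in SUBSTRATE.items() if key in q]

def build_prompt(question: str) -> str:
    context = retrieve(question)
    if not context:
        # No anchor found: instruct the model to decline rather than generate freely.
        return (
            "Answer only if certain, otherwise say 'not in substrate'.\n"
            f"Q: {question}"
        )
    joined = "\n".join(context)
    return (
        "Answer using ONLY the context below.\n"
        f"Context:\n{joined}\nQ: {question}"
    )
```

The branch for the empty-context case is where the reliability gain comes from: an ungrounded question produces a visible refusal path instead of a plausible fabrication.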
Control Layer
The structured interaction surface that keeps the local model on the right tool path. The V4 evidence showed that the local layer kept user intent but drifted off the Phoenix tool surface on the first pass. One structured redirect was enough to recover. That redirect is the control layer working.
Without the control layer, tool-surface drift is the dominant failure mode — not total failure, but persistent inefficiency that degrades the offline session over time.
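A structured redirect of the kind the V4 evidence describes can be sketched as a single validation step between the model's proposed tool call and execution. The tool names here are illustrative assumptions; the shape of the mechanism is what matters: one corrective message, not a hard failure.

```python
# Hypothetical Phoenix tool surface; names are illustrative.
ALLOWED_TOOLS = {"phoenix_query", "phoenix_lookup", "phoenix_verify"}

def control(proposed_tool: str) -> tuple[bool, str]:
    """Check a proposed tool call against the allowed surface.

    Returns (on_surface, message). Off-surface calls get one structured
    redirect that names the valid options, mirroring the single redirect
    that recovered the V4 session.
    """
    if proposed_tool in ALLOWED_TOOLS:
        return True, f"proceed with {proposed_tool}"
    redirect = (
        f"Tool '{proposed_tool}' is off the Phoenix surface. "
        f"Choose one of: {sorted(ALLOWED_TOOLS)}"
    )
    return False, redirect
```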
Audit Layer
The ability to inspect, trace, and verify what the model produced. In an offline scenario, the audit layer is especially important because there is no external check. The user must be able to see whether the model stayed on the substrate, whether it drifted, and whether a given answer is trustworthy.
The audit layer is not about punishing failure — it is about making the offline session observable enough to stay useful.
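One plausible minimal form of the audit layer is an append-only trace: every answer is recorded together with the substrate rows it was anchored to, so the session can be inspected afterward. The field names below are illustrative assumptions, not a logging spec.

```python
import time

# Append-only session trace; entries are never mutated after recording.
AUDIT_LOG: list[dict] = []

def record(question: str, answer: str, anchors: list[str]) -> dict:
    """Log one exchange with its substrate anchors for later inspection."""
    entry = {
        "ts": time.time(),
        "question": question,
        "answer": answer,
        "anchors": anchors,          # substrate rows used; empty means ungrounded
        "grounded": bool(anchors),   # quick flag: did this answer stay on the substrate?
    }
    AUDIT_LOG.append(entry)
    return entry
```

Scanning the log for `grounded: False` entries is exactly the "did the model drift?" question the offline user needs to be able to answer without an external check.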
The Portable Bundle
The portable bundle is a USB kit that carries the harness into the offline environment. It is what turns a pre-installed local model runtime into a working domain system.
A plausible bundle contains:
Repo Snapshot
A versioned copy of the Project Phoenix domain repo at a known stable point.
Local venv
A portable Python environment with all dependencies already resolved. No internet needed for pip.
Deterministic Datasets
The domain data in structured local form — the substrate the model will be grounded against.
Databases
SQLite or equivalent local databases carrying the domain's structured reference content.
Question Banks
Fixed question sets that let you verify the system is working as expected after setup.
Grounding Artifacts
The processed domain artifacts that the grounding layer uses — embeddings, indexes, answer sheets.
Answer Sheets
Reference answers for the verification questions. Lets you detect drift and confirm the harness is grounded correctly.
Scripts
Setup and run scripts that let you initialize the local environment without manual configuration steps.
Docs
Offline documentation and usage notes. You may not remember the exact invocation syntax at the chalet.
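The question banks and answer sheets above exist so that setup can end with a verification pass. A sketch of that pass, with a canned stand-in for the local model call (in practice this would go through the Ollama API) and illustrative questions:

```python
# Bundled verification set; questions and expected answers are illustrative.
QUESTION_BANK = [
    {"q": "What is the ingest timeout?", "expected": "30 seconds"},
    {"q": "How many retries?", "expected": "3"},
]

def answer(question: str) -> str:
    """Stand-in for the grounded local model call (e.g. via Ollama)."""
    canned = {
        "What is the ingest timeout?": "30 seconds",
        "How many retries?": "3",
    }
    return canned.get(question, "")

def verify() -> tuple[int, int]:
    """Replay the question bank against the answer sheets: (passed, total)."""
    passed = sum(
        1 for item in QUESTION_BANK if answer(item["q"]) == item["expected"]
    )
    return passed, len(QUESTION_BANK)
```

Anything short of a full pass at setup time means the harness is not grounded correctly, and the user finds out before relying on it, not during the session.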
Three Configuration Tiers
Not every chalet setup is equal. The paper identifies three tiers of offline configuration, each corresponding to a different level of offline capability. The current evidence most clearly supports Tier 2.
Tier 1 — Minimal Grounded Chalet
Local model + substrate only. Enough for grounded domain answering on the core question set. Tool-surface drift is likely without the control layer. Verification is limited without the audit layer. The lower bound of useful offline operation.
Tier 2 — Prepared Chalet Config ← Current evidence supports this
All four harness layers present. Full portable bundle. Structured redirects available. Answers verifiable against the substrate. The scenario this paper is primarily about. Odd, but not impossible.
Tier 3 — Full Implementation-Agent Chalet
Tier 2 plus a full implementation-agent workflow offline. Code generation, multi-step domain construction, write-then-verify cycles — all without cloud. This tier is not yet well-supported by the current evidence. A useful future boundary to define.
These tiers are the "next step" from the paper: defining configuration levels precisely so the claim can be stated exactly, rather than as a vague gesture at offline usefulness.
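As a first step toward stating the configuration levels precisely, the tiers could be formalized as a function of which harness layers are present. The classification rules below are one plausible formalization, not the paper's definition.

```python
# The four harness layers from this paper.
ALL_LAYERS = {"substrate", "grounding", "control", "audit"}

def tier(layers: set[str], has_impl_agent: bool = False) -> int:
    """Classify a chalet setup by its harness layers (illustrative rules).

    0 = below the lower bound of useful offline operation
    1 = minimal grounded chalet (substrate + grounding, at least)
    2 = prepared chalet (all four layers, full bundle)
    3 = tier 2 plus an offline implementation-agent workflow
    """
    if not {"substrate", "grounding"} <= layers:
        return 0
    if layers < ALL_LAYERS:
        return 1
    return 3 if has_impl_agent else 2
```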
The Host Machine Assumptions
The prepared-chalet scenario assumes the host machine provides:
- Working GPU drivers
- A 3090-class GPU
- Ollama already installed
- A local model such as gemma3:27b already downloaded
- Enough disk and runtime support to execute the portable bundle
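These assumptions can be partially checked with a preflight script run before leaving for the chalet. The sketch below covers the portable checks only; the model name and the disk threshold are illustrative assumptions, and GPU-driver probing (e.g. via nvidia-smi) is platform-specific and left as a comment.

```python
import shutil

REQUIRED_MODEL = "gemma3:27b"  # the local model assumed by this scenario
MIN_FREE_GB = 40               # illustrative: bundle plus runtime scratch

def preflight() -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: list[str] = []
    if shutil.which("ollama") is None:
        problems.append("ollama not found on PATH")
    free_gb = shutil.disk_usage(".").free / 1e9
    if free_gb < MIN_FREE_GB:
        problems.append(f"only {free_gb:.0f} GB free, need {MIN_FREE_GB}")
    # A fuller check would also run `ollama list` to confirm REQUIRED_MODEL
    # is downloaded, and probe the GPU for a 3090-class card.
    return problems
```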
This is not everyday-normal. But it is plausible for the specific class of local-LLM hobbyist that the thought experiment assumes. A person who owns a 3090 and runs local models is, by definition, already closer to this setup than the median user.
That is the realism calibration. The scenario is niche. The claim is not that it is common — only that it is achievable by the people most likely to attempt it.