Architecture

The Harness

Six layers that turn a local model into an operationally useful offline domain system.

The harness is not a wrapper that makes a weak model seem stronger. It is the part of the system that ensures the model's output connects to something real — and that the connection is declared, not assumed.

Six Layers

1. Stable Entrypoint

One domain-facing surface rather than visible tool chaos. The entrypoint may be a CLI, cockpit, or agent-facing wrapper — but it hides internal tool complexity from the model and the user. This preserves the useful part of the meta-tool idea: one visible domain surface with deterministic routing underneath.
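A minimal sketch of the idea, not the project's actual entrypoint: one public function with a fixed routing table underneath. The tool names (`lookup_player`, `lookup_match`), the `ROUTES` table, and the `ask` function are all hypothetical, introduced only for illustration.

```python
from typing import Callable, Dict

# Hypothetical internal tools; the names are illustrative, not from the source.
def lookup_player(arg: str) -> str:
    return f"player record for {arg}"

def lookup_match(arg: str) -> str:
    return f"match record for {arg}"

# Fixed routing table: the same intent always reaches the same tool.
ROUTES: Dict[str, Callable[[str], str]] = {
    "player": lookup_player,
    "match": lookup_match,
}

def ask(intent: str, arg: str) -> str:
    """Single domain-facing surface; callers never see the tools behind it."""
    handler = ROUTES.get(intent)
    if handler is None:
        # Fail loudly instead of guessing which tool might fit.
        raise ValueError(f"unsupported intent: {intent!r}")
    return handler(arg)
```

The model and the user only ever call `ask`; adding or replacing a tool changes `ROUTES`, not the visible surface.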

2. Deterministic Substrate

Local databases, local files, deterministic service layers, reproducible queries, fixed transforms. Without this, the system collapses into a local chatbot rather than a domain agent. The substrate is what the grounding layer draws from: if the substrate is not deterministic, every answer above it is suspect.
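As a rough illustration of a reproducible query, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The `results` table, its columns, and the sample rows are invented for the example; the point is only that a parameterized query with an explicit `ORDER BY` returns the same rows in the same order on every run.

```python
import sqlite3
from typing import List, Tuple

def build_substrate() -> sqlite3.Connection:
    """Create a small local database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (season INTEGER, player TEXT, wins INTEGER)")
    conn.executemany(
        "INSERT INTO results VALUES (?, ?, ?)",
        [(2020, "B", 30), (2020, "A", 41), (2021, "A", 38)],
    )
    conn.commit()
    return conn

def wins_by_season(conn: sqlite3.Connection, season: int) -> List[Tuple[str, int]]:
    # Explicit ORDER BY with a tie-breaker, so row order never depends
    # on storage details.
    return conn.execute(
        "SELECT player, wins FROM results "
        "WHERE season = ? ORDER BY wins DESC, player ASC",
        (season,),
    ).fetchall()
```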

3. Grounding

Verified evidence bundles, answer seeds, tool-path summaries, and local context injection. The grounding layer is what turns a model from a freeform guesser into a constrained domain answer renderer. Grounding must come from the deterministic substrate — never from the model's prior output feeding into itself.
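One way to picture an evidence bundle is as a list of sourced facts that is rendered into the prompt before the model sees the question. The `Evidence` class and `build_prompt` function below are assumptions made for this sketch, not the harness's real API.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Evidence:
    source: str  # where in the substrate the fact came from
    fact: str    # the verified statement itself

def build_prompt(question: str, bundle: List[Evidence]) -> str:
    """Render a prompt that restricts the model to the supplied evidence."""
    if not bundle:
        # No evidence means no grounded answer; do not fall back to guessing.
        raise ValueError("refusing to prompt without evidence")
    lines = [
        f"[{i}] {e.fact} (source: {e.source})"
        for i, e in enumerate(bundle, 1)
    ]
    return (
        "Answer using only the evidence below.\n"
        + "\n".join(lines)
        + f"\nQuestion: {question}"
    )
```

In this sketch the bundle is built only from substrate lookups; nothing the model wrote earlier is ever appended to it.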

4. Implementation Layer

Controlled workflow execution through deterministic tools, file reads, validation checks, and saved artifacts. This is the escalation path for when a grounded answer is not enough — when the user needs full traceability or the answer will be preserved as a formal artifact. The model is one step in a real traceable pipeline.
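A hedged sketch of such an escalation path follows. The step names, the artifact layout, and the `run_workflow` function are hypothetical; the model is passed in as a plain callable (`render`) so that it is visibly just one step among deterministic ones.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List

def run_workflow(
    question: str,
    evidence: List[str],
    render: Callable[[str, List[str]], str],
    out_dir: str,
) -> Path:
    """Validate inputs, call the model step, and save a traceable artifact."""
    # Validation check before the model is involved at all.
    if not evidence:
        raise ValueError("no evidence read; aborting workflow")

    answer = render(question, evidence)  # the model is one step, not the pipeline

    artifact = {
        "question": question,
        "evidence": evidence,
        "answer": answer,
        "tool_path": ["read_evidence", "validate", "render", "save"],
    }
    payload = json.dumps(artifact, sort_keys=True, indent=2)
    # Content-derived file name: identical runs produce the identical file.
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    path = Path(out_dir) / f"answer-{digest}.json"
    path.write_text(payload, encoding="utf-8")
    return path
```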

5. Provenance

Mode, source class, snapshot boundary, and tool path where applicable. Every serious answer should carry enough metadata to explain which mode produced it, what evidence or artifact it used, what time boundary applies, whether tools were run live, and whether validation was applied. Provenance is what makes a result interpretable later.
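The fields listed above map naturally onto a small record attached to every answer. The sketch below is one possible shape; the field names and example values are assumptions, not a schema defined by the source.

```python
from dataclasses import asdict, dataclass
from typing import Tuple

@dataclass(frozen=True)
class Provenance:
    mode: str                 # which mode produced the answer, e.g. "grounded"
    source_class: str         # kind of evidence or artifact used
    snapshot_boundary: str    # ISO date after which the data is not covered
    tools_run_live: bool      # whether tools were executed for this answer
    validated: bool           # whether validation was applied
    tool_path: Tuple[str, ...] = ()  # only where applicable

def attach(answer: str, prov: Provenance) -> dict:
    """Bundle an answer with its provenance so it stays interpretable later."""
    return {"answer": answer, "provenance": asdict(prov)}
```

A stored answer can then be read months later without guessing which data boundary or mode it came from.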

6. Validation

A fixed request surface, repeatability policy, and saved answer artifact. This is what keeps the domain useful rather than merely plausible. Validation means that the same questions produce the same answers, and that those answers can be checked against a ground truth. Without this layer, the system can drift without detection.
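A minimal sketch of such a check, assuming the fixed request surface is a mapping from question to expected answer and the system under test is any callable. The `validate` function and its report format are illustrative only.

```python
from typing import Callable, Dict

def validate(
    answer_fn: Callable[[str], str],
    fixed_set: Dict[str, str],
    runs: int = 2,
) -> Dict[str, Dict[str, bool]]:
    """Ask each fixed question several times; check repeatability and truth."""
    report = {}
    for question, expected in fixed_set.items():
        answers = [answer_fn(question) for _ in range(runs)]
        repeatable = len(set(answers)) == 1          # same question, same answer
        correct = repeatable and answers[0] == expected  # matches ground truth
        report[question] = {"repeatable": repeatable, "correct": correct}
    return report
```

Running this on a schedule and saving each report is one simple way to notice drift before a user does.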

What Each Layer Does For The System

| Layer | What it prevents | What it enables |
|---|---|---|
| Stable Entrypoint | Tool surface chaos visible to the model | Clean domain-facing interface; consistent entry behavior |
| Deterministic Substrate | System collapse into chatbot mode | Verifiable facts that grounding can draw from |
| Grounding | Model guessing from weights alone | Constrained usefulness as the default operational mode |
| Implementation Layer | Dead end when grounded answer is insufficient | Full traceable escalation path with artifact output |
| Provenance | Uninterpretable results after the fact | Reproducible, attributable answer surfaces |
| Validation | Silent drift over time | Repeatable correctness check on the full domain surface |

Domain Eligibility

Not every domain is a good candidate for this pattern. The harness requires a minimum substrate to work from.

Good candidates

Domains with a meaningful human question surface, deterministic local logic, repeatable answer generation, and enough structure to define a fixed validation set.

Examples: tennis domain analytics, ISO standards lookup, climate records, historical routing problems.

Weak candidates

Domains that depend mostly on broad web lookup, vague freeform synthesis, or unstable external state with no controllable local substrate.

The harness cannot substitute for missing substrate. It can only organize what is already deterministic.

Data Freshness Policy

Not every domain needs live current data. The standard approach across all harness domains is:

  1. Keep the deterministic historical base honest — do not fake currency it does not have.
  2. Record the snapshot boundary explicitly in provenance.
  3. Layer current-data overlays only where recency actually matters to the answer.
  4. Validate those overlays separately from the historical base.

Do not fake a current system by hiding frozen data boundaries. An honest frozen base with a declared snapshot is more useful than an unlabeled mix of historical and current data that the user cannot interpret.
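The policy can be sketched as a function that always reports the snapshot boundary and marks any overlay explicitly. The boundary date, the field names, and `answer_with_freshness` itself are hypothetical values chosen for the example.

```python
from datetime import date
from typing import Optional, Tuple

# Hypothetical boundary of the frozen historical base.
SNAPSHOT_BOUNDARY = date(2023, 12, 31)

def answer_with_freshness(
    base_value: int,
    overlay: Optional[Tuple[int, date]] = None,
) -> dict:
    """Return a value with its snapshot boundary and any overlay declared."""
    result = {
        "value": base_value,
        "snapshot_boundary": SNAPSHOT_BOUNDARY.isoformat(),
        "overlay_applied": False,
    }
    if overlay is not None:
        value, as_of = overlay
        # Overlays are checked separately: they must postdate the base.
        if as_of <= SNAPSHOT_BOUNDARY:
            raise ValueError("overlay is not newer than the snapshot boundary")
        result.update(
            value=value,
            overlay_applied=True,
            overlay_as_of=as_of.isoformat(),
        )
    return result
```

Without an overlay, the caller still sees exactly how old the answer may be, which is the honest frozen base the policy asks for.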

The Harness and Model Capability

One practical lesson from the TourAgent implementation is that the harness does not make a weak model perform like a strong one. What it does is ensure that when the model does contribute useful output, that output lands on verifiable ground rather than floating on the model's weights alone.

The corollary is the Gemini CLI observation: a strong model inside a thin harness often feels much weaker than a comparable model inside a strong harness. The harness is not cosmetic — it is the mechanism that converts model capability into domain reliability.

strong model + thin harness → unreliable in practice
strong model + strong harness → reliable domain system