Answer Modes

Modes

Four modes. Four different claim boundaries. Never collapse them into one vague category.

The reason mode separation matters is not rigor for its own sake. It is that collapsing all local results into one vague category ("local LLM result") makes every claim uninterpretable. These boundaries are what make results reproducible and trustworthy.

Mode Summary

| Mode | What the model sees | Default use | Claim boundary |
|---|---|---|---|
| raw | User request only | Baseline / debug | Does not prove domain usefulness |
| grounded | Request + verified local context | Default user-facing mode | Proves constrained usefulness, not independent reasoning |
| artifact | Answer returned from validated precomputed layer | Stable demos, presentations | Proves validated answer availability, not live execution |
| implementation_agent | Model runs inside deterministic workflow | Escalation / traceability | Proves workflow capability, not raw-model strength |

These are not a quality ranking. They are a claim boundary map. A grounded result is not "better than" a raw result in every context — it is a result with a different, clearer claim about what it proves.
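
As a rough illustration of that claim boundary map, the four modes can be carried as an enum whose members record what a result in that mode does and does not prove. The names below are hypothetical, not TourAgent's actual code; the strings simply restate the table above.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class ClaimBoundary:
    """What a result produced in a given mode does and does not prove."""
    proves: str
    does_not_prove: str


class AnswerMode(Enum):
    RAW = ClaimBoundary(
        proves="baseline local-model behavior on its own",
        does_not_prove="domain usefulness",
    )
    GROUNDED = ClaimBoundary(
        proves="constrained usefulness over verified context",
        does_not_prove="independent reasoning",
    )
    ARTIFACT = ClaimBoundary(
        proves="validated answer availability",
        does_not_prove="live tool execution",
    )
    IMPLEMENTATION_AGENT = ClaimBoundary(
        proves="workflow capability under controlled local constraints",
        does_not_prove="raw-model strength",
    )
```

Keeping the boundary next to the mode, rather than implied by it, is what stops a result in one mode from being quietly reported as if it were produced in another.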

Raw Mode

What it is: The local model sees only the user question. No verified context is injected first.

What it proves

Baseline local-model behavior on its own. Whether the model can answer plausibly without grounding.

What it does not prove

Domain usefulness. A plausible raw answer does not demonstrate that the model can serve domain questions reliably, and it says nothing about how the model behaves once verified context or tools are added.

Typical failure mode

Plausible but wrong answers. Omissions on list or set questions. Refusals on precise statistical questions.

Correct use

Baseline and debugging only. Raw results do not justify claims about domain usefulness. If you are tempted to publish a raw result as a domain capability claim, you are miscategorizing the experiment.

Grounded Mode

What it is: The local model sees the question plus verified domain context first. In TourAgent this means a deterministic answer seed, tool path, and evidence bundle injected into the prompt before the model responds.
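
A minimal sketch of what grounded-mode prompt assembly could look like, using hypothetical field names for the answer seed, tool path, and evidence bundle (the real TourAgent prompt format may differ):

```python
def build_grounded_prompt(
    question: str,
    seed: str,
    tool_path: list[str],
    evidence: list[str],
) -> str:
    """Inject verified context ahead of the question; raw mode would send only the question."""
    context_lines = [
        "Verified answer seed:",
        seed,
        "Tool path: " + " -> ".join(tool_path),
        "Evidence:",
        *[f"- {item}" for item in evidence],
    ]
    return "\n".join(context_lines) + "\n\nUser question: " + question


# Hypothetical example values, for illustration only.
prompt = build_grounded_prompt(
    question="Which tours run on Mondays?",
    seed="Three tours run on Mondays.",
    tool_path=["load_schedule", "filter_by_day"],
    evidence=["schedule rows matching day=Monday"],
)
```

The point of the sketch is the ordering: the verified context arrives before the model ever sees the question, so the model's job is rendering, not fact-finding.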

What it proves

Constrained local usefulness. Whether the model can produce a good user-facing answer when facts are already verified. The practical value of grounding without full workflow execution.

What it does not prove

Independent reasoning. The model did not find, select, or verify the facts; it produced wording over facts that were verified for it before it saw the question.

Typical failure mode

Good wording over a fixed validated surface, but no ability to go beyond it honestly. The model is a constrained answer renderer here, not an independent reasoner.

Why it is the default

Grounded mode balances usability, rigor, and local feasibility. The model adds presentation quality on top of deterministic correctness. That is a real contribution without overclaiming what the model is doing.

Artifact Mode

What it is: The system returns an answer from a validated precomputed answer layer. No live tool execution happens at answer time.
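
A sketch of the artifact path under assumed names (a JSON file of precomputed, validated answers keyed by normalized question; not the actual TourAgent layout). Answering is a lookup, with an explicit miss rather than a silent fallback to live execution.

```python
import json
from pathlib import Path


def answer_from_artifact(question: str, layer_path: Path) -> str | None:
    """Return the validated precomputed answer, or None if the question is not covered."""
    layer = json.loads(layer_path.read_text())    # precomputed and validated earlier
    return layer.get(question.strip().lower())    # no tools run at answer time
```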

What it proves

Stable validated answer availability. Repeatable presentation of a frozen or overlaid answer surface.

What it does not prove

Live execution. The answer was computed and validated ahead of time, so the result says nothing about what the system would do with a question outside the precomputed layer, or about tool behavior at answer time.

Typical failure mode

Overclaim if presented as though tools are running live. An artifact answer is fast and stable precisely because it is not running live — that is a feature, not a limitation, but it must be declared.

Correct use

Stable demonstrations, presentations, frozen validation surfaces. The TourAgent CLI --mode agent flag is actually artifact mode — the internal implementation label and the audience-facing label are deliberately separated.
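
One way the label separation could be wired, sketched with argparse and hypothetical flag values; the document states only that the CLI's agent flag resolves to artifact mode internally, so the other entries and the default are assumptions.

```python
import argparse

# Audience-facing CLI value -> internal implementation label (hypothetical mapping).
CLI_TO_INTERNAL = {
    "raw": "raw",
    "grounded": "grounded",
    "agent": "artifact",  # stable demo path: validated answer layer, no live tool execution
    "implementation_agent": "implementation_agent",
}

parser = argparse.ArgumentParser(prog="touragent")
parser.add_argument("--mode", choices=CLI_TO_INTERNAL.keys(), default="grounded")
args = parser.parse_args()
internal_mode = CLI_TO_INTERNAL[args.mode]
```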

Implementation-Agent Mode

What it is: The local model operates inside a deterministic workflow. That can include tools, file reads, validation checks, logging, and artifact preservation. The model is one step in a traceable pipeline, not a freeform answerer.
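
A compressed sketch of what one such pipeline could look like, with hypothetical tool, model, and path names; the intent is to show the traceability, not TourAgent's actual internals. Every tool call, validation check, and model contribution is logged and saved alongside the answer.

```python
import json
import time
from pathlib import Path


def run_workflow(question: str, tools: dict, model, artifact_dir: Path) -> str:
    """Deterministic workflow: tools gather evidence, checks validate it, the model is one logged step."""
    trace = {"question": question, "steps": [], "started": time.time()}

    evidence = tools["load_schedule"]()  # deterministic tool call (hypothetical tool name)
    trace["steps"].append({"tool": "load_schedule", "rows": len(evidence)})

    if not evidence:
        raise ValueError("validation check failed: evidence must not be empty")
    trace["steps"].append({"check": "non_empty_evidence", "passed": True})

    # The model is one step in the pipeline, not the whole system (hypothetical interface).
    answer = model.generate(question=question, evidence=evidence)
    trace["steps"].append({"model_answer": answer})

    artifact_dir.mkdir(parents=True, exist_ok=True)
    (artifact_dir / "trace.json").write_text(json.dumps(trace, indent=2))  # artifact preservation
    return answer
```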

What it proves

Workflow capability under controlled local constraints. This is the strongest local claim among the current modes — the most meaningful one for capability claims because it shows the model contributing usefully inside a real deterministic system.

What it does not prove

Raw-model strength. Success here reflects the model together with the deterministic scaffolding around it, not what the model could do unassisted.

Typical failure mode

Slow recovery on multi-step tasks. Higher scaffolding cost even when the final answer is correct. This mode earns the most trust but costs the most execution time.

When to use it

When traceability matters. When the answer will be saved as an artifact. When the user needs to know exactly which tools ran and what evidence supported the answer.

Audience Translation

For external explanation, the cleanest wording for talks, papers, and demos:

| Internal label | Plain-language equivalent |
|---|---|
| raw | Model alone |
| grounded | Model plus verified context |
| artifact | Validated answer layer |
| implementation_agent | Controlled local workflow |

The internal labels match the CLI flags, with the one exception noted above: the CLI's agent mode resolves to artifact internally. The plain-language equivalents are for external communication. Use both consistently: the internal labels preserve technical precision; the audience labels preserve clarity.
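
If the two vocabularies live in code rather than prose, a single mapping (hypothetical, mirroring the table above) keeps them from drifting apart:

```python
# Internal label -> plain-language equivalent for talks, papers, and demos.
AUDIENCE_LABEL = {
    "raw": "model alone",
    "grounded": "model plus verified context",
    "artifact": "validated answer layer",
    "implementation_agent": "controlled local workflow",
}
```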