Model Comparison

Capture Integrity First

The current protocol-comparison line is a finding about the capture harness first and a model ranking second.

Primary Result

Legacy capture through an `ollama run` subprocess overstated protocol failures for thinking-mode models. Clean capture through the REST API changes the ranking materially. This is the core finding of Paper 1.16.
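As a rough illustration of what capture through the REST API can look like, the sketch below posts one prompt to a local Ollama server and keeps only the final answer text. It is not the harness used in Paper 1.16. The helper names (`build_payload`, `extract_answer`, `capture`) are hypothetical, the URL is Ollama's default local endpoint, and treating the table's "suppressed" mode as the request field `think: false` is an assumption that the source does not confirm.

```python
import json
import urllib.request

# Default local Ollama endpoint; change it if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt, think=None):
    """Build a non-streaming /api/generate request body."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if think is not None:
        # Assumption: the server and model accept the `think` field.
        payload["think"] = think
    return payload


def extract_answer(body):
    """Return only the final answer text, leaving any reasoning trace aside."""
    return body.get("response", "").strip()


def capture(model, prompt, think=None, timeout=300):
    """POST one prompt and return the clean answer string."""
    data = json.dumps(build_payload(model, prompt, think)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_answer(json.loads(reply.read().decode("utf-8")))
```

With a server running locally, `capture("gemma4:26b", prompt, think=False)` would return the answer as a plain string. Because the answer arrives as a JSON field, no terminal rendering is involved at any point.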

Model	Mode	Capture	Pass	Current read
`gemma3:27b`	unsuppressed	`ollama_api`	5/6	Strong baseline; `PROTO_010` remains a content miss.
`qwen2.5:14b`	unsuppressed	`ollama_api`	4/6	Useful contrast; weaker than `gemma3:27b` and `gemma4:31b` in this slice.
`gemma4:26b`	unsuppressed	`ollama_api`	4/6	Flat-schema gap remains without suppression.
`gemma4:26b`	suppressed	`ollama_api`	6/6	Full pass; suppression resolves the flat-schema issue.
`gemma4:31b`	unsuppressed	`ollama_api`	6/6	Strongest current local protocol lane.
`gemma4:31b`	suppressed	`ollama_api`	6/6	Same result; suppression not required.
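The rows above can be tallied mechanically. The sketch below is illustrative only: the pass counts are copied from the table, and the function names are hypothetical. It identifies which models pass fully in every recorded mode and how many passes suppression adds where both modes were run.

```python
# Rows copied from the table above: (model, mode, passed, total).
RESULTS = [
    ("gemma3:27b", "unsuppressed", 5, 6),
    ("qwen2.5:14b", "unsuppressed", 4, 6),
    ("gemma4:26b", "unsuppressed", 4, 6),
    ("gemma4:26b", "suppressed", 6, 6),
    ("gemma4:31b", "unsuppressed", 6, 6),
    ("gemma4:31b", "suppressed", 6, 6),
]


def full_pass_in_every_mode(results):
    """Return the models whose every recorded mode is a full pass."""
    by_model = {}
    for model, _mode, passed, total in results:
        by_model.setdefault(model, []).append(passed == total)
    return sorted(m for m, flags in by_model.items() if all(flags))


def suppression_deltas(results):
    """Return suppressed minus unsuppressed passes for models run in both modes."""
    passes = {(model, mode): passed for model, mode, passed, _total in results}
    models = sorted({model for model, _mode in passes})
    return {
        m: passes[(m, "suppressed")] - passes[(m, "unsuppressed")]
        for m in models
        if (m, "suppressed") in passes and (m, "unsuppressed") in passes
    }


print(full_pass_in_every_mode(RESULTS))  # ['gemma4:31b']
print(suppression_deltas(RESULTS))       # {'gemma4:26b': 2, 'gemma4:31b': 0}
```

Only `gemma4:31b` passes in every mode it was run in, while suppression accounts for both of the passes that `gemma4:26b` gains.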

Interpretation Rules

Harness First

Thinking-mode models require clean-output capture. Terminal subprocess capture is not canonical for protocol evaluation.
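One way to enforce this rule is to reject any captured output that still carries terminal residue before it is scored. The check below is a minimal sketch under stated assumptions: the function name is hypothetical, the spinner range covers the Braille glyphs that many CLI progress indicators use, and the two thinking markers are assumed examples of what a terminal session might print around a reasoning trace. Adjust all three to the actual harness.

```python
import re

# CSI escape sequences (colors, cursor moves, line erases) emitted for terminals.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
# Braille spinner glyphs commonly used by CLI progress indicators.
SPINNER = re.compile(r"[\u2800-\u28ff]")
# Assumed CLI markers around a reasoning trace; not confirmed by the source.
THINKING_MARKERS = ("Thinking...", "...done thinking.")


def contamination_reasons(text):
    """List the kinds of terminal residue found in a captured output."""
    reasons = []
    if ANSI_CSI.search(text):
        reasons.append("ansi_escape")
    if SPINNER.search(text):
        reasons.append("spinner_glyph")
    if any(marker in text for marker in THINKING_MARKERS):
        reasons.append("thinking_marker")
    return reasons


def is_clean_capture(text):
    """True when the output shows none of the residue checked above."""
    return not contamination_reasons(text)
```

A capture that fails this check would be logged as a harness problem, not scored as a protocol failure, which keeps measurement artifacts out of the model ranking.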

Model Verdict Revised

Under corrected capture, `gemma4:31b` is the strongest current local result in this protocol lane: it passes 6/6 with and without suppression. The earlier account of a severe regression was a measurement artifact.

Residual Failure

`PROTO_010` remains a genuine content failure, as in the `gemma3:27b` row, and is independent of the capture problem. Not every miss was a harness artifact.

Featured Current Papers

Paper	Role	Current relevance
Paper 1.16	Primary paper	Defines the capture-integrity correction. The full inventory remains in portfolio order; this paper is featured because it changes the current protocol line.
Operator Shell Pattern	Architecture paper	Explains how model-comparison packets fit into the OpenClaw outer layer without crossing the authority boundary.