Model Comparison

Capture Integrity First

The current protocol-comparison line is a harness finding before it is a model ranking.

Primary Result

Legacy ollama run subprocess capture overstated thinking-mode protocol failures. Clean REST API capture changes the ranking materially. This is the core finding of Paper 1.16.

ModelModeCapturePassCurrent read
gemma3:27bunsuppressedollama_api5/6Strong baseline; PROTO_010 remains a content miss.
qwen2.5:14bunsuppressedollama_api4/6Useful contrast; weaker than Gemma 3 and Gemma 4:31b in this slice.
gemma4:26bunsuppressedollama_api4/6Flat-schema gap remains without suppression.
gemma4:26bsuppressedollama_api6/6Full pass; suppression resolves the flat-schema issue.
gemma4:31bunsuppressedollama_api6/6Strongest current local protocol lane.
gemma4:31bsuppressedollama_api6/6Same result; suppression not required.

Interpretation Rules

Harness First

Thinking-mode models require clean-output capture. Terminal subprocess capture is not canonical for protocol evaluation.

Model Verdict Revised

gemma4:31b is the strongest current local result in this protocol lane under corrected capture. The earlier severe regression story was a measurement artifact.

Residual Failure

PROTO_010 remains a genuine content failure outside the capture problem. Not every miss was a harness artifact.

Featured Current Papers

PaperRoleCurrent relevance
Paper 1.16Primary paperDefines the capture-integrity correction. The full inventory remains in portfolio order; this paper is featured because it changes the current protocol line.
Operator Shell PatternArchitecture paperExplains how model-comparison packets fit into the OpenClaw outer layer without crossing the authority boundary.