Project Phoenix · Addendum Paper
Different local models win in different roles and policy regimes. The question is not which model wins in general. The question is which model is good enough for which job.
Current result: smaller local models are already enough for meaningful routing and some grounded use, while exactness-sensitive downstream handoff remains a sharper boundary where stronger models or stronger policies still matter.
Project Phoenix does not treat local-model evaluation as a single ranking problem. Routing, grounded domain use, machine-facing protocol work, and repair-assisted pipelines are different operational roles. Models that look weaker in one role can still be the right answer in another.
`llama3.1:8b` proved good enough for current ShowcaseAgent routing work.
Smaller models crossed into grounded TourAgent usefulness once the harness carried the answer path.
`gemma3:27b` remained materially stronger when the answer had to survive stricter downstream machine-facing requirements.
This paper is the role-boundary piece of the local-model details layer. It explains why raw size is not enough as an evaluation language and why repair policy and handoff strictness change the answer before the model leaderboard does.
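As a rough illustration of "policy before leaderboard", the sketch below picks a model from the role and the handoff policy rather than from a single ranking. Everything in it is hypothetical: the `Role` names, the `pick_model` function, the `strict_handoff` and `repair_allowed` flags, and the decision rule itself are assumptions for illustration, not Project Phoenix's harness. Only the two model tags come from the findings above, and using `llama3.1:8b` as the stand-in for "a smaller model" in grounded use is also an assumption.

```python
from enum import Enum


class Role(Enum):
    """Hypothetical role labels, loosely following the roles named in the text."""
    ROUTING = "routing"
    GROUNDED = "grounded"
    MACHINE_HANDOFF = "machine_handoff"


# Model tags taken from the findings; their assignment below is an assumed rule.
SMALL_MODEL = "llama3.1:8b"
STRONG_MODEL = "gemma3:27b"


def pick_model(role: Role, strict_handoff: bool = False,
               repair_allowed: bool = False) -> str:
    """Return a model tag for a role under a given policy (illustrative only)."""
    # Exactness is required if the caller says so or the role is machine-facing.
    needs_exact = strict_handoff or role is Role.MACHINE_HANDOFF
    # Assumed rule: without a repair step, exact output calls for the stronger model.
    if needs_exact and not repair_allowed:
        return STRONG_MODEL
    # Otherwise the smaller model is treated as good enough.
    return SMALL_MODEL


if __name__ == "__main__":
    for role in Role:
        print(role.value, "->", pick_model(role))
```

In this sketch the same role can resolve to different models once the policy flags change, which is the point the paper makes about repair policy and handoff strictness.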