Feed instruct models their own chat template; write both prompt paths as prose by FuJacob · Pull Request #438 · FuJacob/cotabby

FuJacob · 2026-05-30T23:52:22Z

Summary

The local llama path made small instruct models (Qwen, Gemma) echo prompt scaffolding back as ghost text — App:, Text before caret:, Screen context:, etc. Two compounding causes, both fixed here:

1. No chat template. Instruct models were sent one raw labeled prompt string ending in Text before caret:\n<prefix>, which is off-distribution for a chat-tuned model. Now, when the model ships a template (hasChatTemplate()), the prompt is rendered through the model's own template: rules + context in a system turn, the bare prefix in a user turn, assistant turn opened so the model continues the user's text as its own.
2. Label scaffolding. Even in the system turn, standalone Label: lines were what models parroted. So both prompt builders are now plain prose with zero standalone labels:
   - messages() (chat-template path, instruct models)
   - prompt() (raw single-string path, base / drag-and-drop GGUFs with no template)

The one structural invariant preserved in both: the bare prefix is the last thing in the string, so the model continues from where the user stopped.
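As a rough illustration of the two shapes and the shared invariant, here is a minimal sketch. It is not the real LlamaPromptRenderer: the `PromptContext` type, the `proseContext` helper, and the opening instruction sentence are hypothetical placeholders, and only the "The user is writing in X." and clipboard sentences follow wording quoted in this PR.

```swift
import Foundation

// Hypothetical, simplified stand-ins for the PR's types (assumption, not the real code).
struct ChatPrompt {
    let system: String
    let user: String
}

struct PromptContext {
    var appName: String?
    var clipboard: String?
}

// Folds optional context into plain sentences, with no standalone "Label:" lines.
// The first sentence is placeholder wording invented for this sketch.
func proseContext(_ context: PromptContext) -> String {
    var sentences = ["Continue the user's text naturally."]
    if let app = context.appName {
        sentences.append("The user is writing in \(app).")
    }
    if let clipboard = context.clipboard {
        sentences.append("The user's clipboard currently contains \(clipboard)")
    }
    return sentences.joined(separator: " ")
}

// Chat-template path: context goes in the system turn, the bare prefix is the user turn.
func messages(prefix: String, context: PromptContext) -> ChatPrompt {
    ChatPrompt(system: proseContext(context), user: prefix)
}

// Raw path: one string, with the bare prefix as the last thing in it.
func prompt(prefix: String, context: PromptContext) -> String {
    proseContext(context) + "\n\n" + prefix
}
```

In both shapes nothing follows the prefix, so a model continuing the text starts exactly where the user stopped; absent context simply contributes no sentence.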

Pairs with cotabbyinference #6 + #7 (chat-template C-ABI), already merged to inference main.

Why prose on the raw path too

Initially this only restructured the template path, on the assumption the raw prompt() labels were load-bearing for base models. On review that didn't hold — what locates the caret is the prefix being last, not the label — so the prose treatment applies to both. A base/web-text model is arguably less confused by prose than by a labeled instruction blob.

What's NOT changed

FoundationModelPromptRenderer (Apple Intelligence path) keeps its labeled format and its tests — separate path, out of scope.

Validation

- xcodebuild build + build-for-testing + swiftlint --strict all pass locally.
- Tests updated to the prose contract across LlamaPromptRendererTests, CustomRulesTests, LanguageSupportTests, SuggestionRequestFactoryTests: assert the new wording, assert the bare-prefix-last invariant, and assert no standalone Label: line survives with every context block populated. Foundation Models tests untouched. Full suite runs on CI.
- KV-cache reuse preserved: prompt bytes derive from the same templated/prose string that gets tokenized; the external byte hint is only ever an upper-bound clamp.
- Not yet validated on a real model. This shapes the prompt per the tests; the actual "does the model stop echoing" check needs a hands-on Qwen/Gemma + base-GGUF smoke test before merge.

History

#444 (separate prose PR) was folded into this branch — it was stacked on this feature branch, so CI never ran on it. Squashing it here gives one coherent PR with full CI.

The chat-template system turn still carried standalone "Label:" lines ("Screen context:", "Screen content:", "User's clipboard:", "User Profile Context:", "Your style preferences:"). Small instruct models echo a lone "Label:" line straight into the ghost text, so #438 (moving the prefix out of the prompt) removed the worst offender but left these. This folds every context block into plain sentences:

- "User is on X." -> "The user is writing in X."
- "Screen content:\n<s>" -> "Nearby on screen, the user can see <s>"
- "User's clipboard:\n<c>" -> "The user's clipboard currently contains <c>"
- name / rules -> single prose sentences

No value the model needs is lost; only the label surface it was copying. The raw prompt() path is intentionally left labeled: base completion models rely on that scaffolding to locate the caret, and they have no chat template to lean on. Tests updated to assert the prose wording and, with all context populated, that no standalone "Label:" line survives.

Small instruct models (Qwen, Gemma) were sent one raw labeled prompt string ending in a "Text before caret:" header, which is off-distribution for a chat-tuned model and is what makes them echo prompt scaffolding back as ghost text (App:, Text before caret:, Continuation:). This renders the prompt through the model's own chat template instead, so rules sit in a system turn and the bare prefix in a user turn, and the model continues the user's text as its own.

- LlamaPromptRenderer.messages(...) -> ChatPrompt{system,user}, the role-split form of the existing prompt(...). prompt(...) stays as the fallback.
- SuggestionRequest.llamaChatPrompt carries it; built in SuggestionRequestFactory.
- Threaded through LlamaSuggestionEngine -> LlamaRuntimeManager -> LlamaRuntimeCore.
- LlamaRuntimeCore.templatedPromptTokens renders via the engine's chat-template C-ABI (applyChatTemplate into a caller buffer) and tokenizes with parse_special so the template's control markers become real token IDs. Falls back to the raw path when the model ships no template (base models, drag-and-drop GGUFs).

KV-cache reuse stays correct: promptBytes is derived from the SAME templated string that is tokenized, and the external byte hint is only ever an upper-bound clamp (min with the real common prefix), so a stale hint can only shrink reuse, never corrupt it.
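The upper-bound clamp on the byte hint can be sketched as follows. This is a minimal illustration of the "min with the real common prefix" idea only; `commonPrefixLength` and `reusableBytes` are hypothetical names, not functions from LlamaRuntimeCore, and the real code may well work on tokens or a different buffer type.

```swift
// Length of the shared leading run of two byte sequences.
func commonPrefixLength(_ a: [UInt8], _ b: [UInt8]) -> Int {
    var count = 0
    for (x, y) in zip(a, b) {
        if x != y { break }
        count += 1
    }
    return count
}

// The external hint is treated purely as an upper bound: the result never
// exceeds the measured common prefix, so a stale or oversized hint can only
// reduce reuse. (Hypothetical helper for illustration.)
func reusableBytes(cached: [UInt8], next: [UInt8], hint: Int?) -> Int {
    let actual = commonPrefixLength(cached, next)
    guard let hint = hint else { return actual }
    return max(0, min(hint, actual))
}
```

Because the measured prefix is always computed from the same string that gets tokenized, the hint cannot cause bytes past the first real mismatch to be reused.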

Locks the anti-regression invariant behind the chat-template fix: the user turn must be exactly the prefix with no "Text before caret:" / "Final instruction:" label scaffolding (the strings small instruct models echoed into ghost text), the prefix must not leak into the system turn, and profile/rules/clipboard/screen context must land in the system turn (and be omitted when absent). Six pure-function cases. Full app-hosted suite: 320 tests, 0 failures (run with CODE_SIGNING_ALLOWED=NO per the local-test setup added in #436).
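One way to express the "no standalone Label: line" check as a pure function is sketched below. The helper `hasStandaloneLabelLine` is hypothetical and is not taken from the test suite; it uses a simple heuristic (any non-empty line that ends in a colon) and is meant for the system turn or context text, since a user prefix can legitimately end with a colon.

```swift
import Foundation

// Returns true when any line, after trimming whitespace, ends with a colon,
// e.g. "Screen context:" or "Text before caret:". Heuristic for illustration.
func hasStandaloneLabelLine(_ text: String) -> Bool {
    for line in text.split(separator: "\n", omittingEmptySubsequences: true) {
        let trimmed = line.trimmingCharacters(in: .whitespaces)
        if trimmed.count > 1 && trimmed.hasSuffix(":") {
            return true
        }
    }
    return false
}
```

A test in this style would build the system turn with every context block populated and assert that the helper returns false.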

FuJacob force-pushed the chat-template-prompts branch 6 times, most recently from ca6fcaf to 67516c2 Compare May 31, 2026 00:42

FuJacob mentioned this pull request May 31, 2026

Write both prompt paths as prose, not Label: blocks #444

Merged

FuJacob changed the title ~~Feed instruct models their own chat template (local llama path)~~ Feed instruct models their own chat template; write both prompt paths as prose May 31, 2026

FuJacob closed this May 31, 2026

FuJacob reopened this May 31, 2026

FuJacob force-pushed the chat-template-prompts branch from 87fa9cc to c6e9505 Compare May 31, 2026 02:14

FuJacob added 4 commits May 30, 2026 19:17

Update LlamaPromptRenderer file overview for the two-shape prose design

35b212a

FuJacob force-pushed the chat-template-prompts branch from 1ac44e8 to 35b212a Compare May 31, 2026 02:22

FuJacob added 2 commits May 30, 2026 19:57

Update ExtendedContextTests for the prose LlamaPromptRenderer contract

ae10c2b

Merge branch 'main' into chat-template-prompts

45ea5e7

FuJacob mentioned this pull request May 31, 2026

Generation-time quality controls: token masks, single-line, mid-word continuation #488

Draft

FuJacob wants to merge 6 commits into main from chat-template-prompts
