Feed instruct models their own chat template; write both prompt paths as prose #438
Draft
FuJacob wants to merge 6 commits into
FuJacob added a commit that referenced this pull request on May 31, 2026
The chat-template system turn still carried standalone "Label:" lines
("Screen context:", "Screen content:", "User's clipboard:", "User Profile
Context:", "Your style preferences:"). Small instruct models echo a lone
"Label:" line straight into the ghost text, so #438 (moving the prefix out of
the prompt) removed the worst offender but left these. This folds every context
block into plain sentences:
- "User is on X." -> "The user is writing in X."
- "Screen content:\n<s>" -> "Nearby on screen, the user can see <s>"
- "User's clipboard:\n<c>" -> "The user's clipboard currently contains <c>"
- name / rules -> single prose sentences
No value the model needs is lost; only the label surface it was copying. The
raw prompt() path is intentionally left labeled — base completion models rely on
that scaffolding to locate the caret, and they have no chat template to lean on.
Tests updated to assert the prose wording and, with all context populated, that
no standalone "Label:" line survives.
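The label-to-prose folding described above can be sketched as follows. This is a minimal illustration, not the project's renderer: PromptContext, proseContext, and hasStandaloneLabelLine are hypothetical names, only the sentence wordings come from the commit message, and the label check is a rough heuristic (a short line ending in a colon).

```swift
import Foundation

// Hypothetical container for the optional context blocks named in the commit message.
struct PromptContext {
    var appName: String?
    var screenContent: String?
    var clipboard: String?
}

// Folds each populated block into one plain sentence; absent blocks are omitted.
func proseContext(_ context: PromptContext) -> String {
    var sentences: [String] = []
    if let app = context.appName, !app.isEmpty {
        sentences.append("The user is writing in \(app).")
    }
    if let screen = context.screenContent, !screen.isEmpty {
        sentences.append("Nearby on screen, the user can see \(screen)")
    }
    if let clip = context.clipboard, !clip.isEmpty {
        sentences.append("The user's clipboard currently contains \(clip)")
    }
    return sentences.joined(separator: "\n")
}

// Heuristic (an assumption of this sketch): a standalone label is a short
// line that ends in a colon, such as "Screen content:".
func hasStandaloneLabelLine(_ text: String) -> Bool {
    text.split(separator: "\n", omittingEmptySubsequences: true).contains { line in
        let trimmed = line.trimmingCharacters(in: .whitespaces)
        return trimmed.hasSuffix(":") && trimmed.count <= 40
    }
}
```

A test in the spirit of the one described would populate every block and assert that hasStandaloneLabelLine returns false for the result.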
Small instruct models (Qwen, Gemma) were sent one raw labeled prompt string
ending in a "Text before caret:" header, which is off-distribution for a
chat-tuned model and is what makes them echo prompt scaffolding back as ghost
text (App:, Text before caret:, Continuation:). This renders the prompt through
the model's own chat template instead, so rules sit in a system turn and the
bare prefix in a user turn, and the model continues the user's text as its own.
- LlamaPromptRenderer.messages(...) -> ChatPrompt{system,user}, the role-split
form of the existing prompt(...). prompt(...) stays as the fallback.
- SuggestionRequest.llamaChatPrompt carries it; built in SuggestionRequestFactory.
- Threaded through LlamaSuggestionEngine -> LlamaRuntimeManager -> LlamaRuntimeCore.
- LlamaRuntimeCore.templatedPromptTokens renders via the engine's chat-template
C-ABI (applyChatTemplate into a caller buffer) and tokenizes with parse_special
so the template's control markers become real token IDs. Falls back to the raw
path when the model ships no template (base models, drag-and-drop GGUFs).
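As a rough sketch of the role split and fallback listed above: the types and function signatures here are illustrative assumptions, not the actual LlamaPromptRenderer or LlamaRuntimeCore API. Only the ChatPrompt{system,user} shape and the "use the template when the model ships one, else the raw path" rule come from the commit message.

```swift
import Foundation

// Role-split prompt: rules and context in the system turn, bare prefix in the user turn.
struct ChatPrompt: Equatable {
    let system: String
    let user: String
}

// Hypothetical wrapper for whichever path ends up being used.
enum RenderedPrompt: Equatable {
    case chat(ChatPrompt)   // rendered later through the model's own chat template
    case raw(String)        // single-string fallback for models without a template
}

// Builds the role-split form. The user turn is exactly the prefix, nothing else.
func messages(rules: String, context: String, prefix: String) -> ChatPrompt {
    let system = [rules, context].filter { !$0.isEmpty }.joined(separator: "\n\n")
    return ChatPrompt(system: system, user: prefix)
}

// Picks the chat form only when the model reports a template.
func selectPrompt(hasChatTemplate: Bool, chat: ChatPrompt, raw: String) -> RenderedPrompt {
    hasChatTemplate ? .chat(chat) : .raw(raw)
}
```

The raw string is taken as an input here because its exact format is not shown in this commit message.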
KV-cache reuse stays correct: promptBytes is derived from the SAME templated
string that is tokenized, and the external byte hint is only ever an upper-bound
clamp (min with the real common prefix), so a stale hint can only shrink reuse,
never corrupt it.
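The clamp argument can be illustrated with a small sketch. It assumes, for simplicity, that reuse is measured over the UTF-8 bytes of the templated string; the real runtime may compare token IDs instead, and both function names are hypothetical.

```swift
// Length of the shared leading run of two byte sequences.
func commonPrefixLength(_ a: [UInt8], _ b: [UInt8]) -> Int {
    var n = 0
    while n < a.count, n < b.count, a[n] == b[n] { n += 1 }
    return n
}

// The external hint is only an upper bound: the result is never larger than
// the real common prefix, so a stale hint can shrink reuse but not extend it
// past content that actually matches.
func reusableByteCount(cached: String, next: String, hint: Int?) -> Int {
    let real = commonPrefixLength(Array(cached.utf8), Array(next.utf8))
    guard let hint = hint else { return real }
    return min(real, max(0, hint))
}
```

With this shape, a hint that is too large is ignored in favor of the real prefix, and a hint that is too small only costs some recomputation.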
Locks the anti-regression invariant behind the chat-template fix: the user turn must be exactly the prefix with no "Text before caret:" / "Final instruction:" label scaffolding (the strings small instruct models echoed into ghost text), the prefix must not leak into the system turn, and profile/rules/clipboard/screen context must land in the system turn (and be omitted when absent). Six pure-function cases. Full app-hosted suite: 320 tests, 0 failures (run with CODE_SIGNING_ALLOWED=NO per the local-test setup added in #436).
Summary
The local llama path made small instruct models (Qwen, Gemma) echo prompt scaffolding back as ghost text: "App:", "Text before caret:", "Screen context:", etc. Two compounding causes, both fixed here:
- No chat template. Instruct models were sent one raw labeled prompt string ending in "Text before caret:\n<prefix>", which is off-distribution for a chat-tuned model. Now, when the model ships a template (hasChatTemplate()), the prompt is rendered through the model's own template: rules + context in a system turn, the bare prefix in a user turn, assistant turn opened so the model continues the user's text as its own.
- Label scaffolding. Even in the system turn, standalone "Label:" lines were what models parroted. So both prompt builders are now plain prose with zero standalone labels:
  - messages() (chat-template path, instruct models)
  - prompt() (raw single-string path, base / drag-and-drop GGUFs with no template)
The one structural invariant preserved in both: the bare prefix is the last thing in the string, so the model continues from where the user stopped.
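The bare-prefix-last invariant for the raw path could be checked along these lines. This is a sketch under stated assumptions: rawProsePrompt and endsWithBarePrefix are hypothetical helpers, and the blank-line separator between the prose and the prefix is a guess, not the format prompt() actually emits.

```swift
import Foundation

// Joins prose sentences and appends the prefix last, with no label in front of it.
// The "\n\n" separator is an assumption of this sketch.
func rawProsePrompt(sentences: [String], prefix: String) -> String {
    let head = sentences.filter { !$0.isEmpty }.joined(separator: " ")
    return head.isEmpty ? prefix : head + "\n\n" + prefix
}

// The invariant from the summary: the prompt ends with exactly the user's prefix.
func endsWithBarePrefix(_ prompt: String, prefix: String) -> Bool {
    prompt.hasSuffix(prefix)
}
```

Because the prefix is appended verbatim, a model that continues the string continues the user's text from where they stopped.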
Pairs with cotabbyinference #6 + #7 (chat-template C-ABI), already merged to inference main.

Why prose on the raw path too
Initially this only restructured the template path, on the assumption the raw prompt() labels were load-bearing for base models. On review that didn't hold: what locates the caret is the prefix being last, not the label, so the prose treatment applies to both. A base/web-text model is arguably less confused by prose than by a labeled instruction blob.

What's NOT changed
FoundationModelPromptRenderer (Apple Intelligence path) keeps its labeled format and its tests. It is a separate path and out of scope.

Validation
- xcodebuild build + build-for-testing + swiftlint --strict all pass locally.
- LlamaPromptRendererTests, CustomRulesTests, LanguageSupportTests, SuggestionRequestFactoryTests: assert the new wording, assert the bare-prefix-last invariant, and assert no standalone "Label:" line survives with every context block populated.
- Foundation Models tests untouched.
- Full suite runs on CI.

History
#444 (separate prose PR) was folded into this branch. It was stacked on this feature branch, so CI never ran on it. Squashing it here gives one coherent PR with full CI.