Skip to content

Add chat-template rendering and special-token tokenize path#6

Merged
FuJacob merged 1 commit into
mainfrom
add-chat-template-support
May 30, 2026
Merged

Add chat-template rendering and special-token tokenize path#6
FuJacob merged 1 commit into
mainfrom
add-chat-template-support

Conversation

@FuJacob
Copy link
Copy Markdown
Owner

@FuJacob FuJacob commented May 30, 2026

Summary

The engine only exposed a raw tokenize (with parse_special=false) and no way to render a model's built-in chat template. Callers feeding instruct models (Qwen, Gemma, Llama) therefore had to send unstructured labeled text, which pushes those models off-distribution and makes them echo prompt scaffolding (e.g. repeating App: / Text before caret: section headers). This adds the C++ surface a caller needs to render the model's own template and tokenize the result correctly, with a clean fallback for base models that ship no template.

All changes are additive. Existing tokenize/detokenize behavior is unchanged.

What's new

  • tokenizeWithOptions(text, len, add_special, parse_special) — lets the chat-template path tokenize rendered control markers as real token IDs. Plain tokenize() now delegates here with parse_special=false, preserving the historical contract byte-for-byte (BOS still added per model metadata).
  • hasChatTemplate() — true when the loaded model ships a template in GGUF metadata, so callers fall back to the raw path for base models.
  • applyChatTemplate(messages, count, add_assistant) — renders a conversation via llama_model_chat_template + llama_chat_apply_template, returning the formatted prompt. Empty string signals "fall back to raw."
  • ChatMessage { role, content } value type for the message list.

Validation

  • swift testExecuted 13 tests, with 0 failures (10 pre-existing + 3 new no-model guard tests).
  • The end-to-end test gains a chat-template assertion that runs when COTABBY_TEST_MODEL_PATH is set: a templated render must tokenize (with parse_special) to a non-empty token list.
  • swift build clean.

Notes / limitations

  • llama_chat_apply_template in llama.cpp b9310 uses a predefined template list, not a jinja parser, and exposes no enable_thinking parameter. So generation-time "thinking off" for reasoning models is not reachable through this API; that would need a separate model-specific approach. This PR is scoped to template rendering + the tokenize path that unblocks it.
  • No FIM path here: FIM tokens only exist in code-model vocabs and would break the general/instruct defaults and drag-and-drop models, so it was intentionally left out.

Consumer

The Cotabby app (tabby-1) will adopt this in a paired PR: pin bump + a LlamaPromptRenderer rewrite that emits role-structured messages when hasChatTemplate() is true and falls back to the current raw prompt otherwise.

The engine only exposed a raw tokenize (parse_special=false) and no way to
render a model's built-in chat template. Callers feeding instruct models
(Qwen, Gemma, Llama) therefore had to send unstructured text, which pushes
those models off-distribution and makes them echo prompt scaffolding.

New API (all additive; existing tokenize/detokenize unchanged):
- tokenizeWithOptions(text, len, add_special, parse_special): lets the
  chat-template path tokenize rendered control markers as real token IDs.
  Plain tokenize() now delegates here with parse_special=false, preserving
  the historical contract byte-for-byte.
- hasChatTemplate(): true when the loaded model ships a template in GGUF
  metadata, so callers can fall back to the raw path for base models.
- applyChatTemplate(messages, count, add_assistant): renders a conversation
  through llama_model_chat_template + llama_chat_apply_template, returning
  the formatted prompt (empty string signals "fall back to raw").
- ChatMessage { role, content } value type for the message list.

Tests cover the no-model guards for all three new entry points and, when
COTABBY_TEST_MODEL_PATH is set, assert a templated render tokenizes to a
non-empty list.
@FuJacob FuJacob force-pushed the add-chat-template-support branch from 3b7f24d to e8a7049 Compare May 30, 2026 22:41
@FuJacob FuJacob merged commit 7545664 into main May 30, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant