Add chat-template rendering and special-token tokenize path by FuJacob · Pull Request #6 · FuJacob/cotabbyinference

FuJacob · 2026-05-30T22:39:38Z

Summary

The engine only exposed a raw tokenize (with parse_special=false) and no way to render a model's built-in chat template. Callers feeding instruct models (Qwen, Gemma, Llama) therefore had to send unstructured labeled text, which pushes those models off-distribution and makes them echo prompt scaffolding (e.g. repeating App: / Text before caret: section headers). This adds the C++ surface a caller needs to render the model's own template and tokenize the result correctly, with a clean fallback for base models that ship no template.

All changes are additive. Existing tokenize/detokenize behavior is unchanged.

What's new

tokenizeWithOptions(text, len, add_special, parse_special) — lets the chat-template path tokenize rendered control markers as real token IDs. Plain tokenize() now delegates here with parse_special=false, preserving the historical contract byte-for-byte (BOS still added per model metadata).
hasChatTemplate() — true when the loaded model ships a template in GGUF metadata, so callers fall back to the raw path for base models.
applyChatTemplate(messages, count, add_assistant) — renders a conversation via llama_model_chat_template + llama_chat_apply_template, returning the formatted prompt. Empty string signals "fall back to raw."
ChatMessage { role, content } value type for the message list.

Validation

swift test — Executed 13 tests, with 0 failures (10 pre-existing + 3 new no-model guard tests).
The end-to-end test gains a chat-template assertion that runs when COTABBY_TEST_MODEL_PATH is set: a templated render must tokenize (with parse_special) to a non-empty token list.
swift build clean.

Notes / limitations

llama_chat_apply_template in llama.cpp b9310 uses a predefined template list, not a jinja parser, and exposes no enable_thinking parameter. So generation-time "thinking off" for reasoning models is not reachable through this API; that would need a separate model-specific approach. This PR is scoped to template rendering + the tokenize path that unblocks it.
No FIM path here: FIM tokens only exist in code-model vocabs and would break the general/instruct defaults and drag-and-drop models, so it was intentionally left out.

Consumer

The Cotabby app (tabby-1) will adopt this in a paired PR: pin bump + a LlamaPromptRenderer rewrite that emits role-structured messages when hasChatTemplate() is true and falls back to the current raw prompt otherwise.

The engine only exposed a raw tokenize (parse_special=false) and no way to render a model's built-in chat template. Callers feeding instruct models (Qwen, Gemma, Llama) therefore had to send unstructured text, which pushes those models off-distribution and makes them echo prompt scaffolding. New API (all additive; existing tokenize/detokenize unchanged): - tokenizeWithOptions(text, len, add_special, parse_special): lets the chat-template path tokenize rendered control markers as real token IDs. Plain tokenize() now delegates here with parse_special=false, preserving the historical contract byte-for-byte. - hasChatTemplate(): true when the loaded model ships a template in GGUF metadata, so callers can fall back to the raw path for base models. - applyChatTemplate(messages, count, add_assistant): renders a conversation through llama_model_chat_template + llama_chat_apply_template, returning the formatted prompt (empty string signals "fall back to raw"). - ChatMessage { role, content } value type for the message list. Tests cover the no-model guards for all three new entry points and, when COTABBY_TEST_MODEL_PATH is set, assert a templated render tokenizes to a non-empty list.

FuJacob force-pushed the add-chat-template-support branch from 3b7f24d to e8a7049 Compare May 30, 2026 22:41

FuJacob mentioned this pull request May 30, 2026

Strip hallucinated prompt-scaffolding labels from suggestions FuJacob/cotabby#434

Merged

FuJacob merged commit 7545664 into main May 30, 2026
1 check passed

FuJacob mentioned this pull request May 30, 2026

Make applyChatTemplate a buffer-based C ABI for objcxx interop #7

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add chat-template rendering and special-token tokenize path#6

Add chat-template rendering and special-token tokenize path#6
FuJacob merged 1 commit into
mainfrom
add-chat-template-support

FuJacob commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

FuJacob commented May 30, 2026

Summary

What's new

Validation

Notes / limitations

Consumer

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant