Generation-time quality controls: token masks, single-line, mid-word continuation by FuJacob · Pull Request #488 · FuJacob/cotabby

FuJacob · 2026-05-31T19:04:13Z

Summary

Wires generation-time quality controls into the app so unwanted tokens are stopped at the sampler instead of cleaned up after the fact. Builds on the engine changes in FuJacob/cotabbyinference#8. Second of two stacked PRs, based on the feat/output-safety-gates branch (#485).

Nonprintable token mask (comes with the pin bump): the engine masks control, chat-template, and unused tokens out of sampling, so they can never appear as ghost text. End-of-generation (EOG) tokens stay sampleable, so natural stopping still works. No app code is needed beyond the dependency bump.
single_line: set from the focused field (LlamaGenerationOptions.singleLine = !request.isMultiLineEnabled), so single-line fields are constrained at the source and never receive a multi-line completion.
forceWordContinuation: MidWordContinuationPolicy fires only when the caret sits strictly inside a word, so the engine constrains the first sampled token to continue the current word. At a normal word end it does not fire, so ordinary next-word predictions are unchanged.
Confidence suppression (off by default): the engine now reports a per-token log-probability. LlamaRuntimeCore averages it and, when LlamaGenerationOptions.confidenceFloor is raised above its default of -infinity, suppresses low-confidence completions via the pure ConfidenceSuppressionPolicy. Disabled by default, so no behavior change until a caller opts in.
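The nonprintable mask lives in cotabbyinference, not in this repo. The following is a minimal Swift sketch of the idea. It assumes the sampler sees one logit per vocabulary entry along with a per-token classification. `TokenKind`, `applyNonprintableMask`, and `greedyPick` are hypothetical names and are not the engine's API. Masked tokens get a logit of -infinity, so no sampling strategy can select them. End-of-generation is deliberately left sampleable.

```swift
// Hypothetical per-token classification; a real engine would derive this
// from the model vocabulary's token attributes.
enum TokenKind {
    case normal, control, chatTemplate, unused, endOfGeneration
}

/// Returns a copy of `logits` with nonprintable tokens set to -infinity so the
/// sampler can never pick them. End-of-generation stays sampleable so the
/// model can still stop on its own.
func applyNonprintableMask(logits: [Float], kinds: [TokenKind]) -> [Float] {
    precondition(logits.count == kinds.count, "one kind per vocabulary entry")
    var masked = logits
    for (id, kind) in kinds.enumerated() {
        switch kind {
        case .control, .chatTemplate, .unused:
            masked[id] = -Float.infinity
        case .normal, .endOfGeneration:
            break
        }
    }
    return masked
}

/// Greedy pick: index of the largest logit, or nil for an empty vocabulary.
func greedyPick(_ logits: [Float]) -> Int? {
    logits.indices.max(by: { logits[$0] < logits[$1] })
}

let kinds: [TokenKind] = [.normal, .control, .endOfGeneration, .chatTemplate, .normal]
let logits: [Float] = [1.0, 9.0, 0.5, 8.0, 2.0]
let masked = applyNonprintableMask(logits: logits, kinds: kinds)
print(greedyPick(masked) ?? -1)  // prints 4: tokens 1 and 3 are masked
```

The mask is a fixed function of the vocabulary. As a result, greedy decoding stays deterministic: the same logits always produce the same pick.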

Threading: singleLine / forceWordContinuation / confidenceFloor flow through LlamaGenerationOptions into LlamaRuntimeCore (sampling config, setForceWordContinuation before each decodePrompt, and the logprob accumulation in the sample loop).
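Below is a minimal sketch of how the three controls could be derived from the request before they reach the runtime. It uses a simplified request type. `CompletionRequest`, `caretIsMidWord`, `GenerationOptionsSketch`, `makeOptions`, and `bannedForSingleLine` are hypothetical stand-ins, not the app's real LlamaGenerationOptions or the engine's single_line implementation. The last helper shows one plausible way a sampler could honor single-line mode: by banning every token whose text contains a newline. The engine may instead stop at the first newline.

```swift
/// Illustrative stand-in for the app's completion request.
struct CompletionRequest {
    var isMultiLineEnabled: Bool
    var caretIsMidWord: Bool  // assumed to come from the mid-word policy
}

/// Sketch of the three generation-time controls; field names follow the PR.
struct GenerationOptionsSketch {
    var singleLine = false
    var forceWordContinuation = false
    // -infinity means "never suppress", matching the PR's stated default.
    var confidenceFloor = -Double.infinity
}

func makeOptions(for request: CompletionRequest) -> GenerationOptionsSketch {
    var options = GenerationOptionsSketch()
    options.singleLine = !request.isMultiLineEnabled
    options.forceWordContinuation = request.caretIsMidWord
    return options  // confidenceFloor stays at its -infinity default
}

/// Assumed sampler-side enforcement: token ids whose text contains a newline.
func bannedForSingleLine(vocabulary: [String]) -> Set<Int> {
    Set(vocabulary.indices.filter { vocabulary[$0].contains("\n") })
}
```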

Validation

swiftlint lint --strict --quiet <touched files>          # exit 0
xcodebuild ... build         -derivedDataPath ...         # ** BUILD SUCCEEDED **
xcodebuild ... build-for-testing -derivedDataPath ...     # ** TEST BUILD SUCCEEDED **
xcodebuild test ... CODE_SIGNING_ALLOWED=NO \
  -only-testing:CotabbyTests/ConfidenceSuppressionPolicyTests \
  -only-testing:CotabbyTests/MidWordContinuationPolicyTests \
  -only-testing:CotabbyTests/TrailingDuplicationFilterTests \
  -only-testing:CotabbyTests/InsertionSafetyGateTests \
  -only-testing:CotabbyTests/SentenceBoundaryClassifierTests
# ** TEST SUCCEEDED **  Executed 27 tests, with 0 failures

Package pin resolves to cotabbyinference @ feat/generation-quality-controls (be64365). The engine side was separately validated with swift test against a local gemma-3-1b model: 20 tests, 0 failures (see the engine PR).

Linked issues

Depends on FuJacob/cotabbyinference#8 (engine).

Risk / rollout notes

Merge order: project.yml pins the engine feature branch. Merge the engine PR to cotabbyinference main first, then flip this PR's project.yml back to branch: main, re-resolve, and merge. Kept as a draft until then.
Behavior: the nonprintable mask changes sampling for all local models, but it only removes tokens that could never be visible (greedy determinism is preserved, as verified by the engine's interleaved test). single_line only affects single-line fields. forceWordContinuation uses a narrow trigger. Confidence suppression is off by default.
Follow-ups (deliberate): wiring a Settings control for confidenceFloor, full multi-candidate N-best ranking, and the base-model prompt path (covered by the in-flight #438, "Feed instruct models their own chat template; write both prompt paths as prose"). These need on-device tuning that green CI cannot stand in for.
Typo suppression is handled by #353 ("Suppress completions on typo'd word and offer context-aware correction"), not here.

Point the CotabbyInference package at the engine branch that adds the token masks, mid-word continuation, and KV snapshot APIs, and use them:

- The always-on nonprintable token mask now applies automatically (control, chat-template, and unused tokens can no longer be emitted as visible text), with no app code beyond the pin bump.
- single_line is set from the focused field (LlamaGenerationOptions.singleLine = !isMultiLineEnabled), so single-line fields are constrained at the source instead of having multi-line completions truncated after the fact.
- forceWordContinuation fires only when the caret sits strictly inside a word (MidWordContinuationPolicy), so the engine constrains the first token to continue that word without affecting ordinary next-word predictions.

Threads singleLine / forceWordContinuation through LlamaGenerationOptions into LlamaRuntimeCore (sampling config plus setForceWordContinuation before each decodePrompt, on both the fresh and reuse paths). Adds MidWordContinuationPolicy and tests.
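Here is a sketch of the mid-word trigger and its engine-side effect. It assumes "strictly inside a word" means the characters immediately before and after the caret are both letters or digits. `MidWordContinuationSketch` and its methods are hypothetical. The real MidWordContinuationPolicy may use a different word test, for example for apostrophes or for a caret at the end of a partially typed word, which this literal reading does not treat as mid-word.

```swift
/// Hypothetical mid-word trigger plus the first-token constraint it enables.
enum MidWordContinuationSketch {
    static func isWordCharacter(_ c: Character) -> Bool {
        c.isLetter || c.isNumber
    }

    /// `text` is the field contents; `caret` is an index into it.
    static func shouldForceContinuation(text: String, caret: String.Index) -> Bool {
        guard caret > text.startIndex, caret < text.endIndex else {
            return false  // at the start or end of the field: not strictly inside
        }
        let before = text[text.index(before: caret)]
        let after = text[caret]
        return isWordCharacter(before) && isWordCharacter(after)
    }

    /// Assumed engine-side counterpart: when forced, the first sampled token
    /// must begin with a word character so it extends the current word.
    static func firstTokenAllowed(_ tokenText: String, forced: Bool) -> Bool {
        guard forced else { return true }
        guard let first = tokenText.first else { return false }
        return isWordCharacter(first)
    }
}
```

When the caret sits at a normal word end, such as between "hello" and a following space, the trigger returns false. In that case the first token is unconstrained and ordinary next-word predictions are unaffected.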

Use the engine's new per-token logprob to drop completions the model itself was unsure about. LlamaRuntimeCore accumulates the average per-token log-probability and, when LlamaGenerationOptions.confidenceFloor is raised above its default of -infinity, suppresses completions below the floor. ConfidenceSuppressionPolicy holds the pure decision and is unit-tested. Disabled by default, so behavior is unchanged until a caller opts in; wiring a Settings control and full multi-candidate N-best ranking remain follow-ups.
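The confidence gate splits into two parts: a running average in the sample loop and a pure decision. This mirrors the LlamaRuntimeCore / ConfidenceSuppressionPolicy split described above. The sketch below assumes log-probabilities arrive as `Double` and that an empty completion is never suppressed. `LogprobAccumulator` and `ConfidenceSuppressionSketch` are hypothetical names, not the PR's types.

```swift
/// Running average of per-token log-probabilities (sketch).
struct LogprobAccumulator {
    private(set) var sum = 0.0
    private(set) var count = 0

    mutating func add(_ logprob: Double) {
        sum += logprob
        count += 1
    }

    /// nil when no tokens were sampled.
    var average: Double? {
        count == 0 ? nil : sum / Double(count)
    }
}

/// Pure decision: suppress when the average falls below the floor.
enum ConfidenceSuppressionSketch {
    static func shouldSuppress(averageLogprob: Double?, floor: Double) -> Bool {
        guard let average = averageLogprob else { return false }
        return average < floor
    }
}

var acc = LogprobAccumulator()
for lp in [-0.1, -0.3, -2.6] { acc.add(lp) }  // average is about -1.0
print(ConfidenceSuppressionSketch.shouldSuppress(averageLogprob: acc.average, floor: -0.5))  // prints true
print(ConfidenceSuppressionSketch.shouldSuppress(averageLogprob: acc.average, floor: -.infinity))  // prints false
```

Keeping the decision a pure function of (average, floor) means it can be unit-tested without a model. With the -infinity default, `average < floor` is false for every average, so nothing is suppressed until a caller raises the floor.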

FuJacob force-pushed the feat/output-safety-gates branch from 023d913 to c42c482 on May 31, 2026 19:13

FuJacob added 2 commits May 31, 2026 12:13

FuJacob force-pushed the feat/generation-quality-controls branch from 937f8c6 to ca10419 on May 31, 2026 19:19

FuJacob wants to merge 2 commits into feat/output-safety-gates from feat/generation-quality-controls.