Bump suggestion token budgets down to track word presets by FuJacob · Pull Request #493 · FuJacob/cotabby

FuJacob · 2026-06-01T04:17:55Z

Summary

The per-preset token budget is the sole governor of completion length on the local model (the in-prompt word-range cue was removed). It was sized at ~1.5x the upper word bound, but English BPE averages ~1.3 tokens/word, so a full budget yields roughly 1.5 / 1.3 ≈ 1.15x the upper bound in words, and every preset could overrun its stated word range. The shortest preset was worst: the 2-4 preset's budget of 6 was floored up to 8 by SuggestionConfiguration.standard.maxPredictionTokens, allowing roughly 6 words for a "2-4 words" setting.
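The overrun can be checked with quick arithmetic. The sketch below assumes the ~1.3 tokens/word average quoted above and uses the pre-change effective caps (8, 11, 18, 30) from this PR's table; `approximateWords` and `presetsBefore` are hypothetical names for illustration, not code from the repository.

```swift
// Assumption: ~1.3 BPE tokens per English word, the average quoted in this PR.
let tokensPerWord = 1.3

// (preset label, upper word bound, effective token cap before this PR)
let presetsBefore: [(label: String, upperWords: Int, effectiveCap: Int)] = [
    ("2-4", 4, 8),     // budget 6, raised to the config floor of 8
    ("4-7", 7, 11),
    ("7-12", 12, 18),
    ("12-20", 20, 30),
]

/// Rough number of English words that fit in a given token cap.
func approximateWords(forTokenCap cap: Int) -> Double {
    Double(cap) / tokensPerWord
}

for preset in presetsBefore {
    let words = approximateWords(forTokenCap: preset.effectiveCap)
    let rounded = (words * 10).rounded() / 10
    let overruns = words > Double(preset.upperWords)
    print("\(preset.label): ~\(rounded) words, overruns upper bound: \(overruns)")
}
```

Under that assumption every pre-change cap admits more words than the preset's upper bound, with the 2-4 preset landing a little over 6 words.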

This resizes budgets to ~1.25x the upper bound so the cap lands at or just under the upper word bound, and lowers the config floor so the smallest preset can actually deliver its length:

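| Preset | Upper bound (words) | Budget in tokens (before → after) | Effective before* | Effective after |
| --- | --- | --- | --- | --- |
| 2-4 | 4 | 6 → 5 | 8 (floored) | 5 |
| 4-7 | 7 | 11 → 9 | 11 | 9 |
| 7-12 | 12 | 18 → 15 | 18 | 15 |
| 12-20 | 20 | 30 → 25 | 30 | 25 |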

* Effective = max(SuggestionConfiguration.standard.maxPredictionTokens, presetBudget). The floor (read only in SuggestionRequestFactory.activeMaxPredictionTokens) is lowered from 8 to 5 so it never silently raises the smallest preset above its budget.
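As a minimal sketch of the footnote's formula, the snippet below recomputes the "Effective" columns from the budgets and floors in the table. `WordCountPreset` and `effectiveCap` are hypothetical stand-ins (the case names follow the PR's flowchart); the real logic lives in SuggestionRequestFactory.activeMaxPredictionTokens and may differ in shape.

```swift
/// Hypothetical stand-in for the app's preset type; case names follow the PR's flowchart.
enum WordCountPreset: CaseIterable {
    case twoToFour, fourToSeven, sevenToTwelve, twelveToTwenty
}

// Preset budgets before and after this PR, copied from the table above.
let budgetBefore: [WordCountPreset: Int] = [
    .twoToFour: 6, .fourToSeven: 11, .sevenToTwelve: 18, .twelveToTwenty: 30,
]
let budgetAfter: [WordCountPreset: Int] = [
    .twoToFour: 5, .fourToSeven: 9, .sevenToTwelve: 15, .twelveToTwenty: 25,
]

/// Effective cap = max(config floor, preset budget), per the footnote.
func effectiveCap(configFloor: Int, presetBudget: Int) -> Int {
    max(configFloor, presetBudget)
}

for preset in WordCountPreset.allCases {
    let before = effectiveCap(configFloor: 8, presetBudget: budgetBefore[preset]!)
    let after = effectiveCap(configFloor: 5, presetBudget: budgetAfter[preset]!)
    print("\(preset): \(before) -> \(after)")
}
// twoToFour: 8 -> 5
// fourToSeven: 11 -> 9
// sevenToTwelve: 18 -> 15
// twelveToTwenty: 30 -> 25
```

Only the smallest preset is affected by the floor in either configuration; the other three are governed by their own budgets.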

Validation

swiftlint lint --strict --quiet        # exit 0 (lints Cotabby/, per included: in .swiftlint.yml)

xcodebuild test -project Cotabby.xcodeproj -scheme Cotabby \
  -destination 'platform=macOS' \
  -skip-testing:CotabbyTests/FoundationModelDriftEvalTests CODE_SIGNING_ALLOWED=NO
# ** TEST SUCCEEDED **  Executed 607 tests, 3 skipped, 0 failures

Local Team ID caveat: app-hosted suites run with CODE_SIGNING_ALLOWED=NO.

Updated the two tests that pin these numbers: SuggestionModelValueTests.test_wordCountPresetsExposeMatchingPromptInstructionsAndTokenBudgets and SuggestionRequestFactoryTests.test_buildRequest_usesWordCountPresetForInstructionAndTokenBudget.

Linked issues

Refs #271 (the earlier completion-overrun work this continues).

Risk / rollout notes

Behavior change: inline suggestions get shorter across all four length presets. This is intentional and conservative; a clipped suggestion is cheaper than one that blows past the chosen length.
Product default change: SuggestionConfiguration.standard.maxPredictionTokens 8 → 5. It is read only as the per-request lower bound, so the only behavioral effect is that the 2-4 preset is no longer floored up to 8.
No settings, schema, or pbxproj migration (existing files only; no new sources).

🤖 Generated with Claude Code

Greptile Summary

This PR reduces per-preset token budgets from a ~1.5x to a ~1.25x multiplier of each preset's upper word bound, and lowers the SuggestionConfiguration.standard floor from 8 to 5 so it no longer silently overrides the smallest (2-4 words) preset. The math checks out in all four cases (4×1.25=5, 7×1.25≈9, 12×1.25=15, 20×1.25=25), the BPE-based rationale in the comment is sound, and the activeMaxPredictionTokens floor logic (max(config.maxPredictionTokens, presetBudget)) now correctly resolves to 5 for the smallest preset instead of being raised to 8.

All four suggestedPredictionTokenBudget values are reduced, and both affected test files are updated to match — no stale assertions remain.
The product default maxPredictionTokens drop (8 → 5) is scoped exclusively to the per-request floor; it does not affect the larger presets, and the multi-line path (base × 2, capped at 60) is untouched.

Confidence Score: 5/5

Safe to merge — the change is a targeted numeric adjustment with clear rationale, consistent test coverage, and no structural modifications.

All four budget values match their stated 1.25x multiplier, the floor value now equals the smallest preset budget (so max(5, 5) no longer raises it), and both test files are updated in lockstep. The activeMaxPredictionTokens logic in SuggestionRequestFactory is untouched and behaves correctly with the new floor.
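The multiplier check can be reproduced in a few lines. This sketch assumes budgets are obtained by scaling the upper word bound and rounding up to a whole token; the PR states only the resulting numbers, so the rounding rule and the `tokenBudget` helper are assumptions for illustration.

```swift
// Upper word bounds of the four presets, taken from the PR table.
let upperWordBounds = [4, 7, 12, 20]

/// Scales an upper word bound by a multiplier and rounds up to a whole token.
/// Rounding up is an assumption; the PR only lists the resulting budgets.
func tokenBudget(upperWordBound: Int, multiplier: Double) -> Int {
    Int((Double(upperWordBound) * multiplier).rounded(.up))
}

let oldBudgets = upperWordBounds.map { tokenBudget(upperWordBound: $0, multiplier: 1.5) }
let newBudgets = upperWordBounds.map { tokenBudget(upperWordBound: $0, multiplier: 1.25) }

print(oldBudgets) // [6, 11, 18, 30]
print(newBudgets) // [5, 9, 15, 25]
```

Both the old and new budget sets are consistent with this rule; the only non-integer products are 7 × 1.5 = 10.5 and 7 × 1.25 = 8.75.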

No files require special attention. The only file with non-trivial logic (SuggestionRequestFactory.swift) was read to verify the floor behavior and is not modified by this PR.

Important Files Changed

| Filename | Overview |
| --- | --- |
| Cotabby/Models/SuggestionModels.swift | Token budgets reduced to ~1.25x the upper word bound (5/9/15/25) and the config floor lowered from 8 to 5; comment updated to explain the BPE-based rationale. |
| CotabbyTests/ModelAndPresentationValueTests.swift | Four token-budget assertions updated to match the new preset values (6→5, 11→9, 18→15, 30→25); all other test logic unchanged. |
| CotabbyTests/SuggestionRequestFactoryTests.swift | Single assertion updated for the twelveToTwenty preset (30→25); the configuration fixture in this test uses maxPredictionTokens: 1 as the floor, correctly exercising max(1, 25) = 25. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User selects word-count preset] --> B{selectedWordCountPreset}
    B --> |twoToFour| C[budget = 5]
    B --> |fourToSeven| D[budget = 9]
    B --> |sevenToTwelve| E[budget = 15]
    B --> |twelveToTwenty| F[budget = 25]
    C & D & E & F --> G["activeMaxPredictionTokens()\nbase = max(config.maxPredictionTokens=5, presetBudget)"]
    G --> H{isMultiLineEnabled?}
    H --> |No| I[maxPredictionTokens = base]
    H --> |Yes| J["maxPredictionTokens = min(base × 2, 60)"]
    I & J --> K[SuggestionRequest.maxPredictionTokens]
    K --> L[Local model hard-stops generation at this token count]
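The flowchart's resolution step can be sketched as a small pure function. The struct below is a hypothetical, simplified shape (the real SuggestionConfiguration and the location of the multi-line flag are not shown in this PR); only the max floor, the × 2 multiplier, and the cap of 60 come from the description above.

```swift
/// Hypothetical minimal configuration; field placement is an assumption.
struct SketchConfiguration {
    var maxPredictionTokens: Int   // per-request floor (5 after this PR)
    var isMultiLineEnabled: Bool
}

/// Mirrors the flowchart: floor the preset budget, then double and cap for multi-line.
func activeMaxPredictionTokens(config: SketchConfiguration, presetBudget: Int) -> Int {
    let base = max(config.maxPredictionTokens, presetBudget)
    return config.isMultiLineEnabled ? min(base * 2, 60) : base
}

let singleLine = SketchConfiguration(maxPredictionTokens: 5, isMultiLineEnabled: false)
let multiLine = SketchConfiguration(maxPredictionTokens: 5, isMultiLineEnabled: true)

print(activeMaxPredictionTokens(config: singleLine, presetBudget: 5))  // 5
print(activeMaxPredictionTokens(config: singleLine, presetBudget: 25)) // 25
print(activeMaxPredictionTokens(config: multiLine, presetBudget: 25))  // 50
```

With the new budgets the multi-line cap of 60 is never reached by a preset alone (the largest doubled budget is 50); it would only bind if the configured floor exceeded 30.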


The per-preset token budget governs completion length on the local model (the in-prompt word-range cue was removed). It was sized at ~1.5x the upper word bound, but English BPE averages ~1.3 tokens/word, so every preset could overrun its word range. The 2-4 preset was worst: its budget of 6 was floored up to 8 by SuggestionConfiguration.standard.maxPredictionTokens, allowing roughly 6 words for a "2-4 words" setting. Resize budgets to ~1.25x the upper bound (5/9/15/25) so the cap lands at or just under the upper word bound, and lower the config floor from 8 to 5 (it is read only as the lower bound in activeMaxPredictionTokens) so the smallest preset can deliver its length instead of being silently raised. Conservative by design: a clipped suggestion is cheaper than one that blows past the chosen length.

FuJacob merged commit 0b16e97 into main Jun 1, 2026
4 checks passed

FuJacob deleted the fix/conservative-token-budgets branch June 1, 2026 04:21
