Bump suggestion token budgets down to track word presets #493

Merged
FuJacob merged 1 commit into main from fix/conservative-token-budgets
Jun 1, 2026

@FuJacob FuJacob commented Jun 1, 2026

Summary

The per-preset token budget is the sole governor of completion length on the local model (the in-prompt word-range cue was removed). It was sized at ~1.5x the upper word bound, but English BPE averages ~1.3 tokens/word, so every preset could overrun its stated word range. The shortest preset was worst: the 2-4 words budget of 6 was floored up to 8 by SuggestionConfiguration.standard.maxPredictionTokens, allowing roughly 6 words for a "2-4 words" setting.

This resizes budgets to ~1.25x the upper bound so the cap lands at or just under the upper word bound, and lowers the config floor so the smallest preset can actually deliver its length:

| Preset | Upper bound | Budget (before → after) | Effective before* | Effective after |
|---|---|---|---|---|
| 2-4 | 4 | 6 → 5 | 8 (floored) | 5 |
| 4-7 | 7 | 11 → 9 | 11 | 9 |
| 7-12 | 12 | 18 → 15 | 18 | 15 |
| 12-20 | 20 | 30 → 25 | 30 | 25 |

* Effective = max(SuggestionConfiguration.standard.maxPredictionTokens, presetBudget). The floor (read only in SuggestionRequestFactory.activeMaxPredictionTokens) is lowered from 8 to 5 so it never silently raises the smallest preset above its budget.
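The before/after columns can be reproduced with a small sketch. The `Preset` struct and both helper functions are hypothetical stand-ins, not the types in `SuggestionModels.swift`; the 1.3 tokens/word figure is the rough average quoted in the summary.

```swift
// Hypothetical stand-in for the real preset type; values come from the table above.
struct Preset {
    let name: String
    let upperWordBound: Int
    let oldBudget: Int
    let newBudget: Int
}

let presets = [
    Preset(name: "2-4", upperWordBound: 4, oldBudget: 6, newBudget: 5),
    Preset(name: "4-7", upperWordBound: 7, oldBudget: 11, newBudget: 9),
    Preset(name: "7-12", upperWordBound: 12, oldBudget: 18, newBudget: 15),
    Preset(name: "12-20", upperWordBound: 20, oldBudget: 30, newBudget: 25),
]

// Effective cap = max(config floor, preset budget), per the footnote.
func effectiveBudget(configFloor: Int, presetBudget: Int) -> Int {
    return max(configFloor, presetBudget)
}

// Rough word capacity of a token cap, assuming ~1.3 tokens per English word.
func approximateWords(tokens: Int, tokensPerWord: Double = 1.3) -> Double {
    return Double(tokens) / tokensPerWord
}

for preset in presets {
    let before = effectiveBudget(configFloor: 8, presetBudget: preset.oldBudget)
    let after = effectiveBudget(configFloor: 5, presetBudget: preset.newBudget)
    let fits = approximateWords(tokens: after) <= Double(preset.upperWordBound)
    print("\(preset.name): effective \(before) -> \(after), within upper bound: \(fits)")
}
```

Under the 1.3 tokens/word assumption, every new effective cap works out to slightly fewer words than the preset's upper bound, while every old cap exceeded it.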

Validation

swiftlint lint --strict --quiet        # exit 0 (lints Cotabby/, per included: in .swiftlint.yml)

xcodebuild test -project Cotabby.xcodeproj -scheme Cotabby \
  -destination 'platform=macOS' \
  -skip-testing:CotabbyTests/FoundationModelDriftEvalTests CODE_SIGNING_ALLOWED=NO
# ** TEST SUCCEEDED **  Executed 607 tests, 3 skipped, 0 failures

Local Team ID caveat: app-hosted suites run with CODE_SIGNING_ALLOWED=NO.

Updated the two tests that pin these numbers: SuggestionModelValueTests.test_wordCountPresetsExposeMatchingPromptInstructionsAndTokenBudgets and SuggestionRequestFactoryTests.test_buildRequest_usesWordCountPresetForInstructionAndTokenBudget.

Linked issues

Refs #271 (the earlier completion-overrun work this continues).

Risk / rollout notes

  • Behavior change: inline suggestions get shorter across all four length presets. This is intentional and conservative; a clipped suggestion is cheaper than one that blows past the chosen length.
  • Product default change: SuggestionConfiguration.standard.maxPredictionTokens 8 → 5. It is read only as the per-request lower bound, so the only behavioral effect is that the 2-4 preset is no longer floored up to 8.
  • No settings, schema, or pbxproj migration (existing files only; no new sources).

🤖 Generated with Claude Code

Greptile Summary

This PR reduces per-preset token budgets from a ~1.5x to a ~1.25x multiplier of each preset's upper word bound, and lowers the SuggestionConfiguration.standard floor from 8 to 5 so it no longer silently overrides the smallest (2-4 words) preset. The math checks out in all four cases (4×1.25=5, 7×1.25=8.75≈9, 12×1.25=15, 20×1.25=25), the BPE-based rationale in the comment is sound, and the activeMaxPredictionTokens floor logic (max(config.maxPredictionTokens, presetBudget)) now correctly resolves to 5 for the smallest preset instead of being raised to 8.

  • All four suggestedPredictionTokenBudget values are reduced, and both affected test files are updated to match — no stale assertions remain.
  • The product default maxPredictionTokens drop (8 → 5) is scoped exclusively to the per-request floor; it does not affect the larger presets, and the multi-line path (base × 2, capped at 60) is untouched.
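The multiplier check can be written out as a short sketch. It assumes the budgets are the nearest integer to multiplier × upper bound; the PR only says "~1.25x", so the rounding rule and the `budget` helper are illustrative, not taken from the codebase.

```swift
// Upper word bounds of the four presets, from the PR description.
let upperBounds = [4, 7, 12, 20]

// Assumption: a budget is the nearest integer to multiplier × upper bound.
func budget(upperBound: Int, multiplier: Double) -> Int {
    return Int((Double(upperBound) * multiplier).rounded())
}

let oldBudgets = upperBounds.map { budget(upperBound: $0, multiplier: 1.5) }
let newBudgets = upperBounds.map { budget(upperBound: $0, multiplier: 1.25) }

print(oldBudgets) // [6, 11, 18, 30]
print(newBudgets) // [5, 9, 15, 25]
```

Only the 4-7 preset involves rounding (10.5 → 11 before, 8.75 → 9 after); the other values are exact products.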

Confidence Score: 5/5

Safe to merge — the change is a targeted numeric adjustment with clear rationale, consistent test coverage, and no structural modifications.

All four budget values match their stated 1.25x multiplier, the floor value now equals the smallest preset budget (so max(5, 5) no longer raises it), and both test files are updated in lockstep. The activeMaxPredictionTokens logic in SuggestionRequestFactory is untouched and behaves correctly with the new floor.

No files require special attention. The only file with non-trivial logic (SuggestionRequestFactory.swift) was read to verify the floor behavior and is not modified by this PR.

Important Files Changed

| Filename | Overview |
|---|---|
| Cotabby/Models/SuggestionModels.swift | Token budgets reduced to ~1.25x the upper word bound (5/9/15/25) and the config floor lowered from 8 to 5; comment updated to explain the BPE-based rationale. |
| CotabbyTests/ModelAndPresentationValueTests.swift | Four token-budget assertions updated to match the new preset values (6→5, 11→9, 18→15, 30→25); all other test logic unchanged. |
| CotabbyTests/SuggestionRequestFactoryTests.swift | Single assertion updated for the twelveToTwenty preset (30→25); the configuration fixture in this test uses maxPredictionTokens: 1 as the floor, correctly exercising max(1, 25) = 25. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User selects word-count preset] --> B{selectedWordCountPreset}
    B --> |twoToFour| C[budget = 5]
    B --> |fourToSeven| D[budget = 9]
    B --> |sevenToTwelve| E[budget = 15]
    B --> |twelveToTwenty| F[budget = 25]
    C & D & E & F --> G["activeMaxPredictionTokens()\nbase = max(config.maxPredictionTokens=5, presetBudget)"]
    G --> H{isMultiLineEnabled?}
    H --> |No| I[maxPredictionTokens = base]
    H --> |Yes| J["maxPredictionTokens = min(base × 2, 60)"]
    I & J --> K[SuggestionRequest.maxPredictionTokens]
    K --> L[Local model hard-stops generation at this token count]
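The flow in the chart can be approximated by a small function. This is a sketch of the logic as the chart describes it, not the actual `SuggestionRequestFactory` code; the free function and its parameter names are hypothetical.

```swift
// Sketch of the chart: floor the preset budget, then optionally double it
// for multi-line mode with a hard cap of 60 tokens.
func activeMaxPredictionTokens(
    configFloor: Int,
    presetBudget: Int,
    isMultiLineEnabled: Bool
) -> Int {
    let base = max(configFloor, presetBudget)
    return isMultiLineEnabled ? min(base * 2, 60) : base
}

// Smallest preset with the new floor: no longer raised above its budget.
print(activeMaxPredictionTokens(configFloor: 5, presetBudget: 5, isMultiLineEnabled: false))
// Largest preset in multi-line mode: 25 × 2 = 50, below the cap of 60.
print(activeMaxPredictionTokens(configFloor: 5, presetBudget: 25, isMultiLineEnabled: true))
```

With the new budgets, the largest multi-line value is 50, so the 60-token cap is not reached by any preset; under the old budget of 30 the largest preset hit the cap exactly.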

Reviews (1): Last reviewed commit: "Bump suggestion token budgets down to tr..."

The per-preset token budget governs completion length on the local model
(the in-prompt word-range cue was removed). It was sized at ~1.5x the upper
word bound, but English BPE averages ~1.3 tokens/word, so every preset could
overrun its word range. The 2-4 preset was worst: its budget of 6 was floored
up to 8 by SuggestionConfiguration.standard.maxPredictionTokens, allowing
roughly 6 words for a "2-4 words" setting.

Resize budgets to ~1.25x the upper bound (5/9/15/25) so the cap lands at or
just under the upper word bound, and lower the config floor from 8 to 5 (it is
read only as the lower bound in activeMaxPredictionTokens) so the smallest
preset can deliver its length instead of being silently raised.

Conservative by design: a clipped suggestion is cheaper than one that blows
past the chosen length.
@FuJacob FuJacob merged commit 0b16e97 into main Jun 1, 2026
4 checks passed
@FuJacob FuJacob deleted the fix/conservative-token-budgets branch June 1, 2026 04:21