Bump suggestion token budgets down to track word presets #493
Merged
The per-preset token budget governs completion length on the local model (the in-prompt word-range cue was removed). It was sized at ~1.5x the upper word bound, but English BPE averages ~1.3 tokens/word, so every preset could overrun its word range. The 2-4 preset was worst: its budget of 6 was floored up to 8 by SuggestionConfiguration.standard.maxPredictionTokens, allowing roughly 6 words for a "2-4 words" setting. Resize budgets to ~1.25x the upper bound (5/9/15/25) so the cap lands at or just under the upper word bound, and lower the config floor from 8 to 5 (it is read only as the lower bound in activeMaxPredictionTokens) so the smallest preset can deliver its length instead of being silently raised. Conservative by design: a clipped suggestion is cheaper than one that blows past the chosen length.
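The overrun arithmetic above can be checked with a rough sketch. It assumes the ~1.3 tokens/word average quoted in the description and the preset upper bounds of 4, 7, 12, and 20 words; `estimatedWords(forTokenCap:)` is a hypothetical helper written for illustration and is not part of the codebase.

```swift
// Rough estimate only: assumes the ~1.3 tokens/word average quoted above.
let tokensPerWord = 1.3

// Hypothetical helper: approximate number of words a token cap allows.
func estimatedWords(forTokenCap cap: Int) -> Double {
    Double(cap) / tokensPerWord
}

// Old effective cap for the 2-4 preset: budget of 6, floored up to 8.
let oldSmallestCap = max(8, 6)
print(estimatedWords(forTokenCap: oldSmallestCap)) // a little over 6 words

// New budgets paired with each preset's upper word bound (values from this PR).
let newBudgets: [(upperWords: Int, tokens: Int)] = [(4, 5), (7, 9), (12, 15), (20, 25)]
for preset in newBudgets {
    let words = estimatedWords(forTokenCap: preset.tokens)
    // Under the 1.3 assumption, each estimate lands at or just under the upper bound.
    print("upper bound \(preset.upperWords): ~\(words) words")
}
```

Real tokenization varies with the text, so these figures are averages and not guarantees for any single completion.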
Summary
The per-preset token budget is the sole governor of completion length on the local model (the in-prompt word-range cue was removed). It was sized at ~1.5x the upper word bound, but English BPE averages ~1.3 tokens/word, so every preset could overrun its stated word range. The shortest preset was worst: the 2-4 words budget of 6 was floored up to 8 by SuggestionConfiguration.standard.maxPredictionTokens, allowing roughly 6 words for a "2-4 words" setting.
This resizes budgets to ~1.25x the upper bound so the cap lands at or just under the upper word bound, and lowers the config floor so the smallest preset can actually deliver its length:
* Budgets = 5/9/15/25 tokens for the 2-4, 4-7, 7-12, and 12-20 word presets.
* Effective = max(SuggestionConfiguration.standard.maxPredictionTokens, presetBudget). The floor (read only in SuggestionRequestFactory.activeMaxPredictionTokens) is lowered from 8 to 5 so it never silently raises the smallest preset above its budget.
Validation
Local Team ID caveat: app-hosted suites run with CODE_SIGNING_ALLOWED=NO.
Updated the two tests that pin these numbers: SuggestionModelValueTests.test_wordCountPresetsExposeMatchingPromptInstructionsAndTokenBudgets and SuggestionRequestFactoryTests.test_buildRequest_usesWordCountPresetForInstructionAndTokenBudget.
Linked issues
Refs #271 (the earlier completion-overrun work this continues).
Risk / rollout notes
SuggestionConfiguration.standard.maxPredictionTokens 8 → 5. It is read only as the per-request lower bound, so the only behavioral effect is that the 2-4 preset is no longer floored up to 8.
🤖 Generated with Claude Code
Greptile Summary
This PR reduces per-preset token budgets from a ~1.5x to a ~1.25x multiplier of each preset's upper word bound, and lowers the SuggestionConfiguration.standard floor from 8 to 5 so it no longer silently overrides the smallest (2-4 words) preset. The math checks out in all four cases (4×1.25=5, 7×1.25≈9, 12×1.25=15, 20×1.25=25), the BPE-based rationale in the comment is sound, and the activeMaxPredictionTokens floor logic (max(config.maxPredictionTokens, presetBudget)) now correctly resolves to 5 for the smallest preset instead of being raised to 8.
* suggestedPredictionTokenBudget values are reduced, and both affected test files are updated to match; no stale assertions remain.
* maxPredictionTokens drop (8 → 5) is scoped exclusively to the per-request floor; it does not affect the larger presets, and the multi-line path (base × 2, capped at 60) is untouched.
Confidence Score: 5/5
Safe to merge — the change is a targeted numeric adjustment with clear rationale, consistent test coverage, and no structural modifications.
All four budget values match their stated 1.25x multiplier, the floor value now equals the smallest preset budget (so max(5, 5) no longer raises it), and both test files are updated in lockstep. The activeMaxPredictionTokens logic in SuggestionRequestFactory is untouched and behaves correctly with the new floor.
No files require special attention. The only file with non-trivial logic (SuggestionRequestFactory.swift) was read to verify the floor behavior and is not modified by this PR.
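The multiplier check described in the review can be reproduced with a short sketch. Rounding to the nearest whole token is an assumption made here to match the "7×1.25≈9" case; the PR does not state a rounding rule, and the `presets` table and `configFloor` name are illustrative only.

```swift
// Upper word bound of each preset and the budget this PR assigns to it.
let presets: [(upperWords: Int, budget: Int)] = [(4, 5), (7, 9), (12, 15), (20, 25)]

for preset in presets {
    // 1.25x the upper bound, rounded to the nearest whole token (assumed rule).
    let expected = Int((Double(preset.upperWords) * 1.25).rounded())
    assert(expected == preset.budget)
}

// The lowered floor equals the smallest budget, so max() leaves it unchanged.
let configFloor = 5
assert(max(configFloor, presets[0].budget) == 5)
```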
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User selects word-count preset] --> B{selectedWordCountPreset}
    B --> |twoToFour| C[budget = 5]
    B --> |fourToSeven| D[budget = 9]
    B --> |sevenToTwelve| E[budget = 15]
    B --> |twelveToTwenty| F[budget = 25]
    C & D & E & F --> G["activeMaxPredictionTokens()\nbase = max(config.maxPredictionTokens=5, presetBudget)"]
    G --> H{isMultiLineEnabled?}
    H --> |No| I[maxPredictionTokens = base]
    H --> |Yes| J["maxPredictionTokens = min(base × 2, 60)"]
    I & J --> K[SuggestionRequest.maxPredictionTokens]
    K --> L[Local model hard-stops generation at this token count]
Reviews (1): Last reviewed commit: "Bump suggestion token budgets down to tr..."
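The flow in the chart above can be sketched in Swift as follows. This is a minimal illustration, not the code in SuggestionRequestFactory.swift: the `WordCountPreset` enum, the free function, and its parameter names are hypothetical stand-ins, and only the budget values, the max() floor, and the min(base × 2, 60) multi-line cap come from the chart.

```swift
// Hypothetical stand-in for the preset type named in the flowchart;
// the real definition lives in the codebase and may differ.
enum WordCountPreset {
    case twoToFour, fourToSeven, sevenToTwelve, twelveToTwenty

    // Budgets as listed in the flowchart.
    var suggestedPredictionTokenBudget: Int {
        switch self {
        case .twoToFour: return 5
        case .fourToSeven: return 9
        case .sevenToTwelve: return 15
        case .twelveToTwenty: return 25
        }
    }
}

// Sketch of the flow: floor the preset budget, then double it (capped at 60)
// when multi-line suggestions are enabled.
func activeMaxPredictionTokens(
    preset: WordCountPreset,
    configFloor: Int = 5,
    isMultiLineEnabled: Bool
) -> Int {
    let base = max(configFloor, preset.suggestedPredictionTokenBudget)
    return isMultiLineEnabled ? min(base * 2, 60) : base
}

print(activeMaxPredictionTokens(preset: .twoToFour, isMultiLineEnabled: false))     // 5
print(activeMaxPredictionTokens(preset: .twelveToTwenty, isMultiLineEnabled: true)) // 50
```

Passing `configFloor: 8` reproduces the old behavior, where the smallest preset resolves to 8 instead of 5. With the new budgets the largest doubled value is 50, so the cap of 60 is not reached in this sketch.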