Commit f8c4740
fix: add repetition_penalty to Qwen 0.8B to prevent looping text generation
Parent: c8059a0

2 files changed: 19 additions & 4 deletions

New documentation file (15 additions & 0 deletions):
# Fix AI Text Repetition (Qwen 0.8B)

## Overview

Fixed repetitive/looping text generation when using `{{@AI:}}` tags with the Qwen 3.5 0.8B model. Stories and other generated content would enter infinite repetition loops due to missing anti-repetition parameters.

## Root Cause

The main `ai-worker.js` lacked `repetition_penalty` and `no_repeat_ngram_size` in its generation config, while all other workers (Florence, Docling, GLM-OCR) already used `repetition_penalty: 1.2–1.5`. Small models such as Qwen 0.8B are especially prone to degenerate repetition without these parameters.
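As background for the penalty values mentioned above, the common repetition-penalty rule (popularized by CTRL and used by Transformers-family libraries) rescales the logit of every token that has already been generated. This is an illustrative sketch of that rule, not code from `ai-worker.js`:

```javascript
// Illustrative sketch (not the worker's actual code): a
// repetition_penalty > 1 makes every already-generated token less
// likely. Positive logits are divided by the penalty, negative
// logits are multiplied, so seen tokens always lose probability.
function applyRepetitionPenalty(logits, generatedIds, penalty) {
  const out = logits.slice();
  for (const id of new Set(generatedIds)) {
    out[id] = out[id] > 0 ? out[id] / penalty : out[id] * penalty;
  }
  return out;
}

// Example: token 2 was already generated, so its logit shrinks.
const penalized = applyRepetitionPenalty([1.0, -0.5, 2.6], [2], 1.3);
// penalized[2] is 2.6 / 1.3 ≈ 2.0; penalized[0] is untouched.
```

A penalty of 1.0 is a no-op, which is effectively what the worker had before this fix; values much above ~1.5 tend to degrade fluency, which is why the change stays in the 1.2–1.3 range.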
## Changes

- **`public/ai-worker.js`** — Added anti-repetition parameters to both generation paths (multimodal + text-only):
  - Non-thinking mode: `repetition_penalty: 1.3`, `no_repeat_ngram_size: 5`
  - Thinking mode: `repetition_penalty: 1.2`, `no_repeat_ngram_size: 4` (lower, to avoid disrupting reasoning chains)

## Files Modified

- `public/ai-worker.js` — Generation config for both multimodal and text-only paths
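For reference, `no_repeat_ngram_size: n` works by banning any token that would complete an n-gram already present in the output. A minimal sketch of that banning rule (illustrative only, not the worker's or the library's actual implementation):

```javascript
// Illustrative sketch (not code from ai-worker.js): with
// no_repeat_ngram_size = n, a token is banned if appending it to the
// last n-1 generated tokens would reproduce an n-gram that already
// appears earlier in the sequence.
function bannedNextTokens(generatedIds, n) {
  if (generatedIds.length < n) return [];
  // The last n-1 tokens form the current prefix.
  const prefix = generatedIds.slice(generatedIds.length - (n - 1)).join(",");
  const banned = [];
  // Any token that previously followed this exact prefix is banned.
  for (let i = 0; i + n <= generatedIds.length; i++) {
    const gram = generatedIds.slice(i, i + n - 1).join(",");
    if (gram === prefix) banned.push(generatedIds[i + n - 1]);
  }
  return banned;
}

// Example: after [5, 6, 7, 5, 6] with n = 3, the prefix [5, 6] was
// previously followed by 7, so 7 is banned as the next token.
```

This is a hard constraint, unlike the soft `repetition_penalty`; the larger `n` in non-thinking mode (5 vs. 4) bans only longer repeats, so it interferes less with legitimate short phrase reuse.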

public/ai-worker.js (4 additions & 4 deletions)

```diff
@@ -299,8 +299,8 @@ async function generate(taskType, context, userPrompt, messageId, enableThinking

   // Generate — Qwen3 model card: use sampling, NOT greedy, for thinking mode
   const genConfig = enableThinking
-    ? { do_sample: true, temperature: 0.6, top_p: 0.95, top_k: 20, max_new_tokens: Math.max(maxTokens, 4096) }
-    : { do_sample: true, temperature: 0.7, top_p: 0.8, top_k: 20, max_new_tokens: maxTokens };
+    ? { do_sample: true, temperature: 0.6, top_p: 0.95, top_k: 20, max_new_tokens: Math.max(maxTokens, 4096), repetition_penalty: 1.2, no_repeat_ngram_size: 4 }
+    : { do_sample: true, temperature: 0.7, top_p: 0.8, top_k: 20, max_new_tokens: maxTokens, repetition_penalty: 1.3, no_repeat_ngram_size: 5 };
   await model.generate({ ...inputs, ...genConfig, streamer });

   // Final cleanup — strip any remaining think tags or special tokens
@@ -389,8 +389,8 @@ async function generate(taskType, context, userPrompt, messageId, enableThinking
   // Generate — Qwen3 model card: use sampling, NOT greedy, for thinking mode
   // Thinking: temp=0.6, top_p=0.95, top_k=20 | Non-thinking: temp=0.7, top_p=0.8, top_k=20
   const genConfig = enableThinking
-    ? { do_sample: true, temperature: 0.6, top_p: 0.95, top_k: 20, max_new_tokens: Math.max(maxTokens, 4096) }
-    : { do_sample: true, temperature: 0.7, top_p: 0.8, top_k: 20, max_new_tokens: maxTokens };
+    ? { do_sample: true, temperature: 0.6, top_p: 0.95, top_k: 20, max_new_tokens: Math.max(maxTokens, 4096), repetition_penalty: 1.2, no_repeat_ngram_size: 4 }
+    : { do_sample: true, temperature: 0.7, top_p: 0.8, top_k: 20, max_new_tokens: maxTokens, repetition_penalty: 1.3, no_repeat_ngram_size: 5 };
   await model.generate({ ...inputs, ...genConfig, streamer });

   // Final cleanup: strip any remaining think tags or reasoning artifacts
```
