fix(backend): tighten action items extraction to reduce garbage tasks#6158
Revert rule 5 from the permissive "Future Intent or Deadline" back to the strict "NOT Already Being Done or About to Do Immediately", which stops extraction of tasks for things the user is currently doing or about to do. Add a single-topic dedup limit, real-time exchange exclusions, and a stronger default-to-nothing stance for implicit tasks. Compress the verbose date section from ~44 to ~11 lines so the quality-filtering rules carry more weight.
Greptile Summary
This PR tightens the LLM prompt used to extract action items from conversations. It reverts a December 2025 loosening of rule 5 that caused common "I'm going to X" phrasing to generate spurious tasks (e.g., a trip to fetch water produced 6 items). The change is purely a prompt edit inside backend/utils/llm/conversation_processing.py.
Key changes:
Issues found:
Confidence Score: 4/5
Safe to merge; the noted prompt inconsistencies are minor, and the core regression fix is sound and well-motivated. All changes are confined to a prompt string, and no Python runtime logic is altered. The two flagged issues are P2 prompt-quality gaps (an ambiguous "Today I will X" skip rule and a missing explicit-request carve-out on the SINGLE-TOPIC LIMIT) that could cause occasional LLM mis-decisions but will not crash anything or corrupt data. The score is 4 rather than 5 because the inconsistencies sit in the prompt's primary decision rules and could partially undermine the quality goals of this fix.
backend/utils/llm/conversation_processing.py: pay attention to the SINGLE-TOPIC LIMIT dedup rule and the "Today I will X" skip rule.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Conversation text] --> B{Explicit request?\nRemind me / Add task / etc.}
B -- Yes --> C[ALWAYS EXTRACT\nBypasses all other filters]
B -- No --> D{Implicit task?}
D --> E{Is user currently doing it\nor about to do it immediately?}
E -- Yes --> F[SKIP]
E -- No --> G{Being resolved in real-time\nbetween participants?}
G -- Yes --> F
G -- No --> H{Would a busy person\ngenuinely forget this?}
H -- No --> F
H -- Yes --> I{Passes ALL 5 strict\nfiltering rules?}
I -- No --> F
I -- Yes --> J{Duplicate or same topic\nas existing item?}
J -- Yes --> F
J -- No --> K[EXTRACT action item]
K --> L[Parse due date → UTC timestamp]
L --> M[Validate due_at is in future]
M --> N[Return ActionItem]
style C fill:#22c55e,color:#fff
style F fill:#ef4444,color:#fff
style N fill:#3b82f6,color:#fff
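To make the decision order in the flowchart concrete, here is a minimal Python sketch. It is an illustrative model only, assuming each diamond can be reduced to a boolean: in the actual change every judgment is made by the LLM from the prompt text, and `Candidate`, `should_extract`, and `parse_due_at` are hypothetical names, not functions in backend/utils/llm/conversation_processing.py. Treating naive timestamps as UTC and discarding a past due date (rather than the whole item) are also assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for one task the LLM is considering.

    Each boolean models one diamond in the flowchart; in the real system
    these judgments come from the LLM reading the prompt, not from code.
    """
    description: str
    explicit_request: bool = False          # "Remind me to...", "Add a task..."
    doing_now_or_immediately: bool = False  # "I'm going to X", "Let me X"
    resolved_in_realtime: bool = False      # settled between participants on the spot
    easily_forgotten: bool = False          # would a busy person genuinely forget it?
    passes_strict_rules: bool = False       # all 5 strict filtering rules
    duplicate_or_same_topic: bool = False   # matches an existing item or topic
    due_at_iso: Optional[str] = None        # e.g. "2026-03-01T17:00:00+00:00"


def parse_due_at(raw: Optional[str], now: datetime) -> Optional[datetime]:
    """Parse an ISO-8601 due date, normalize it to UTC, and drop past dates."""
    if raw is None:
        return None
    due = datetime.fromisoformat(raw)
    if due.tzinfo is None:
        # Assumption: naive timestamps are already in UTC.
        due = due.replace(tzinfo=timezone.utc)
    due = due.astimezone(timezone.utc)
    # Only a due date strictly in the future is kept.
    return due if due > now else None


def should_extract(c: Candidate) -> bool:
    """Walk the flowchart's decision nodes in order."""
    if c.explicit_request:
        return True  # explicit requests bypass every other filter
    if c.doing_now_or_immediately or c.resolved_in_realtime:
        return False
    if not c.easily_forgotten or not c.passes_strict_rules:
        return False
    return not c.duplicate_or_same_topic
```

The explicit-request check comes first so it short-circuits everything else, mirroring the "Bypasses all other filters" node; every implicit candidate must then clear each SKIP branch in turn.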
Reviews (1): Last reviewed commit: "fix(backend): tighten action items extra..."
- "I'm going to X" → SKIP (about to do it right now)
- "I'll do X for you" → SKIP (immediate response to a request)
- "Let me X" → SKIP (taking action now)
- "Today I will X" → SKIP unless there's a specific time/deadline attached
"Today I will X → SKIP" rule is self-contradictory
The new rule reads:
"Today I will X" → SKIP unless there's a specific time/deadline attached
But "today" is a specific time/deadline. A statement like "Today I need to file my taxes" contains a concrete same-day deadline, which is exactly the kind of forgettable task the system should capture. As written, the rule would cause the LLM to skip it because "today" is the only time reference and isn't paired with an additional clock time (e.g., "today by 5pm").
The previous rule explicitly classified this scenario as EXTRACT: "Today, I want to complete the onboarding experience" → EXTRACT (stated goal with deadline).
Consider clarifying the intent — the goal seems to be to skip vague daily-habit statements ("Today I'll go to the gym") rather than actual deadline-anchored tasks ("Today I have to renew my insurance"). A cleaner phrasing might be:
"Today I will X" → SKIP unless it involves a hard commitment or forgettable deadline (e.g. filing, payments, submissions)
- "Call dentist" (existing) vs "Call plumber" → NOT duplicate (different person/service)
- "Submit report by March 1st" (existing) vs "Submit report by March 15th" → NOT duplicate (different deadlines)
• If you're unsure whether something is a duplicate, err on the side of treating it as a duplicate (DON'T extract)
| • SINGLE-TOPIC LIMIT: If a conversation discusses one topic, extract AT MOST 1 action item for it — not one per variation, option, or detail mentioned in the discussion. |
SINGLE-TOPIC LIMIT conflicts with the ALWAYS-EXTRACT rule for explicit requests
The new dedup rule at this line says:
SINGLE-TOPIC LIMIT: If a conversation discusses one topic, extract AT MOST 1 action item for it
However, the EXPLICIT TASK/REMINDER REQUESTS section (lines 351–364) says these patterns ALWAYS extract regardless of other filters:
"Put X on my list" / "Add X to my tasks" → EXTRACT "X"
If a user explicitly asks for two reminders on the same general subject — e.g., "Remind me to call the dentist tomorrow" and "Add a task to pick up my prescription on Friday" — both are health-related and could be collapsed to one item by the SINGLE-TOPIC LIMIT, despite being distinct explicit requests.
The SINGLE-TOPIC LIMIT needs a carve-out, mirroring the "NOTE: Skip this requirement if user explicitly asked for a reminder/task" pattern used in rules 3 and 4:
| • SINGLE-TOPIC LIMIT: If a conversation discusses one topic, extract AT MOST 1 action item for it — not one per variation, option, or detail mentioned in the discussion. | |
| • SINGLE-TOPIC LIMIT: If a conversation discusses one topic, extract AT MOST 1 action item for it — not one per variation, option, or detail mentioned in the discussion. NOTE: Skip this limit if each item was explicitly requested by the user. |
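The suggested carve-out can be modeled as a small post-filter over candidate items. This is a hypothetical sketch: the real rule lives in the prompt and the LLM decides what counts as "one topic", so `ActionItemDraft`, its `topic` field, and `apply_single_topic_limit` are illustrative names, not part of the codebase.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ActionItemDraft:
    """Hypothetical candidate item; `topic` stands in for the LLM's judgment."""
    description: str
    topic: str
    explicit_request: bool = False


def apply_single_topic_limit(drafts: list[ActionItemDraft]) -> list[ActionItemDraft]:
    """Keep at most one implicit item per topic; never drop explicit requests.

    Input order is preserved in the output.
    """
    # Topics already covered by an explicit request count as "used".
    covered = {d.topic for d in drafts if d.explicit_request}
    kept: list[ActionItemDraft] = []
    for draft in drafts:
        if draft.explicit_request:
            kept.append(draft)  # carve-out: every explicit request survives
        elif draft.topic not in covered:
            covered.add(draft.topic)  # first implicit item claims the topic
            kept.append(draft)
        # Otherwise: a variation, option, or detail of a covered topic; skip.
    return kept
```

One judgment call here: an explicit request still counts toward its topic, so an implicit variation on the same topic is dropped. Only items that were each explicitly requested escape the limit, which is how the suggested NOTE reads.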
Summary
Context
Users reported that the system generates multiple garbage action items from casual conversations. Example: a 90-second exchange about getting water/soda from the kitchen was generating 6 items. The extraction prompt had accumulated permissive rules over time (loosened filtering, verbose date handling added in Mar b4218f796 diluting the quality rules), and these likely combined with a gpt-5.1 model update to tip quality over the edge in recent weeks. This PR tightens the prompt to restore a strict quality bar.
Test plan
🤖 Generated with Claude Code