Conversation
…ints
- Extend retrieve_chunks with exclude_notebook_id and source_id filters
- writing-context: use retrieve_chunks + notebook ownership (404)
- related-knowledge: hybrid RAG; 404 if notebook not owned
- source suggestions: semantic retrieve within source; owner-scoped source query
- summarize fallback: order chunks by created_at desc instead of random
- send_message: parallel graph_augmented_context; global conv uses global_search RAG

Made-with: Cursor

- Unit: compose_answer prepends extra_graph_context into reference message (mock llm.chat)
- HTTP: writing-context / related-knowledge 404 + short-text path (PostgreSQL only; skip on SQLite JSONB)
- Integration: send_message passes graph to compose; global conv uses global_search (PostgreSQL only)

Made-with: Cursor
📝 Walkthrough

Feature overview: this PR extends the RAG retrieval system, in …

Change overview
Sequence diagram

```mermaid
sequenceDiagram
    participant Client
    participant ConversationService as ConversationService<br/>.send_message
    participant RetrieveChunks as retrieve_chunks
    participant GraphContext as graph_augmented_context
    participant ComposeAnswer as compose_answer
    participant LLM
    Client->>ConversationService: send_message(notebook_id, message)
    par Parallel Retrieval
        ConversationService->>RetrieveChunks: retrieve_chunks(user_id, notebook_id)
        RetrieveChunks-->>ConversationService: chunks list
    and Graph Context
        ConversationService->>GraphContext: graph_augmented_context(notebook_id)
        GraphContext-->>ConversationService: graph_context string
    end
    ConversationService->>ComposeAnswer: compose_answer(chunks, extra_graph_context, message)
    ComposeAnswer->>LLM: chat(messages with context + graph)
    LLM-->>ComposeAnswer: assistant reply
    ComposeAnswer-->>ConversationService: composed answer
    ConversationService-->>Client: response
```
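The parallel retrieval step in the diagram can be sketched with `asyncio.gather`. The two coroutines here are simplified stand-ins for the real `retrieve_chunks` and `graph_augmented_context` (which take more arguments and hit the database); only the concurrency pattern is the point.

```python
import asyncio

async def retrieve_chunks(user_id: str, notebook_id: str) -> list[str]:
    # Stand-in for the real hybrid RAG retrieval.
    await asyncio.sleep(0)
    return [f"chunk for {notebook_id}"]

async def graph_augmented_context(notebook_id: str) -> str:
    # Stand-in for the graph-context lookup.
    await asyncio.sleep(0)
    return f"graph context for {notebook_id}"

async def send_message(user_id: str, notebook_id: str, message: str) -> dict:
    # Run both retrievals concurrently, as in the `par` block of the diagram.
    chunks, graph_context = await asyncio.gather(
        retrieve_chunks(user_id, notebook_id),
        graph_augmented_context(notebook_id),
    )
    # compose_answer(chunks, extra_graph_context=graph_context, message=message)
    return {"chunks": chunks, "graph_context": graph_context}
```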
Code review effort estimate: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
🧹 Nitpick comments (4)
apps/api/app/domains/ai/routers/knowledge.py (1)
74-76: Note: `if c.get("chunk_id") in meta` may silently filter out valid results. When a chunk returned by `retrieve_chunks` has no matching record in the subsequent metadata query (e.g. after a concurrent delete), it is silently excluded. This is defensive handling, but it can return fewer results than expected. For better observability, consider logging the number of filtered-out chunks.
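One way to add that observability is to count and log what gets dropped. This is a sketch: `raw` and `meta` are the names from the review comment, and the module-level logger is an assumption about how the router sets up logging.

```python
import logging

logger = logging.getLogger(__name__)

def filter_with_logging(raw: list[dict], meta: dict) -> list[dict]:
    """Keep chunks whose chunk_id has metadata; log whatever is dropped."""
    kept = [c for c in raw if c.get("chunk_id") in meta]
    dropped = [c.get("chunk_id") for c in raw if c.get("chunk_id") not in meta]
    if dropped:
        # Surface raw -> meta mismatches (e.g. concurrent deletes)
        # instead of silently shrinking the result set.
        logger.warning(
            "retrieve_chunks: %d chunk(s) missing from meta, excluded: %s",
            len(dropped),
            dropped,
        )
    return kept
```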
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/api/app/domains/ai/routers/knowledge.py` around lines 74 - 76, the list comprehension filtering raw chunks using "if c.get('chunk_id') in meta" can silently drop valid chunks (e.g., due to concurrent deletes); update the code in the same block where "raw" is filtered (the comprehension that checks c.get('chunk_id')) to compute and log the number (and optionally IDs) of chunks that were filtered out so we have observability into dropped chunks from retrieve_chunks -> meta mismatches; include references to "raw", "meta", and "chunk_id" in the log message so it's clear which items were excluded.

apps/api/tests/unit/test_retrieve_routers_auth.py (1)
11-15: Boolean comparison style: use `not` instead of `is False`. The current `is False` works, but it is not idiomatic Python. A clearer form:

♻️ Suggested change

```diff
-requires_postgres = os.environ.get("DATABASE_URL", "sqlite").startswith("sqlite") is False
+requires_postgres = not os.environ.get("DATABASE_URL", "sqlite").startswith("sqlite")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/api/tests/unit/test_retrieve_routers_auth.py` around lines 11 - 15, the boolean comparison in requires_postgres is using "is False" which is unpythonic; change the expression that determines requires_postgres (currently: requires_postgres = os.environ.get("DATABASE_URL", "sqlite").startswith("sqlite") is False) to use "not" for clarity and correctness (e.g., set requires_postgres = not os.environ.get("DATABASE_URL", "sqlite").startswith("sqlite")), leaving pytestmark and the DATABASE_URL usage unchanged.

apps/api/app/agents/rag/retrieval.py (1)
212-215: Consider adding database indexes on `Chunk.notebook_id` and `Chunk.source_id`. Per the model definition at apps/api/app/models/source.py:38-59, the `Chunk` table's `notebook_id` and `source_id` columns are not declared with `index=True`. The new filter conditions (here and at lines 145-148) may cause full-table scans on large datasets.

Suggested model change

```diff
 # In apps/api/app/models/source.py
 source_id: Mapped[uuid.UUID | None] = mapped_column(
-    ForeignKey("sources.id", ondelete="CASCADE"), nullable=True
+    ForeignKey("sources.id", ondelete="CASCADE"), nullable=True, index=True
 )
-notebook_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("notebooks.id", ondelete="CASCADE"))
+notebook_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("notebooks.id", ondelete="CASCADE"), index=True)
```
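The effect of such an index can be demonstrated even in a plain SQLite session. This illustrates the principle only, not the project's actual schema or migration: the table name and index name here are made up, and the real change would go through `index=True` on the model plus an Alembic migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks (id INTEGER PRIMARY KEY, notebook_id TEXT, source_id TEXT)"
)

# Without an index, filtering on notebook_id scans the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM chunks WHERE notebook_id = ?", ("nb1",)
).fetchone()
assert "SCAN" in plan[3]

# After creating the index (what index=True plus a migration would do),
# the same query becomes an index search.
conn.execute("CREATE INDEX ix_chunks_notebook_id ON chunks (notebook_id)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM chunks WHERE notebook_id = ?", ("nb1",)
).fetchone()
assert "USING INDEX ix_chunks_notebook_id" in plan[3]
```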
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/api/app/agents/rag/retrieval.py` around lines 212 - 215, the query filters use Chunk.notebook_id and Chunk.source_id which are not indexed, causing potential full-table scans; update the Chunk model to declare indexes on these columns (set index=True on the notebook_id and source_id Column definitions in the Chunk model) and then create and apply a DB migration (e.g., Alembic) to add the corresponding indexes to the database so the where clauses in functions using Chunk.notebook_id and Chunk.source_id will use indexes.

apps/api/app/services/conversation_service.py (1)
175-177: Skip the notebook summary query for global conversations. Line 176 still calls `get_notebook_summary` even when `conv.notebook_id` is None, issuing a pointless query. Consider lazy-loading it per branch, consistent with `stream_agent`.

Suggested change

```diff
-    notebook_summary = await get_notebook_summary(conv.notebook_id, self.db)
+    notebook_summary = (
+        await get_notebook_summary(conv.notebook_id, self.db)
+        if conv.notebook_id
+        else None
+    )
```
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/api/app/services/conversation_service.py` around lines 175 - 177, the current implementation calls get_notebook_summary regardless of whether conv.notebook_id is None, producing a redundant query for global conversations; make the notebook-summary call lazy: move the await get_notebook_summary(conv.notebook_id, self.db) call inside the if conv.notebook_id: branch (consistent with how stream_agent handles it) so it only runs when conv.notebook_id is truthy, and leave the user_memories fetch unchanged; locator references: get_user_memories, get_notebook_summary, conv.notebook_id, and stream_agent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ee4cb79d-b1f0-4a40-ac33-8f9866f3ee05
📒 Files selected for processing (10)
- apps/api/app/agents/rag/retrieval.py
- apps/api/app/agents/writing/composer.py
- apps/api/app/domains/ai/routers/knowledge.py
- apps/api/app/domains/ai/routers/suggestions.py
- apps/api/app/domains/ai/routers/writing.py
- apps/api/app/services/conversation_service.py
- apps/api/app/skills/builtin/summarize.py
- apps/api/tests/integration/test_conversation_service.py
- apps/api/tests/unit/test_compose_answer_extra_graph.py
- apps/api/tests/unit/test_retrieve_routers_auth.py
Summary
Type of change
Related issue
Closes #
Changes
How to test
Screenshots (if applicable)
Checklist
- Ran `./lyra lint` and there are no type errors
- … (`feat:`, `fix:`, etc.)