
Feat/api retrieve unification #38

Merged
2 commits merged into main from feat/api-retrieve-unification
Mar 27, 2026

Conversation

Owner

@LinMoQC LinMoQC commented Mar 27, 2026

Summary

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactor / code cleanup
  • CI / tooling

Related issue

Closes #

Changes

How to test

Screenshots (if applicable)

Checklist

  • My code follows the project's coding conventions
  • I have run ./lyra lint and there are no type errors
  • I have added/updated tests for the changed functionality
  • I have updated the documentation if behavior changed
  • The PR title follows Conventional Commits format (feat:, fix:, etc.)
  • I have read the CONTRIBUTING.md

LinMoQC added 2 commits March 27, 2026 12:02
…ints

- Extend retrieve_chunks with exclude_notebook_id and source_id filters
- writing-context: use retrieve_chunks + notebook ownership (404)
- related-knowledge: hybrid RAG; 404 if notebook not owned
- source suggestions: semantic retrieve within source; owner-scoped source query
- summarize fallback: order chunks by created_at desc instead of random
- send_message: parallel graph_augmented_context; global conv uses global_search RAG

Made-with: Cursor
- Unit: compose_answer prepends extra_graph_context into reference message (mock llm.chat)
- HTTP: writing-context / related-knowledge 404 + short-text path (PostgreSQL only; skip on SQLite JSONB)
- Integration: send_message passes graph to compose; global conv uses global_search (PostgreSQL only)

Made-with: Cursor
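The compose_answer unit test described above might be sketched like this. `compose_answer` here is a simplified hypothetical stand-in; the real function lives in `apps/api/app/agents/writing/composer.py` and its exact signature is not shown on this page.

```python
from unittest.mock import MagicMock


def compose_answer(chunks, question, llm, extra_graph_context=None):
    """Simplified stand-in for the real compose_answer (assumption)."""
    context = "\n".join(c["text"] for c in chunks)
    if extra_graph_context:
        # Prepend the graph context into the reference message.
        context = f"[graph]\n{extra_graph_context}\n\n{context}"
    messages = [
        {"role": "system", "content": context},
        {"role": "user", "content": question},
    ]
    return llm.chat(messages)


def test_compose_answer_prepends_graph():
    # Mock llm.chat, as the described unit test does.
    llm = MagicMock()
    llm.chat.return_value = "answer"
    reply = compose_answer(
        [{"text": "chunk-1"}], "question?", llm,
        extra_graph_context="A -> B",
    )
    # The graph context must appear at the top of the reference message.
    sent = llm.chat.call_args[0][0]
    assert sent[0]["content"].startswith("[graph]\nA -> B")
    assert reply == "answer"
```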
@vercel

vercel bot commented Mar 27, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| lyra-note-web | Ready | Preview, Comment | Mar 27, 2026 4:12am |

@coderabbitai
Contributor

coderabbitai bot commented Mar 27, 2026

📝 Walkthrough

Overview

This PR extends the RAG retrieval system: it adds notebook and source filter parameters to `retrieve_chunks`, supports graph context in answer composition, and migrates data fetching in several routers from direct SQL queries to the unified RAG retrieval interface.

Changes

| Group / File(s) | Summary |
| --- | --- |
| RAG retrieval core: `apps/api/app/agents/rag/retrieval.py` | Adds a `_uuid_or_none` helper and `exclude_notebook_id` / `source_id` keyword arguments for filtering results in both vector and full-text search. |
| Answer composition: `apps/api/app/agents/writing/composer.py` | Adds an `extra_graph_context` parameter that conditionally inserts a "Structured knowledge associations (graph)" section into the generated context. |
| Router-layer migration: `apps/api/app/domains/ai/routers/knowledge.py`, `suggestions.py`, `writing.py` | Migrates three routers from direct embedding + SQL similarity queries to the `retrieve_chunks` interface, adding notebook ownership checks and 404 error handling. |
| Conversation service integration: `apps/api/app/services/conversation_service.py` | `send_message` now calls `retrieve_chunks` and `graph_augmented_context` concurrently and passes the graph context to `compose_answer`. |
| Ordering fix: `apps/api/app/skills/builtin/summarize.py` | Changes fallback chunk selection from random order to `created_at` descending. |
| Integration tests: `apps/api/tests/integration/test_conversation_service.py` | Adds a `TestConversationSendMessageRag` test class verifying the RAG + graph-context flow for notebook-scoped and global conversations. |
| Unit tests: `apps/api/tests/unit/test_compose_answer_extra_graph.py`, `test_retrieve_routers_auth.py` | Adds two new test modules: one verifying graph-section insertion, one verifying router notebook ownership checks and 404 handling. |
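The summarize fallback change described above (random order replaced by `created_at` descending) can be illustrated with a minimal sketch; the dict shape and field names are assumptions, not the actual `summarize.py` code.

```python
from datetime import datetime, timedelta
from operator import itemgetter


def pick_chunks_for_summary(chunks, limit=3):
    """Select the most recent chunks instead of a random sample.

    Ordering by created_at descending makes the fallback deterministic
    and biases the summary toward the newest content.
    """
    return sorted(chunks, key=itemgetter("created_at"), reverse=True)[:limit]
```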

Sequence diagram

sequenceDiagram
    participant Client
    participant ConversationService as ConversationService<br/>.send_message
    participant RetrieveChunks as retrieve_chunks
    participant GraphContext as graph_augmented_context
    participant ComposeAnswer as compose_answer
    participant LLM

    Client->>ConversationService: send_message(notebook_id, message)
    
    par Parallel Retrieval
        ConversationService->>RetrieveChunks: retrieve_chunks(user_id, notebook_id)
        RetrieveChunks-->>ConversationService: chunks list
    and Graph Context
        ConversationService->>GraphContext: graph_augmented_context(notebook_id)
        GraphContext-->>ConversationService: graph_context string
    end
    
    ConversationService->>ComposeAnswer: compose_answer(chunks, extra_graph_context, message)
    ComposeAnswer->>LLM: chat(messages with context + graph)
    LLM-->>ComposeAnswer: assistant reply
    ComposeAnswer-->>ConversationService: composed answer
    ConversationService-->>Client: response
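The `par` block in the sequence diagram corresponds to awaiting both lookups concurrently; a minimal `asyncio.gather` sketch, with placeholder bodies rather than the service's real implementations:

```python
import asyncio


async def retrieve_chunks(user_id, notebook_id):
    # Placeholder for the real hybrid RAG retrieval.
    await asyncio.sleep(0)
    return [{"chunk_id": "c1", "text": "..."}]


async def graph_augmented_context(notebook_id):
    # Placeholder for the real graph-context lookup.
    await asyncio.sleep(0)
    return "A -> relates_to -> B"


async def gather_context(user_id, notebook_id):
    # Both lookups run concurrently, matching the `par` block above.
    chunks, graph_context = await asyncio.gather(
        retrieve_chunks(user_id, notebook_id),
        graph_augmented_context(notebook_id),
    )
    return chunks, graph_context
```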

Review effort estimate

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Feature/graphrag retrieval augmentation #24: also integrates graph-augmented context into the RAG flow and touches the code paths that call `retrieve_chunks` and `compose_answer`, in the same direction as this PR's graph-context integration in the RAG pipeline.
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 45.83%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description is an unfilled template; summary, changes, testing steps, and issue references are all blank. | Fill in the template with an actual summary, key changes, testing steps, and related issue number to give reviewers meaningful context. |
✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title is concise, relevant to the changeset, and accurately reflects the main goal of unifying the retrieval logic. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (4)
apps/api/app/domains/ai/routers/knowledge.py (1)

74-76: Note: `if c.get("chunk_id") in meta` may silently filter out valid results.

When a chunk returned by `retrieve_chunks` has no matching record in the subsequent metadata lookup (e.g. after a concurrent delete), it is silently excluded. This is defensive, but the endpoint may return fewer results than expected.

For better observability, consider logging the number of chunks filtered out.
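A minimal sketch of the suggested logging: the `raw`, `meta`, and `chunk_id` names come from the comment above, while the logger setup and the function boundary are assumptions.

```python
import logging

logger = logging.getLogger(__name__)


def filter_chunks_with_meta(raw, meta):
    """Keep only chunks whose metadata row was found, logging the rest."""
    kept = [c for c in raw if c.get("chunk_id") in meta]
    dropped = [c.get("chunk_id") for c in raw if c.get("chunk_id") not in meta]
    if dropped:
        # Surface retrieve_chunks -> meta mismatches instead of hiding them.
        logger.warning(
            "retrieve_chunks -> meta mismatch: %d chunk(s) from `raw` "
            "missing in `meta`, excluded chunk_ids=%s",
            len(dropped), dropped,
        )
    return kept
```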

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/app/domains/ai/routers/knowledge.py` around lines 74 - 76, The list
comprehension filtering raw chunks using "if c.get('chunk_id') in meta" can
silently drop valid chunks (e.g., due to concurrent deletes); update the code in
the same block where "raw" is filtered (the comprehension that checks
c.get('chunk_id')) to compute and log the number (and optionally IDs) of chunks
that were filtered out so we have observability into dropped chunks from
retrieve_chunks -> meta mismatches; include references to "raw", "meta", and
"chunk_id" in the log message so it's clear which items were excluded.
apps/api/tests/unit/test_retrieve_routers_auth.py (1)

11-15: Boolean comparison style: prefer `not` over `is False`.

The current `is False` works, but isn't idiomatic Python. A clearer form:

♻️ Suggested change
-requires_postgres = os.environ.get("DATABASE_URL", "sqlite").startswith("sqlite") is False
+requires_postgres = not os.environ.get("DATABASE_URL", "sqlite").startswith("sqlite")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/tests/unit/test_retrieve_routers_auth.py` around lines 11 - 15, The
boolean comparison in requires_postgres is using "is False" which is unpythonic;
change the expression that determines requires_postgres (currently:
requires_postgres = os.environ.get("DATABASE_URL",
"sqlite").startswith("sqlite") is False) to use "not" for clarity and
correctness (e.g., set requires_postgres = not os.environ.get("DATABASE_URL",
"sqlite").startswith("sqlite")), leaving pytestmark and the DATABASE_URL usage
unchanged.
apps/api/app/agents/rag/retrieval.py (1)

212-215: Consider adding database indexes on `Chunk.notebook_id` and `Chunk.source_id`.

Per the model definition in apps/api/app/models/source.py:38-59, the Chunk table's `notebook_id` and `source_id` columns are not declared with `index=True`. The new filter conditions (here and at lines 145-148) may trigger full-table scans on large datasets.

Suggested model change
# In apps/api/app/models/source.py
 source_id: Mapped[uuid.UUID | None] = mapped_column(
-    ForeignKey("sources.id", ondelete="CASCADE"), nullable=True
+    ForeignKey("sources.id", ondelete="CASCADE"), nullable=True, index=True
 )
-notebook_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("notebooks.id", ondelete="CASCADE"))
+notebook_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("notebooks.id", ondelete="CASCADE"), index=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/app/agents/rag/retrieval.py` around lines 212 - 215, The query
filters use Chunk.notebook_id and Chunk.source_id which are not indexed, causing
potential full-table scans; update the Chunk model to declare indexes on these
columns (set index=True on the notebook_id and source_id Column definitions in
the Chunk model) and then create and apply a DB migration (e.g., Alembic) to add
the corresponding indexes to the database so the where clauses in functions
using Chunk.notebook_id and Chunk.source_id will use indexes.
apps/api/app/services/conversation_service.py (1)

175-177: Skip the notebook summary query for global conversations.

Line 176 still runs `get_notebook_summary` even when `conv.notebook_id` is None, issuing a pointless query. Load it lazily per branch, consistent with `stream_agent`.

Suggested change
-        notebook_summary = await get_notebook_summary(conv.notebook_id, self.db)
+        notebook_summary = (
+            await get_notebook_summary(conv.notebook_id, self.db)
+            if conv.notebook_id
+            else None
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/app/services/conversation_service.py` around lines 175 - 177, the
current implementation calls get_notebook_summary regardless of whether
conv.notebook_id is None, which issues a redundant query for global
conversations; make the notebook-summary call lazy: move the await
get_notebook_summary(conv.notebook_id, self.db) call inside an if
conv.notebook_id: branch (matching how stream_agent handles it) so it only runs
when conv.notebook_id is truthy, and leave the user_memories fetch unchanged;
locate via get_user_memories, get_notebook_summary, conv.notebook_id, and
stream_agent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ee4cb79d-b1f0-4a40-ac33-8f9866f3ee05

📥 Commits

Reviewing files that changed from the base of the PR and between 4ea3a23 and 1cc0cd2.

📒 Files selected for processing (10)
  • apps/api/app/agents/rag/retrieval.py
  • apps/api/app/agents/writing/composer.py
  • apps/api/app/domains/ai/routers/knowledge.py
  • apps/api/app/domains/ai/routers/suggestions.py
  • apps/api/app/domains/ai/routers/writing.py
  • apps/api/app/services/conversation_service.py
  • apps/api/app/skills/builtin/summarize.py
  • apps/api/tests/integration/test_conversation_service.py
  • apps/api/tests/unit/test_compose_answer_extra_graph.py
  • apps/api/tests/unit/test_retrieve_routers_auth.py

@LinMoQC LinMoQC closed this pull request by merging all changes into main in ad58f55 Mar 27, 2026