
Feat/api retrieve unification #38

Merged
2 commits merged into main from feat/api-retrieve-unification
Mar 27, 2026

Conversation

Owner

@LinMoQC LinMoQC commented Mar 27, 2026

Summary

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactor / code cleanup
  • CI / tooling

Related issue

Closes #

Changes

How to test

Screenshots (if applicable)

Checklist

  • My code follows the project's coding conventions
  • I have run ./lyra lint and there are no type errors
  • I have added/updated tests for the changed functionality
  • I have updated the documentation if behavior changed
  • The PR title follows Conventional Commits format (feat:, fix:, etc.)
  • I have read the CONTRIBUTING.md

LinMoQC added 2 commits March 27, 2026 12:02
…ints

- Extend retrieve_chunks with exclude_notebook_id and source_id filters
- writing-context: use retrieve_chunks + notebook ownership (404)
- related-knowledge: hybrid RAG; 404 if notebook not owned
- source suggestions: semantic retrieve within source; owner-scoped source query
- summarize fallback: order chunks by created_at desc instead of random
- send_message: parallel graph_augmented_context; global conv uses global_search RAG

Made-with: Cursor
- Unit: compose_answer prepends extra_graph_context into reference message (mock llm.chat)
- HTTP: writing-context / related-knowledge 404 + short-text path (PostgreSQL only; skip on SQLite JSONB)
- Integration: send_message passes graph to compose; global conv uses global_search (PostgreSQL only)

Made-with: Cursor
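The compose_answer unit test described above might be sketched like this. `compose_answer` here is a simplified hypothetical stand-in; the real function lives in `apps/api/app/agents/writing/composer.py` and its exact signature is not shown on this page.

```python
from unittest.mock import MagicMock


def compose_answer(chunks, question, llm, extra_graph_context=None):
    """Simplified stand-in for the real compose_answer (assumption)."""
    context = "\n".join(c["text"] for c in chunks)
    if extra_graph_context:
        # Prepend the graph context into the reference message.
        context = f"[graph]\n{extra_graph_context}\n\n{context}"
    messages = [
        {"role": "system", "content": context},
        {"role": "user", "content": question},
    ]
    return llm.chat(messages)


def test_compose_answer_prepends_graph():
    # Mock llm.chat, as the described unit test does.
    llm = MagicMock()
    llm.chat.return_value = "answer"
    reply = compose_answer(
        [{"text": "chunk-1"}], "question?", llm,
        extra_graph_context="A -> B",
    )
    # The graph context must appear at the top of the reference message.
    sent = llm.chat.call_args[0][0]
    assert sent[0]["content"].startswith("[graph]\nA -> B")
    assert reply == "answer"
```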
@vercel

vercel bot commented Mar 27, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| lyra-note-web | Ready | Preview, Comment | Mar 27, 2026 4:12am |

@coderabbitai
Contributor

coderabbitai bot commented Mar 27, 2026

📝 Walkthrough

Overview

This PR extends the RAG retrieval system: it adds notebook and source filter parameters to `retrieve_chunks`, supports graph context in answer composition, and migrates data fetching in several routers from direct SQL queries to the unified RAG retrieval interface.

Changes

| Group / File(s) | Summary |
| --- | --- |
| RAG retrieval core: `apps/api/app/agents/rag/retrieval.py` | Adds a `_uuid_or_none` helper and `exclude_notebook_id` / `source_id` keyword arguments for filtering results in both vector and full-text search. |
| Answer composition: `apps/api/app/agents/writing/composer.py` | Adds an `extra_graph_context` parameter that conditionally inserts a "Structured knowledge associations (graph)" section into the generated context. |
| Router-layer migration: `apps/api/app/domains/ai/routers/knowledge.py`, `suggestions.py`, `writing.py` | Migrates three routers from direct embedding + SQL similarity queries to the `retrieve_chunks` interface, adding notebook ownership checks and 404 error handling. |
| Conversation service integration: `apps/api/app/services/conversation_service.py` | `send_message` now calls `retrieve_chunks` and `graph_augmented_context` concurrently and passes the graph context to `compose_answer`. |
| Ordering fix: `apps/api/app/skills/builtin/summarize.py` | Changes fallback chunk selection from random order to `created_at` descending. |
| Integration tests: `apps/api/tests/integration/test_conversation_service.py` | Adds a `TestConversationSendMessageRag` test class verifying the RAG + graph-context flow for notebook-scoped and global conversations. |
| Unit tests: `apps/api/tests/unit/test_compose_answer_extra_graph.py`, `test_retrieve_routers_auth.py` | Adds two new test modules: one verifying graph-section insertion, one verifying router notebook ownership checks and 404 handling. |
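The summarize fallback change described above (random order replaced by `created_at` descending) can be illustrated with a minimal sketch; the dict shape and field names are assumptions, not the actual `summarize.py` code.

```python
from datetime import datetime, timedelta
from operator import itemgetter


def pick_chunks_for_summary(chunks, limit=3):
    """Select the most recent chunks instead of a random sample.

    Ordering by created_at descending makes the fallback deterministic
    and biases the summary toward the newest content.
    """
    return sorted(chunks, key=itemgetter("created_at"), reverse=True)[:limit]
```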

Sequence diagram

sequenceDiagram
    participant Client
    participant ConversationService as ConversationService<br/>.send_message
    participant RetrieveChunks as retrieve_chunks
    participant GraphContext as graph_augmented_context
    participant ComposeAnswer as compose_answer
    participant LLM

    Client->>ConversationService: send_message(notebook_id, message)
    
    par Parallel Retrieval
        ConversationService->>RetrieveChunks: retrieve_chunks(user_id, notebook_id)
        RetrieveChunks-->>ConversationService: chunks list
    and Graph Context
        ConversationService->>GraphContext: graph_augmented_context(notebook_id)
        GraphContext-->>ConversationService: graph_context string
    end
    
    ConversationService->>ComposeAnswer: compose_answer(chunks, extra_graph_context, message)
    ComposeAnswer->>LLM: chat(messages with context + graph)
    LLM-->>ComposeAnswer: assistant reply
    ComposeAnswer-->>ConversationService: composed answer
    ConversationService-->>Client: response
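The `par` block in the sequence diagram corresponds to awaiting both lookups concurrently; a minimal `asyncio.gather` sketch, with placeholder bodies rather than the service's real implementations:

```python
import asyncio


async def retrieve_chunks(user_id, notebook_id):
    # Placeholder for the real hybrid RAG retrieval.
    await asyncio.sleep(0)
    return [{"chunk_id": "c1", "text": "..."}]


async def graph_augmented_context(notebook_id):
    # Placeholder for the real graph-context lookup.
    await asyncio.sleep(0)
    return "A -> relates_to -> B"


async def gather_context(user_id, notebook_id):
    # Both lookups run concurrently, matching the `par` block above.
    chunks, graph_context = await asyncio.gather(
        retrieve_chunks(user_id, notebook_id),
        graph_augmented_context(notebook_id),
    )
    return chunks, graph_context
```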

Review effort estimate

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Feature/graphrag retrieval augmentation #24: also integrates graph-augmented context into the RAG flow and touches the code paths that call `retrieve_chunks` and `compose_answer`, in the same direction as this PR's graph-context integration in the RAG pipeline.
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 45.83%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description is an unfilled template; summary, changes, testing steps, and issue references are all blank. | Fill in the template with an actual summary, key changes, testing steps, and related issue number to give reviewers meaningful context. |
✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title is concise, relevant to the changeset, and accurately reflects the main goal of unifying the retrieval logic. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (4)
apps/api/app/domains/ai/routers/knowledge.py (1)

74-76: Note: `if c.get("chunk_id") in meta` may silently filter out valid results.

When a chunk returned by `retrieve_chunks` has no matching record in the subsequent metadata lookup (e.g. after a concurrent delete), it is silently excluded. This is defensive, but the endpoint may return fewer results than expected.

For better observability, consider logging the number of chunks filtered out.
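A minimal sketch of the suggested logging: the `raw`, `meta`, and `chunk_id` names come from the comment above, while the logger setup and the function boundary are assumptions.

```python
import logging

logger = logging.getLogger(__name__)


def filter_chunks_with_meta(raw, meta):
    """Keep only chunks whose metadata row was found, logging the rest."""
    kept = [c for c in raw if c.get("chunk_id") in meta]
    dropped = [c.get("chunk_id") for c in raw if c.get("chunk_id") not in meta]
    if dropped:
        # Surface retrieve_chunks -> meta mismatches instead of hiding them.
        logger.warning(
            "retrieve_chunks -> meta mismatch: %d chunk(s) from `raw` "
            "missing in `meta`, excluded chunk_ids=%s",
            len(dropped), dropped,
        )
    return kept
```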

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/app/domains/ai/routers/knowledge.py` around lines 74 - 76, The list
comprehension filtering raw chunks using "if c.get('chunk_id') in meta" can
silently drop valid chunks (e.g., due to concurrent deletes); update the code in
the same block where "raw" is filtered (the comprehension that checks
c.get('chunk_id')) to compute and log the number (and optionally IDs) of chunks
that were filtered out so we have observability into dropped chunks from
retrieve_chunks -> meta mismatches; include references to "raw", "meta", and
"chunk_id" in the log message so it's clear which items were excluded.
apps/api/tests/unit/test_retrieve_routers_auth.py (1)

11-15: Boolean comparison style: prefer `not` over `is False`.

The current `is False` works, but isn't idiomatic Python. A clearer form:

♻️ Suggested change
-requires_postgres = os.environ.get("DATABASE_URL", "sqlite").startswith("sqlite") is False
+requires_postgres = not os.environ.get("DATABASE_URL", "sqlite").startswith("sqlite")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/tests/unit/test_retrieve_routers_auth.py` around lines 11 - 15, The
boolean comparison in requires_postgres is using "is False" which is unpythonic;
change the expression that determines requires_postgres (currently:
requires_postgres = os.environ.get("DATABASE_URL",
"sqlite").startswith("sqlite") is False) to use "not" for clarity and
correctness (e.g., set requires_postgres = not os.environ.get("DATABASE_URL",
"sqlite").startswith("sqlite")), leaving pytestmark and the DATABASE_URL usage
unchanged.
apps/api/app/agents/rag/retrieval.py (1)

212-215: Consider adding database indexes on `Chunk.notebook_id` and `Chunk.source_id`.

Per the model definition in apps/api/app/models/source.py:38-59, the Chunk table's `notebook_id` and `source_id` columns are not declared with `index=True`. The new filter conditions (here and at lines 145-148) may trigger full-table scans on large datasets.

Suggested model change
# In apps/api/app/models/source.py
 source_id: Mapped[uuid.UUID | None] = mapped_column(
-    ForeignKey("sources.id", ondelete="CASCADE"), nullable=True
+    ForeignKey("sources.id", ondelete="CASCADE"), nullable=True, index=True
 )
-notebook_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("notebooks.id", ondelete="CASCADE"))
+notebook_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("notebooks.id", ondelete="CASCADE"), index=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/app/agents/rag/retrieval.py` around lines 212 - 215, The query
filters use Chunk.notebook_id and Chunk.source_id which are not indexed, causing
potential full-table scans; update the Chunk model to declare indexes on these
columns (set index=True on the notebook_id and source_id Column definitions in
the Chunk model) and then create and apply a DB migration (e.g., Alembic) to add
the corresponding indexes to the database so the where clauses in functions
using Chunk.notebook_id and Chunk.source_id will use indexes.
apps/api/app/services/conversation_service.py (1)

175-177: Skip the notebook summary query for global conversations.

Line 176 still runs `get_notebook_summary` even when `conv.notebook_id` is None, issuing a pointless query. Load it lazily per branch, consistent with `stream_agent`.

Suggested change
-        notebook_summary = await get_notebook_summary(conv.notebook_id, self.db)
+        notebook_summary = (
+            await get_notebook_summary(conv.notebook_id, self.db)
+            if conv.notebook_id
+            else None
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/app/services/conversation_service.py` around lines 175 - 177, the
current implementation calls get_notebook_summary regardless of whether
conv.notebook_id is None, which issues a redundant query for global
conversations; make the notebook-summary call lazy: move the await
get_notebook_summary(conv.notebook_id, self.db) call inside an if
conv.notebook_id: branch (matching how stream_agent handles it) so it only runs
when conv.notebook_id is truthy, and leave the user_memories fetch unchanged;
locate via get_user_memories, get_notebook_summary, conv.notebook_id, and
stream_agent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ee4cb79d-b1f0-4a40-ac33-8f9866f3ee05

📥 Commits

Reviewing files that changed from the base of the PR and between 4ea3a23 and 1cc0cd2.

📒 Files selected for processing (10)
  • apps/api/app/agents/rag/retrieval.py
  • apps/api/app/agents/writing/composer.py
  • apps/api/app/domains/ai/routers/knowledge.py
  • apps/api/app/domains/ai/routers/suggestions.py
  • apps/api/app/domains/ai/routers/writing.py
  • apps/api/app/services/conversation_service.py
  • apps/api/app/skills/builtin/summarize.py
  • apps/api/tests/integration/test_conversation_service.py
  • apps/api/tests/unit/test_compose_answer_extra_graph.py
  • apps/api/tests/unit/test_retrieve_routers_auth.py

@LinMoQC LinMoQC closed this pull request by merging all changes into main in ad58f55 Mar 27, 2026