refactor(tools): split ReadPdf and ReadHtml out of Read tool#155

Merged
yishuiliunian merged 3 commits into main from worktree-clever-chasing-mitten
May 15, 2026

@yishuiliunian
Contributor

Summary

  • Extract PDF reading (ReadPdfTool) and HTML reading (ReadHtmlTool) into independent tool crates
  • Read tool now only handles text files with a clean schema (no pages parameter)
  • Fixes a bug where LLM agents passed an empty pages: "" argument for non-PDF files, causing repeated failures
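
The failure mode described above can be sketched roughly as follows. This is a hypothetical illustration, not the project's code: the function names (`combined_read`, `read_text`, `parse_pages`), the `ReadError` type, and the `"3"` / `"1-5"` page-range format are all assumptions. It shows one plausible way an empty `pages` string could fail on a text file when a single tool validates the argument for every file type, and why a text-only tool without the argument cannot hit that path.

```rust
// Hypothetical sketch: all names and the page-range format are assumptions.

#[derive(Debug, PartialEq)]
pub enum ReadError {
    InvalidPages(String),
}

/// Parses a page range such as "3" or "1-5" (assumed format, 1-based, inclusive).
pub fn parse_pages(pages: &str) -> Result<(u32, u32), ReadError> {
    let invalid = || ReadError::InvalidPages(pages.to_string());
    let (start, end) = match pages.split_once('-') {
        Some((a, b)) => (a, b),
        None => (pages, pages),
    };
    let start: u32 = start.trim().parse().map_err(|_| invalid())?;
    let end: u32 = end.trim().parse().map_err(|_| invalid())?;
    if start == 0 || end < start {
        return Err(invalid());
    }
    Ok((start, end))
}

/// Pre-split shape: `pages` is validated whenever it is present,
/// even when the target is a plain text file.
pub fn combined_read(path: &str, pages: Option<&str>) -> Result<String, ReadError> {
    if let Some(p) = pages {
        parse_pages(p)?; // an empty string fails here, regardless of file type
    }
    Ok(format!("contents of {path}"))
}

/// Post-split shape: the text reader takes no `pages` argument,
/// so there is nothing for an agent to fill in incorrectly.
pub fn read_text(path: &str) -> Result<String, ReadError> {
    Ok(format!("contents of {path}"))
}
```

Under these assumptions, `combined_read("notes.txt", Some(""))` returns an `InvalidPages` error even though the file is not a PDF, while `read_text("notes.txt")` has no such argument to reject.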

Changes

  • New: crates/tools/filesystem/read-pdf/ (ReadPdfTool, with pages param)
  • New: crates/tools/filesystem/read-html/ (ReadHtmlTool, for HTML→text conversion)
  • Modified: crates/tools/filesystem/read/ — removed PDF/HTML branches and pages schema field
  • Modified: crates/tools/registry/ — registers both new tools
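
As a rough picture of the resulting layout, the sketch below registers three independent tools, each advertising its own parameter list. The `Tool` trait, the `Registry` type, `default_registry`, and the `file_path` parameter name are assumptions made for illustration; only the tool names and the `pages` parameter on ReadPdf come from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical tool interface; the project's real trait may differ.
pub trait Tool {
    fn name(&self) -> &'static str;
    /// Names of the parameters the tool advertises to the agent.
    fn params(&self) -> &'static [&'static str];
}

pub struct ReadTool;
pub struct ReadPdfTool;
pub struct ReadHtmlTool;

impl Tool for ReadTool {
    fn name(&self) -> &'static str { "Read" }
    // Text files only: no `pages` in the schema.
    fn params(&self) -> &'static [&'static str] { &["file_path"] }
}

impl Tool for ReadPdfTool {
    fn name(&self) -> &'static str { "ReadPdf" }
    // `pages` lives only on the PDF tool.
    fn params(&self) -> &'static [&'static str] { &["file_path", "pages"] }
}

impl Tool for ReadHtmlTool {
    fn name(&self) -> &'static str { "ReadHtml" }
    fn params(&self) -> &'static [&'static str] { &["file_path"] }
}

/// Hypothetical registry keyed by tool name.
#[derive(Default)]
pub struct Registry {
    tools: HashMap<&'static str, Box<dyn Tool>>,
}

impl Registry {
    pub fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name(), tool);
    }

    /// Returns the advertised parameters for a tool, if it is registered.
    pub fn params_of(&self, name: &str) -> Option<&'static [&'static str]> {
        self.tools.get(name).map(|t| t.params())
    }
}

/// Builds a registry holding the text reader plus the two new tools.
pub fn default_registry() -> Registry {
    let mut reg = Registry::default();
    reg.register(Box::new(ReadTool));
    reg.register(Box::new(ReadPdfTool));
    reg.register(Box::new(ReadHtmlTool));
    reg
}
```

With this shape, an agent that looks up `Read` never sees a `pages` parameter, and only `ReadPdf` exposes one.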

Test plan

  • All 4 affected test targets pass locally
  • Clippy passes with zero warnings
  • CI passes

The Read tool had PDF extraction and HTML conversion baked in,
violating SRP and polluting its schema with a `pages` parameter
that caused LLM agents to pass empty strings on non-PDF files,
triggering repeated failures.

- New `loopal-tool-read-pdf` crate: ReadPdf tool with `pages` param
- New `loopal-tool-read-html` crate: ReadHtml tool for HTML→text
- Read tool now only handles text files (clean schema, no `pages`)
@yishuiliunian yishuiliunian merged commit 05c8cbd into main May 15, 2026
4 checks passed