
fix: fall back to pypdfium2 page count when docling-parse returns -1 #3119

Closed
majiayu000 wants to merge 1 commit into docling-project:main from majiayu000:fix/issue-3031-page-count-fallback

Conversation

@majiayu000
Contributor

Summary

Fixes #3031

When docling-parse's C++ backend fails to parse a PDF's page tree (QPDF exception), number_of_pages() returns -1. The current page_count() method returns this value directly, causing is_valid() to reject documents that pypdfium2 can read fine.

Changes

  • Added a fallback in page_count(): when docling-parse returns a negative count, it now falls back to pypdfium2's page count (len(self._pdoc)) and logs a warning
  • Added test_page_count_fallback_on_parse_failure, which mocks a docling-parse failure and verifies the pypdfium2 fallback

Test plan

  • pre-commit run --all-files passes (Ruff + MyPy)
  • All 5 existing + new backend tests pass
  • New test verifies page_count() returns pypdfium2 count when docling-parse returns -1
  • New test verifies is_valid() returns True in fallback scenario
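A test along the lines of the plan above could mock both sides with `unittest.mock`. This is a sketch, not the test added in the PR. `BackendSketch` is a hypothetical compact copy of the fallback logic, repeated so the snippet runs on its own, and the logger name `page_count_sketch` is made up for illustration. `MagicMock` is used for the pypdfium2 side because, unlike `Mock`, it lets `__len__` be configured.

```python
import logging
import unittest
from unittest import mock

_log = logging.getLogger("page_count_sketch")  # hypothetical logger name


class BackendSketch:
    """Hypothetical compact copy of the fallback backend, not the docling class."""

    def __init__(self, parser_doc, pdoc) -> None:
        self._parser_doc = parser_doc
        self._pdoc = pdoc

    def page_count(self) -> int:
        count = self._parser_doc.number_of_pages()
        if count < 0:
            _log.warning("docling-parse page count %d; using pypdfium2", count)
            return len(self._pdoc)
        return count

    def is_valid(self) -> bool:
        return self.page_count() > 0


class TestPageCountFallback(unittest.TestCase):
    def test_page_count_fallback_on_parse_failure(self) -> None:
        parser_doc = mock.Mock()
        parser_doc.number_of_pages.return_value = -1  # simulate the QPDF failure
        pdoc = mock.MagicMock()
        pdoc.__len__.return_value = 4  # pypdfium2 still sees 4 pages

        backend = BackendSketch(parser_doc, pdoc)
        # Both calls hit the fallback path, so both emit the warning.
        with self.assertLogs("page_count_sketch", level="WARNING"):
            self.assertEqual(backend.page_count(), 4)
            self.assertTrue(backend.is_valid())
```

Run it with `python -m pytest` or `python -m unittest` on the module that contains it.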

@github-actions
Contributor

DCO Check Passed

Thanks @majiayu000, all your commits are properly signed off. 🎉

@mergify
Contributor

mergify Bot commented Mar 13, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

@dolfim-ibm
Member

Duplicate of #3040

@dolfim-ibm dolfim-ibm marked this as a duplicate of #3040 Mar 13, 2026
@majiayu000
Contributor Author

Closing as a duplicate.

@majiayu000 majiayu000 closed this Mar 13, 2026

Development

Successfully merging this pull request may close these issues.

DoclingParseDocumentBackend.page_count() returns -1 when docling-parse fails to parse PDF page tree
