fix: resolve dict attribute error in MarkdownGenerator deserialization by MTahirKleem · Pull Request #1887 · unclecode/crawl4ai

MTahirKleem · 2026-03-30T20:26:41Z

This PR addresses an AttributeError that occurred when CrawlResult was initialized from a dictionary (e.g., from cache or JSON), where the markdown field remained a raw dict instead of being hydrated into a MarkdownGenerationResult object.

List of files changed and why
crawl4ai/models.py: Updated CrawlResult with a Pydantic alias and validator to ensure proper object hydration while maintaining the backward-compatible string-like interface.

tests/unit/test_crawl_result_markdown_validator.py: New test file to verify that passing a dictionary to the markdown parameter correctly results in a MarkdownGenerationResult object.

How Has This Been Tested?
Manual Verification: Created a reproduction script that initialized CrawlResult with a raw dictionary and verified that property access (e.g., .raw_markdown) worked without error.

Unit Testing: Ran the newly created test case to confirm the validator intercepts and converts dictionary inputs correctly.

Serialization Test: Verified that model_dump() still correctly exports the markdown field for storage.

Checklist:
[x] My code follows the style guidelines of this project

[x] I have performed a self-review of my own code

[x] I have commented my code, particularly in hard-to-understand areas

[x] I have made corresponding changes to the documentation

[x] I have added/updated unit tests that prove my fix is effective or that my feature works

[x] New and existing unit tests pass locally with my changes

Copilot

Pull request overview

This PR aims to make CrawlResult resilient to cached/JSON hydration cases where the markdown payload is a raw dict, preventing attribute errors during markdown access/serialization.

Changes:

Add a markdown_data field (aliased from "markdown") to improve Pydantic hydration robustness.
Add a field_validator to convert raw dict inputs into MarkdownGenerationResult.
Add a unit test validating dict-to-model conversion and string-compatible access behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`crawl4ai/models.py`	Introduces `markdown_data` alias + validator + post-init syncing to prevent markdown deserialization issues.
`tests/unit/test_crawl_result_markdown_validator.py`	Adds a unit test ensuring dict markdown input is converted and serialized correctly.

Comments suppressed due to low confidence (1)

crawl4ai/models.py:185

The NOTE block at lines 181–191 appears dedented to column 0 within CrawlResult. If those lines are truly unindented, that would end the class body and make the subsequent indented def model_post_init(...) an IndentationError (module won’t import). Keep the NOTE block indented at the same level as other class members (or move it fully outside the class along with the following methods).

# NOTE: The StringCompatibleMarkdown class, model_post_init hook, property getters/setters,
# and model_dump override all exist to support a smooth transition from markdown as a string
# to markdown as a MarkdownGenerationResult object, while maintaining backward compatibility.
# 
# This allows code that expects markdown to be a string to continue working, while also

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-30T20:31:24Z

crawl4ai/models.py

+        if isinstance(v, dict):
+            # This converts a raw dictionary (from cache/JSON)
+            # into the structured Pydantic object
+            return MarkdownGenerationResult(**v)


validate_markdown only converts dicts. If input markdown is a plain string (which can happen with legacy cache hydration), Pydantic will try to validate that string as MarkdownGenerationResult for markdown_data and raise a validation error. Consider handling str here (e.g., wrap it into a MarkdownGenerationResult with raw_markdown set and other fields empty) to preserve backward compatibility.

Suggested change

return MarkdownGenerationResult(**v)

return MarkdownGenerationResult(**v)

if isinstance(v, str):

# Legacy backward compatibility: markdown provided as a plain string

# Wrap it into MarkdownGenerationResult, using raw_markdown

return MarkdownGenerationResult(raw_markdown=v)

Copilot · 2026-03-30T20:31:25Z

tests/unit/test_crawl_result_markdown_validator.py

+def test_crawl_result_converts_markdown_dict_input():
+    result = CrawlResult(
+        url="https://example.com",
+        html="<html></html>",
+        success=True,
+        markdown={
+            "raw_markdown": "# Hello",
+            "markdown_with_citations": "# Hello",
+            "references_markdown": "",
+            "fit_markdown": "Hello",
+            "fit_html": "<p>Hello</p>",
+        },
+    )


This test covers dict input, but it doesn’t cover legacy string hydration (e.g., CrawlResult(..., markdown="# Hello")). Adding that case would protect backward compatibility and catch the validation regression mentioned in the model validator.

Copilot · 2026-03-30T20:31:25Z

crawl4ai/models.py

    model_config = ConfigDict(arbitrary_types_allowed=True)

-# NOTE: The StringCompatibleMarkdown class, custom __init__ method, property getters/setters,
+    @field_validator("markdown_data", mode="before")


The PR metadata links/mentions Issue #123 (“Timeout setting”), but the changes here are focused on CrawlResult markdown deserialization. If this PR is intended to fix #123, the code changes don’t appear to address timeouts; otherwise consider removing/clarifying the issue reference in the PR description.

- Added 'markdown_data' alias field to CrawlResult for robust Pydantic hydration. - Implemented field_validator to convert raw dicts to MarkdownGenerationResult. - Used model_post_init to sync internal private state. - Maintained backward compatibility for string-based markdown access. - Added unit test for validation guard.

Copilot AI review requested due to automatic review settings March 30, 2026 20:26

Copilot started reviewing on behalf of MTahirKleem March 30, 2026 20:27 View session

Copilot AI reviewed Mar 30, 2026

View reviewed changes

MTahirKleem force-pushed the fix/issue-1880-markdown-deserialization branch from f7a699f to 7344c05 Compare March 30, 2026 20:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: resolve dict attribute error in MarkdownGenerator deserialization#1887

fix: resolve dict attribute error in MarkdownGenerator deserialization#1887
MTahirKleem wants to merge 1 commit intounclecode:mainfrom
MTahirKleem:fix/issue-1880-markdown-deserialization

MTahirKleem commented Mar 30, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 30, 2026

Uh oh!

Copilot AI Mar 30, 2026

Uh oh!

Copilot AI Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-            return MarkdownGenerationResult(**v)
+            return MarkdownGenerationResult(**v)
+        if isinstance(v, str):
+            # Legacy backward compatibility: markdown provided as a plain string
+            # Wrap it into MarkdownGenerationResult, using raw_markdown
+            return MarkdownGenerationResult(raw_markdown=v)

Uh oh!

Conversation

MTahirKleem commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MTahirKleem commented Mar 30, 2026 •

edited

Loading