Skip to content

fix(python-driver): add null-guards in ANTLR parser and relax runtime version pin#2372

Open
uesleilima wants to merge 3 commits intoapache:masterfrom
uesleilima:fix/python-driver-antlr-parser-robustness
Open

fix(python-driver): add null-guards in ANTLR parser and relax runtime version pin#2372
uesleilima wants to merge 3 commits intoapache:masterfrom
uesleilima:fix/python-driver-antlr-parser-robustness

Conversation

@uesleilima
Copy link
Copy Markdown

Summary

Fix two related issues in the Python driver's ANTLR4 parsing pipeline:

1. ANTLR4 agtype parser crashes on vertices with complex properties (#2367)

The ResultVisitor in builder.py crashes with AttributeError: 'NoneType' object has no attribute 'stop' when deserializing certain vertex records whose properties contain large arrays, long text fields with special characters, or deeply nested structures.

Root cause: Several visitor methods access child nodes (.symbol, .getText(), .accept()) without null-checking first. When the ANTLR4 parser tree contains None nodes, these calls crash.

Fix: Add null-guards in visitAgValue, visitFloatLiteral, visitPair, visitObj, and handleAnnotatedValue. Also replaces shadowing of builtin dict with d in handleAnnotatedValue, and uses .get() for safer key access on parsed vertex/edge dicts.

2. antlr4-python3-runtime==4.11.1 is incompatible with Python >= 3.13 (#2368)

The exact version pin ==4.11.1 causes runtime errors on Python 3.13+ due to deprecated/removed CPython internals that ANTLR 4.11.x relies on.

Fix: Relax the constraint from ==4.11.1 to >=4.11.1,<5.0 in both pyproject.toml and requirements.txt. The ANTLR ATN serialized format is unchanged between 4.11 and 4.13, so the generated lexer/parser files are compatible. Validated with antlr4-python3-runtime==4.13.2 on Python 3.11-3.14.

Tests

Added 9 new test cases in test_agtypes.py:

  • Vertices with large array properties (12 elements)
  • Vertices with special characters in properties
  • Deeply nested property structures (3 levels)
  • Empty properties
  • Null property values
  • Edges with complex properties (arrays, booleans)
  • Paths with multiple edges
  • Empty/null input handling
  • Arrays of mixed types including nested arrays

All 12 tests pass with antlr4-python3-runtime==4.13.2.

Closes

… version pin

Fix two related issues in the Python driver's ANTLR4 parsing pipeline:

1. Add null-guards in ResultVisitor methods (visitAgValue, visitFloatLiteral,
   visitPair, visitObj, handleAnnotatedValue) to prevent AttributeError crashes
   when the ANTLR4 parse tree contains None child nodes. This occurs with
   vertices that have complex properties (large arrays, special characters,
   deeply nested structures). (apache#2367)

2. Relax antlr4-python3-runtime version constraint from ==4.11.1 to
   >=4.11.1,<5.0 in both pyproject.toml and requirements.txt. The 4.11.1
   pin is incompatible with Python >= 3.13. The ANTLR ATN serialized format
   is unchanged between 4.11 and 4.13, so the generated lexer/parser files
   are compatible. Validated with antlr4-python3-runtime==4.13.2 on
   Python 3.11-3.14. (apache#2368)

Also replaces shadowing of builtin 'dict' in handleAnnotatedValue with 'd',
and uses .get() for safer key access on parsed vertex/edge dicts.

Closes apache#2367
Closes apache#2368
Verify that malformed and truncated agtype strings raise AGTypeError
(or recover gracefully) rather than crashing with AttributeError.
This tests the null-guards added to the ANTLR parser visitor.

Made-with: Cursor
Copy link
Copy Markdown

@tcolo tcolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uesleilima thanks alot of the patch - can you please address the issues in the review from claude code?
## Code Review: `fix(python-driver)`: add null-guards in ANTLR parser and relax runtime version pin

### Summary
This PR fixes two real bugs: a crash on `AttributeError` when deserializing complex vertex properties (#2367), and a version pin that blocks Python 3.13+ (#2368). The approach is sound and the test coverage is meaningful. A few things worth discussing before merging.

---

### 🔴 Issues to Address

**1. `visitFloatLiteral` — silent fallback may mask malformed input**

When `ctx.getChild(0)` returns `None`, the node is genuinely malformed. Silently returning a default value hides the problem — a test confirms no `AttributeError`, but an incorrect result is arguably worse. Consider raising `AGTypeError` consistently with how `visitPair` handles missing nodes:

```python
# Suggested
child = ctx.getChild(0)
if child is None:
    raise AGTypeError("Malformed float literal: missing child node")

2. handleAnnotatedValue — type validation failure mid-path construction

When a type check fails mid-path (e.g., vertex dict check passes, edge dict check raises AGTypeError), the partially constructed Path object may already have been mutated. It's discarded on exception so it's minor, but a comment explaining the intended behavior would help.


🟡 Suggestions

1. Add a comment explaining the version constraint

The change from ==4.11.1 to >=4.11.1,<5.0 is the right call — ANTLR4 maintains format compatibility within a major version. A brief inline comment in pyproject.toml will help the next person who wonders why it's not pinned more tightly:

# ANTLR4 runtime is format-compatible within major versions; tested on 4.13.2 with Python 3.11–3.14
"antlr4-python3-runtime>=4.11.1,<5.0"

2. Malformed input tests should assert the expected outcome, not just the absence of AttributeError

# Current pattern proves the old crash is gone, but doesn't pin the new contract
try:
    result = handler.parse(malformed_input)
except AttributeError:
    pytest.fail("Should not raise AttributeError")

# Better: assert what *should* happen
with pytest.raises(AGTypeError):
    handler.parse(malformed_input)

# OR if graceful None is expected:
result = handler.parse(malformed_input)
assert result is None

3. visitObj silently skips None values — visitPair hard-fails on them

This inconsistency could cause silent data loss (a property disappearing) instead of a parse error. Either align the behavior or add a comment explaining the intentional difference.


✅ What Looks Good

  • The visitAgValue null-checks are clean and correctly handle the annotated-type path.
  • Replacing dict["key"] with .get("key") in handleAnnotatedValue is the right call for externally-provided data.
  • Fixing the shadowed dict builtin is a good catch — subtle but real.
  • The nine new test cases cover solid breadth of the real-world inputs that triggered the original bug.
  • Both pyproject.toml and requirements.txt are kept in sync on the version change — these often drift.

Verdict: Request Changes

The core fix is correct and needed. Two points before merging:

  1. Clarify/fix visitFloatLiteral — silent 0.0 default vs. raising AGTypeError.
  2. Tighten the malformed-input tests to assert the expected outcome, not just the absence of AttributeError.

The version pin relaxation is ready to merge as-is.


Paste this into **Files changed → Review changes → Leave a comment** on the PR and select **Request changes**.

- visitFloatLiteral: raise AGTypeError on malformed child node instead
  of silently returning a fallback value
- visitObj: add comment documenting that visitPair's validation makes
  the None-guard defensive-only
- handleAnnotatedValue: add comment explaining partial-construction
  behavior on type-check failure
- pyproject.toml: add comment explaining ANTLR4 version range rationale
- Tests: assert AGTypeError (or graceful recovery) for malformed and
  truncated inputs, not just absence of AttributeError

Made-with: Cursor
@uesleilima uesleilima requested a review from tcolo April 5, 2026 23:03
@MuhammadTahaNaveed MuhammadTahaNaveed requested review from Copilot and removed request for tcolo April 6, 2026 10:31
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses crashes in the Python driver’s ANTLR4 agtype parsing by adding null-guards/error handling in the visitor and updates packaging to allow newer antlr4-python3-runtime versions (needed for Python 3.13+ compatibility).

Changes:

  • Added null-guards and additional validation/error reporting in ResultVisitor / handleAnnotatedValue during agtype deserialization.
  • Relaxed antlr4-python3-runtime dependency pin to >=4.11.1,<5.0 in packaging files.
  • Expanded agtype parsing tests to cover complex vertices/edges/paths and malformed/truncated inputs.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
drivers/python/age/builder.py Adds null-guards and changes annotated-type object construction logic.
drivers/python/test_agtypes.py Adds multiple new parsing tests for complex and malformed agtype inputs.
drivers/python/requirements.txt Relaxes ANTLR runtime constraint for broader Python compatibility.
drivers/python/pyproject.toml Relaxes ANTLR runtime constraint and documents supported/tested ranges.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +206 to 220
d = ctx.accept(self)
if not isinstance(d, dict):
raise AGTypeError(str(ctx.getText()), "Expected dict for vertex, got " + type(d).__name__)
vid = d.get("id")
vertex = None
if self.vertexCache != None and vid in self.vertexCache :
if self.vertexCache is not None and vid in self.vertexCache:
vertex = self.vertexCache[vid]
else:
vertex = Vertex()
vertex.id = dict["id"]
vertex.label = dict["label"]
vertex.properties = dict["properties"]
vertex.id = d.get("id")
vertex.label = d.get("label")
vertex.properties = d.get("properties")

if self.vertexCache != None:
if self.vertexCache is not None:
self.vertexCache[vid] = vertex
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In handleAnnotatedValue() (vertex branch), switching from required key access to d.get(...) can silently construct a Vertex with id=None and/or properties=None when the parsed dict is missing keys (e.g., after parser error recovery). This can also poison vertexCache by storing under the None key and later cause runtime errors (e.g., Vertex.__getitem__ assumes properties is a dict). Consider validating required fields (id, label, properties) and either raising AGTypeError when they’re missing/invalid or defaulting properties to {} before constructing/caching the vertex.

Copilot uses AI. Check for mistakes.
Comment on lines 224 to +233
elif anno == "edge":
edge = Edge()
dict = ctx.accept(self)
edge.id = dict["id"]
edge.label = dict["label"]
edge.end_id = dict["end_id"]
edge.start_id = dict["start_id"]
edge.properties = dict["properties"]
d = ctx.accept(self)
if not isinstance(d, dict):
raise AGTypeError(str(ctx.getText()), "Expected dict for edge, got " + type(d).__name__)
edge.id = d.get("id")
edge.label = d.get("label")
edge.end_id = d.get("end_id")
edge.start_id = d.get("start_id")
edge.properties = d.get("properties")
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly for the edge branch: using d.get(...) can create an Edge with missing identifiers or properties=None, which can later crash when the edge is accessed as a mapping (since Edge.__getitem__ assumes properties is a dict). Recommend enforcing required keys (id, label, start_id, end_id, properties) and/or defaulting properties to {}; if the parsed value is incomplete due to recovery, raising AGTypeError is safer than returning a partially-initialized model object.

Copilot uses AI. Check for mistakes.
Comment on lines +250 to +251
# value (None, dict, Vertex, etc.) — not a crash.
self.assertNotIsInstance(result, type(NotImplemented))
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion doesn’t meaningfully validate “recovered” output: assertNotIsInstance(result, type(NotImplemented)) will pass for essentially any normal parse result (including None). To make the test effective, assert an expected set of types/shape (e.g., None or a dict/list/model object) or assert specific properties when recovery occurs, while still ensuring no AttributeError is raised.

Suggested change
# value (None, dict, Vertex, etc.) — not a crash.
self.assertNotIsInstance(result, type(NotImplemented))
# parsed value with an expected shape (None, container, or
# model-like object) — not just any arbitrary object.
self.assertTrue(
result is None
or isinstance(result, (dict, list, tuple))
or hasattr(result, "__dict__")
or hasattr(result, "__getitem__"),
f"Malformed input recovered to unexpected result type "
f"{type(result).__name__}: {inp}"
)

Copilot uses AI. Check for mistakes.
Comment on lines +260 to +274
def test_truncated_agtype_raises_agtypeerror(self):
"""Issue #2367: Truncated agtype must raise AGTypeError, never AttributeError."""
from age.exceptions import AGTypeError

truncated_inputs = [
'{"id": 1, "label": "X", "properties": {"name": "te',
'{"id": 1, "label": "X"',
'[{"id": 1}::vertex, {"id": 2',
]
for inp in truncated_inputs:
try:
result = self.parse(inp)
# Recovery is acceptable for truncated input
self.assertIsNotNone(result)
except AGTypeError:
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring says truncated agtype “must raise AGTypeError”, but the test body explicitly allows successful recovery (assertIsNotNone(result)). Update the docstring (or the assertions) so the test’s contract is unambiguous.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants