fix: add depth limit to EventSerializer to prevent hangs on complex objects #1577
BKDDFS wants to merge 2 commits into langfuse:main
Conversation
Force-pushed from a895aed to 087052e
`EventSerializer.default()` recursively traverses `__dict__` and `__slots__` of arbitrary objects without a depth limit. When `@observe()` captures function arguments containing objects like `google.genai.Client` (which hold aiohttp sessions, connection pools, and threading locks), `json.dumps` blocks indefinitely on the second invocation. Add a `_MAX_DEPTH = 20` counter that returns a `<TypeName>` placeholder when exceeded, preventing infinite recursion into complex object graphs while preserving all existing serialization behavior.
Force-pushed from 087052e to 43114a6
Summary
`EventSerializer.default()` recursively traverses `__dict__` and `__slots__` of arbitrary objects without a depth limit. When `@observe()` captures function arguments containing objects like `google.genai.Client` (which hold aiohttp sessions, connection pools, and threading locks), `json.dumps` can block indefinitely.

Root cause
`_serialize()` in `create_span_attributes` calls `json.dumps(obj, cls=EventSerializer)`. The serializer's `__dict__` fallback recurses into every attribute of every object. For objects like `aiohttp.ClientSession` → `TCPConnector` → internal locks, this either hangs (lock contention / blocking attribute access) or enters extremely deep recursion.

Fix
Add a `_MAX_DEPTH = 20` depth counter to `EventSerializer`:

- `default()` increments/decrements `_depth` on each call
- When `_depth >= _MAX_DEPTH`, it returns a `<TypeName>` placeholder instead of recursing
- Existing serialization logic is delegated to `_default_inner()` — no behavior changes
- `encode()` resets `_depth` at the start of each serialization

Workaround (current)
`@observe(capture_input=False)` — disables automatic input capture entirely.

Tests
3 new tests:

- `test_deeply_nested_object_does_not_hang` — objects with connection pools/locks
- `test_max_depth_returns_type_name` — `__dict__` deep nesting truncation
- `test_deeply_nested_slots_object_is_truncated` — `__slots__` deep nesting truncation

All 21 tests pass (18 existing + 3 new).
Disclaimer: Experimental PR review
Greptile Summary
This PR fixes a real and reproducible hang in `EventSerializer` by introducing a `_MAX_DEPTH = 20` counter that short-circuits recursion when serializing deeply nested objects (e.g. `google.genai.Client` holding aiohttp session internals). The core mechanism — incrementing `_depth` in `default()` and decrementing it in a `finally` block — is correct and well-structured.

Key changes:

- `default()` is now a thin depth guard that delegates to `_default_inner()`, preserving all existing serialization behavior unchanged.
- `encode()` resets `_depth = 0` before each top-level call, which is a good defensive measure.
- Three new tests cover the main scenarios: hang prevention, `__dict__` depth truncation, and `__slots__` depth truncation.

Issues found:

- The `__slots__` branch at lines 153–156 calls `self.default({...})` for the intermediate dict it constructs. Because `default()` also increments `_depth`, each `__slots__` object consumes 2 depth units per level instead of 1, cutting the effective limit from 20 to ~10 for `__slots__`-based objects. Changing that single call to `_default_inner({...})` would make behavior consistent.
- The `test_deeply_nested_slots_object_is_truncated` assertion bound (`_MAX_DEPTH + 5 = 25`) is too loose to catch regressions in truncation depth for the slots case.

Confidence Score: 4/5
Safe to merge — the hang is fixed and no existing behavior is regressed; the remaining issues are a minor depth inconsistency for `__slots__` objects and a loose test assertion.

The fix correctly solves the stated problem (infinite recursion / hang on complex objects) and all 21 tests pass. The only issues are a double-depth-counting side effect in the `__slots__` path (effective limit ~10 instead of 20) and an overly permissive test assertion — neither affects the primary fix or causes correctness failures in normal usage. See `langfuse/_utils/serializer.py` lines 153–156 (`__slots__` handling) and the corresponding test assertion at `tests/test_serializer.py:267`.

Important Files Changed
- `langfuse/_utils/serializer.py` — adds the `_MAX_DEPTH = 20` depth limiter to prevent infinite recursion/hangs; logic is correct for `__dict__`-based objects, but `__slots__` objects incur double depth consumption (2 units per level instead of 1), halving their effective limit to ~10.
- `tests/test_serializer.py` — three new tests covering hang prevention, `__dict__` truncation, and `__slots__` truncation; the slots test has an overly loose assertion (`_MAX_DEPTH + 5`) that doesn't tightly verify the truncation depth.

Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["encode(obj)"] --> B["_depth = 0\nseen.clear()"]
    B --> C["default(obj)"]
    C --> D{{"_depth >= _MAX_DEPTH?"}}
    D -- Yes --> E["return '<TypeName>'"]
    D -- No --> F["_depth += 1\n_default_inner(obj)"]
    F --> G{{"Type of obj?"}}
    G -- "datetime / UUID / bytes / etc." --> H["Return primitive"]
    G -- "dict" --> I["self.default(k)\nself.default(v)\nfor each item"]
    G -- "list / Sequence" --> J["self.default(item)\nfor each item"]
    G -- "__slots__" --> K["self.default(dict of slots)\n⚠️ extra depth unit"]
    G -- "__dict__" --> L{{"id(obj) in seen?"}}
    L -- Yes --> M["return TypeName (circular ref)"]
    L -- No --> N["seen.add(id)\nself.default(v)\nfor each attr\nseen.remove(id)"]
    G -- "other" --> O["return '<TypeName>'"]
    I --> P["_depth -= 1\n(finally)"]
    J --> P
    K --> P
    N --> P
    H --> P
    M --> P
    O --> P
    P --> Q["return result to encode()"]
    Q --> R["super().encode(result)"]
```

Comments Outside Diff (1)
`langfuse/_utils/serializer.py`, lines 153–156: `__slots__` path consumes double the depth budget per object level

Because the `__slots__` branch calls `self.default({...})` for the intermediate dict, each `__slots__` object consumes 2 depth units (one for the object itself, one for the constructed dict), whereas a `__dict__`-based object consumes only 1. In practice this halves the effective depth limit for `__slots__` objects: they truncate at approximately 10 levels of nesting rather than the intended 20.

Tracing for a chain of `SlotLevel` objects:

- `default(SlotLevel_N)` → `_depth = 2*(N-1)`
- `default({"child": SlotLevel_{N+1}})` → `_depth = 2*(N-1)+1`

A lightweight fix is to call `_default_inner` directly for the constructed dict rather than routing it back through `default()`. This avoids the redundant depth check and gives `__slots__` objects the same effective depth limit as `__dict__` objects.
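The double counting and the suggested `_default_inner` fix can be reproduced with a small stand-in (illustrative names only, not the actual `serializer.py` code); the sole difference between the two modes below is whether the intermediate slots dict is routed back through `default()`.

```python
import json

_MAX_DEPTH = 20


class SlotsSerializer(json.JSONEncoder):
    """Stand-in serializer reduced to the __slots__ path.

    route_via_default=True mimics the current behavior the review flags;
    False applies the suggested _default_inner fix.
    """

    def __init__(self, *args, route_via_default=True, **kwargs):
        super().__init__(*args, **kwargs)
        self.route_via_default = route_via_default
        self._depth = 0

    def default(self, obj):
        if self._depth >= _MAX_DEPTH:
            return f"<{type(obj).__name__}>"
        self._depth += 1
        try:
            return self._default_inner(obj)
        finally:
            self._depth -= 1

    def _default_inner(self, obj):
        if isinstance(obj, (str, int, float, bool, type(None))):
            return obj
        if isinstance(obj, dict):
            return {k: self.default(v) for k, v in obj.items()}
        if hasattr(obj, "__slots__"):
            slots = {s: getattr(obj, s) for s in obj.__slots__}
            if self.route_via_default:
                # Current behavior: burns a second depth unit per level.
                return self.default(slots)
            # Suggested fix: skip the redundant depth check for the dict.
            return self._default_inner(slots)
        return f"<{type(obj).__name__}>"


class SlotLevel:
    __slots__ = ("child",)

    def __init__(self, child=None):
        self.child = child


# Build a 40-level __slots__ chain.
deep = None
for _ in range(40):
    deep = SlotLevel(deep)

current = SlotsSerializer(route_via_default=True).encode(deep)
fixed = SlotsSerializer(route_via_default=False).encode(deep)
```

With routing through `default()`, the 40-level chain truncates after 10 `SlotLevel` objects; with the direct `_default_inner` call it truncates after the intended 20, matching the limit for `__dict__`-based objects.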