Merged

9.1.0 #595

Changes from all commits
27 commits
5d9c952
Add wildcard/glob pattern support for exclude_paths and include_paths
akshat62 Mar 28, 2026
c82a5ef
Add missing files to sdist
mgorny Mar 31, 2026
6adc8cf
Remove obsolete `MANIFEST.in` file
mgorny Mar 31, 2026
4199009
Add wildcard/glob pattern support for exclude_paths and include_paths
akshat62 Mar 31, 2026
f2f7038
Resolve
akshat62 Mar 31, 2026
a349699
1. Nested namedtuple set/frozenset updates could replace the whole re…
seperman Apr 27, 2026
b697d65
Changed deepdiff/delta.py:237 so dunder traversal from check_elem() r…
seperman Apr 27, 2026
8a607be
Implemented the cache replacement.
seperman Apr 27, 2026
e352ed8
- deephash.py: corrected exclude_paths/include_paths type to SetOrdered
seperman Apr 27, 2026
fbb1adb
fixing the failing tests
seperman Apr 27, 2026
c790154
Phase 1 is in. Summary of what landed:
seperman Apr 27, 2026
0450c8d
Phase 2 implementation is complete — all subticket #2 acceptance crit…
seperman Apr 27, 2026
794aa9d
- deepdiff/_multiprocessing.py: _subtree_diff_worker + compute_subtre…
seperman Apr 27, 2026
dd2c678
- REPEATS 10 → 2 (collection is index-keyed, so completion order can'…
seperman Apr 27, 2026
061e11b
Code (deepdiff/_multiprocessing.py)
seperman Apr 27, 2026
e829c61
Phase 5 — Subticket #6 (extended determinism matrix)
seperman Apr 28, 2026
d4e6342
changing the link to survey
seperman May 4, 2026
4403424
fixed the broken test
seperman May 15, 2026
f84fee5
Merge pull request #592 from mgorny/fix-sdist
seperman May 15, 2026
5d75805
updating dependencies
seperman May 15, 2026
d0ca084
Merge branch 'dev' into feature/glob-wildcard-paths
seperman May 15, 2026
8c3873a
updating github actions
seperman May 15, 2026
83e9e61
ignore non-deterministic worker stats in test_cache_deeply_nested_a2
seperman May 15, 2026
277c89c
Merge pull request #590 from akshat62/feature/glob-wildcard-paths
seperman May 15, 2026
f40f163
memoize GlobPathMatcher to remove exponential cliff
seperman May 15, 2026
1fc129f
updating docs
seperman May 15, 2026
f4e58b5
updating authors
seperman May 15, 2026
6 changes: 3 additions & 3 deletions .github/workflows/main.yaml
Original file line number Diff line number Diff line change
@@ -15,10 +15,10 @@ jobs:
architecture: ['x64']

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v5

- name: Setup Python
uses: actions/setup-python@v4
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python-version }}
architecture: ${{ matrix.architecture }}
@@ -50,7 +50,7 @@ jobs:

- name: Upload coverage
if: ${{ matrix.python-version == '3.14' }}
uses: codecov/codecov-action@v4
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
file: coverage.xml
2 changes: 2 additions & 0 deletions AUTHORS.md
@@ -86,3 +86,5 @@ Authors in order of the timeline of their contributions:
- [srini047](https://github.com/srini047) for fixing README typo.
- [Nagato-Yuzuru](https://github.com/Nagato-Yuzuru) for colored view tests.
- [akshat62](https://github.com/akshat62) for adding Fraction numeric support.
- [akshat62](https://github.com/akshat62) for adding wildcard/glob pattern support for `exclude_paths` and `include_paths`.
- [mgorny](https://github.com/mgorny) for adding missing files to sdist and removing obsolete `MANIFEST.in`.
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,19 @@
# DeepDiff Change log

- v9-1-0
- Added multiprocessing support for DeepDiff: parallel distance computation and parallel subtree diffing with aggregated worker stats, deterministic ordering, and automatic fallback to serial when unsafe (e.g. `custom_operators`, `*_obj_callback`, `ignore_order_func`)
- Added wildcard/glob pattern support for `exclude_paths` and `include_paths` thanks to [akshat62](https://github.com/akshat62)
- Reimplemented internal cache for improved performance
- Memoized `GlobPathMatcher` to remove exponential-time matching cliff
- Comprehensive type-hint corrections across `deephash.py`, `helper.py`, `delta.py`, `diff.py`, `distance.py`, `path.py`, and `serialization.py` (also fixed real bugs: misplaced paren in `path._guess_type` call, and `len(other.indexes > 1)` → `len(other.indexes) > 1` in `diff._compare_in_order`)
- Security: Delta dunder-attribute traversal in `check_elem()` now raises immediately instead of going through `_raise_or_log()`, with full-path preflight validation in `_get_elements_and_details()` so the `set_item_added` path cannot silently skip malicious dunder paths
- Fixed nested NamedTuple set/frozenset Delta updates dropping the outer container
- Fixed tuple Deltas using iterable opcodes silently doing nothing for insert/delete-only changes
- Fixed Delta with both moved and added iterable items mutating the Delta's own internal diff data
- Fixed crash during path sorting when removing multiple dictionary items with complex keys
- Packaging: added missing files to sdist and removed obsolete `MANIFEST.in` thanks to [mgorny](https://github.com/mgorny)
- Updated GitHub Actions workflows and dependencies

- v9-0-0
- migration note:
- `to_dict()` and `to_json()` now accept a `verbose_level` parameter and always return a usable text-view dict. When the original view is `'tree'`, they default to `verbose_level=2` for full detail. The old `view_override` parameter is removed. To get the previous results, you will need to pass the explicit verbose_level to `to_json` and `to_dict` if you are using the tree view.
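The "Memoized `GlobPathMatcher`" changelog entry above can be illustrated with a small stdlib sketch. Everything here is an assumption for illustration: the class name echoes the changelog, but the matching uses simplified dot-separated paths via `fnmatch`, not DeepDiff's bracketed path strings, and the real matcher's API and semantics may differ.

```python
# Hypothetical sketch of memoized glob path matching, in the spirit of
# the "Memoized GlobPathMatcher" changelog entry. Uses dot-separated
# paths for simplicity; DeepDiff's real matcher and path syntax differ.
from fnmatch import fnmatchcase
from functools import lru_cache


class GlobPathMatcher:
    """Match concrete paths against glob patterns, caching each verdict.

    Without memoization, the same (patterns, path) question can be asked
    once per visited node, which is where an exponential-looking cliff
    can appear on deep trees; the cache makes repeat lookups O(1).
    """

    def __init__(self, patterns):
        self._patterns = tuple(patterns)
        # Per-instance cache: wraps the bound method so the cache dies
        # with the matcher instead of living forever at class level.
        self.match = lru_cache(maxsize=None)(self._match_uncached)

    def _match_uncached(self, path):
        return any(fnmatchcase(path, pat) for pat in self._patterns)


matcher = GlobPathMatcher(["root.*.password", "root.meta.*"])
print(matcher.match("root.users.password"))  # True
print(matcher.match("root.users.name"))      # False
```

In real use the bracketed `root['a'][0]` path strings would need translating before matching, since `fnmatch` treats square brackets as character classes; that is why this sketch sidesteps them.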
20 changes: 0 additions & 20 deletions MANIFEST.in

This file was deleted.

36 changes: 16 additions & 20 deletions README.md
@@ -1,4 +1,4 @@
# DeepDiff v 9.0.0
# DeepDiff v 9.1.0

![Downloads](https://img.shields.io/pypi/dm/deepdiff.svg?style=flat)
![Python Versions](https://img.shields.io/pypi/pyversions/deepdiff.svg?style=flat)
@@ -21,29 +21,25 @@

Tested on Python 3.10+ and PyPy3.

- **[Documentation](https://zepworks.com/deepdiff/9.0.0/)**
- **[Documentation](https://zepworks.com/deepdiff/9.1.0/)**

## What is new?

Please check the [ChangeLog](CHANGELOG.md) file for the detailed information.

DeepDiff 9-0-0
- migration note:
- `to_dict()` and `to_json()` now accept a `verbose_level` parameter and always return a usable text-view dict. When the original view is `'tree'`, they default to `verbose_level=2` for full detail. The old `view_override` parameter is removed. To get the previous results, you will need to pass the explicit verbose_level to `to_json` and `to_dict` if you are using the tree view.
- Dropping support for Python 3.9
- Support for python 3.14
- Added support for callable `group_by` thanks to @echan5
- Added `FlatDeltaDict` TypedDict for `to_flat_dicts` return type
- Fixed colored view display when all list items are removed thanks to @yannrouillard
- Fixed `hasattr()` swallowing `AttributeError` in `__slots__` handling for objects with `__getattr__` thanks to @tpvasconcelos
- Fixed `ignore_order=True` missing int-vs-float type changes
- Fixed Delta producing phantom entries when items both move and change values with `iterable_compare_func` thanks to @devin13cox
- Fixed `_convert_oversized_ints` failing on NamedTuples
- Fixed orjson `TypeError` for integers exceeding 64-bit range
- Fixed parameter bug in `to_flat_dicts` where `include_action_in_path` and `report_type_changes` were not being passed through
- Fixed `ignore_keys` issue in `detailed__dict__` thanks to @vitalis89
- Fixed logarithmic similarity type hint thanks to @ljames8
- Added `Fraction` numeric support thanks to @akshat62
DeepDiff 9-1-0
- Added multiprocessing support for DeepDiff: parallel distance computation and parallel subtree diffing with aggregated worker stats, deterministic ordering, and automatic fallback to serial when unsafe (e.g. `custom_operators`, `*_obj_callback`, `ignore_order_func`)
- Added wildcard/glob pattern support for `exclude_paths` and `include_paths` thanks to @akshat62
- Reimplemented internal cache for improved performance
- Memoized `GlobPathMatcher` to remove exponential-time matching cliff
- Comprehensive type-hint corrections across `deephash.py`, `helper.py`, `delta.py`, `diff.py`, `distance.py`, `path.py`, and `serialization.py` (also fixed real bugs: misplaced paren in `path._guess_type` call, and `len(other.indexes > 1)` → `len(other.indexes) > 1` in `diff._compare_in_order`)
- Security: Delta dunder-attribute traversal in `check_elem()` now raises immediately instead of going through `_raise_or_log()`, with full-path preflight validation in `_get_elements_and_details()` so the `set_item_added` path cannot silently skip malicious dunder paths
- Fixed nested NamedTuple set/frozenset Delta updates dropping the outer container
- Fixed tuple Deltas using iterable opcodes silently doing nothing for insert/delete-only changes
- Fixed Delta with both moved and added iterable items mutating the Delta's own internal diff data
- Fixed crash during path sorting when removing multiple dictionary items with complex keys
- Packaging: added missing files to sdist and removed obsolete `MANIFEST.in` thanks to @mgorny
- Updated GitHub Actions workflows and dependencies

## Installation

@@ -77,7 +77,7 @@ Please take a look at the [CHANGELOG](CHANGELOG.md) file.

# Survey

:mega: **Please fill out our [fast 5-question survey](https://forms.gle/E6qXexcgjoKnSzjB8)** so that we can learn how & why you use DeepDiff, and what improvements we should make. Thank you! :dancers:
:mega: **Please fill out our [fast 10-question survey](https://tally.so/r/J98MPY)** so that we can learn how & why you use DeepDiff, and what improvements we should make. Thank you! :dancers:

# Local dev

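The changelog's "automatic fallback to serial when unsafe" and "deterministic ordering" points can be sketched with the stdlib alone. All names below are hypothetical (they only echo the `multiprocessing_workers`/`multiprocessing_threshold` flags the benchmark uses), and threads stand in for processes to keep the sketch self-contained; this is not DeepDiff's real internal API.

```python
# Hypothetical sketch of the dispatch policy the changelog describes:
# parallelize subtree jobs only when it is safe and worth it, and keep
# the output order independent of worker completion order.
from concurrent.futures import ThreadPoolExecutor


def should_parallelize(n_jobs, threshold, options):
    """Serial fallback when unsafe options are present, or when the job
    count is too small for worker overhead to pay off."""
    unsafe = {"custom_operators", "ignore_order_func"}
    if any(options.get(name) for name in unsafe):
        return False
    # Callbacks such as exclude_obj_callback cannot be safely shipped
    # to workers either.
    if any(name.endswith("_obj_callback") and value
           for name, value in options.items()):
        return False
    return n_jobs >= threshold


def diff_subtrees(pairs, diff_one, workers=4, threshold=8, options=None):
    options = options or {}
    if not should_parallelize(len(pairs), threshold, options):
        return [diff_one(a, b) for a, b in pairs]  # serial fallback
    # Executor.map yields results in input order no matter which worker
    # finishes first: the deterministic-ordering guarantee.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda pair: diff_one(*pair), pairs))


pairs = [(i, i + (1 if i % 3 == 0 else 0)) for i in range(10)]
diff_one = lambda a, b: "changed" if a != b else "equal"
serial = diff_subtrees(pairs, diff_one, options={"ignore_order_func": len})
parallel = diff_subtrees(pairs, diff_one, threshold=0)
print(serial == parallel)  # True: same answer either way
```

The actual heuristics (per the benchmark, a `DEFAULT_THRESHOLD` lives in `deepdiff/_multiprocessing.py`) are more involved; the point here is only the shape of the safe/serial decision and the order-preserving map.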
Empty file added benchmarks/__init__.py
245 changes: 245 additions & 0 deletions benchmarks/multiprocessing_bench.py
@@ -0,0 +1,245 @@
"""Benchmarks for the internal multiprocessing mode (Subticket #7).

Goal: provide a reproducible "is multiprocessing actually faster?" check for
the workloads multi_processing.md flags as the primary targets — the
``ignore_order=True`` distance loop, paired-subtree diffs, and large lists of
nested dicts. Each workload runs serial first, then parallel at a few worker
counts; we print a single results table.

Usage::

source ~/.venvs/deep/bin/activate
python -m benchmarks.multiprocessing_bench

# Smaller, faster sweep:
python -m benchmarks.multiprocessing_bench --quick

# Just one workload:
python -m benchmarks.multiprocessing_bench --only paired_subtree

The script also asserts that the parallel result equals the serial result for
every workload — a benchmark that produces wrong answers is worse than no
benchmark at all. If any pair diverges the script exits non-zero.

The numbers here are not committed; they're meant to inform threshold tuning
(see DEFAULT_THRESHOLD in deepdiff/_multiprocessing.py) and to expose
regressions when the hot path changes. Re-run on your hardware before drawing
conclusions — process spawn overhead and IPC pickle cost vary wildly across
machines.
"""

import argparse
import os
import sys
import time
from typing import Any, Callable, Dict, List, Tuple

# Make the package importable when the script is run from a checkout.
HERE = os.path.dirname(os.path.abspath(__file__))
ROOT = os.path.dirname(HERE)
if ROOT not in sys.path:
sys.path.insert(0, ROOT)

from deepdiff import DeepDiff # noqa: E402


# ---------------------------------------------------------------------------
# Workloads.
#
# Each builder returns ``(t1, t2, kwargs)`` where ``kwargs`` is the DeepDiff
# constructor arguments common to both the serial and parallel runs.
# Multiprocessing parameters are added by the runner; workloads should not set
# them.
# ---------------------------------------------------------------------------


def workload_paired_subtree(scale: int) -> Tuple[Any, Any, Dict[str, Any]]:
"""Heavy paired-subtree diff path.

Each item is a small dict whose nested ``data`` differs by one element;
pairing kicks in for every item, so the subtree-parallel path runs.
"""
n = scale
t1 = [{"id": i, "data": {"x": i, "y": [i, i + 1, i + 2]}} for i in range(n)]
t2 = [{"id": i, "data": {"x": i, "y": [i, i + 1, i + 3]}} for i in range(n)]
return t1, t2, {"ignore_order": True, "cutoff_intersection_for_pairs": 1}


def workload_distance_loop(scale: int) -> Tuple[Any, Any, Dict[str, Any]]:
"""Heavy added-vs-removed distance grid.

All ids are disjoint between t1 and t2, so every t2 item is "added" and
every t1 item is "removed". The candidate distance grid is N*N, which is
where the distance worker pool earns its keep.
"""
n = scale
t1 = [{"id": i, "v": [i, i, i]} for i in range(n)]
t2 = [{"id": i + 10_000, "v": [i, i, i + 1]} for i in range(n)]
return t1, t2, {"ignore_order": True, "cutoff_intersection_for_pairs": 1}


def workload_large_nested_dicts(scale: int) -> Tuple[Any, Any, Dict[str, Any]]:
"""Large list of moderately-deep dicts with one mutation each.

The shape mirrors the JSON-like blobs the doc calls out: each item is
several layers deep with a mix of strings, ints, and nested lists.
"""
n = scale

def make(i: int, mutate: int) -> Dict[str, Any]:
return {
"id": i,
"name": "name-%d" % i,
"tags": ["t%d" % (i + j) for j in range(5)],
"details": {
"score": i + mutate,
"history": [{"step": j, "value": j * 2 + mutate} for j in range(4)],
"meta": {"created_at": "2024-01-%02d" % ((i % 28) + 1),
"owner": "user-%d" % (i % 17)},
},
}

t1 = [make(i, 0) for i in range(n)]
t2 = [make(i, 1 if i % 7 == 0 else 0) for i in range(n)]
return t1, t2, {"ignore_order": True, "cutoff_intersection_for_pairs": 1}


WORKLOADS: Dict[str, Callable[[int], Tuple[Any, Any, Dict[str, Any]]]] = {
"paired_subtree": workload_paired_subtree,
"distance_loop": workload_distance_loop,
"large_nested_dicts": workload_large_nested_dicts,
}


# ---------------------------------------------------------------------------
# Runner.
# ---------------------------------------------------------------------------


def _time(fn: Callable[[], Any]) -> Tuple[float, Any]:
start = time.perf_counter()
result = fn()
return time.perf_counter() - start, result


def run_one(name: str, scale: int, worker_counts: List[int]) -> List[Dict[str, Any]]:
"""Run one workload serial + parallel and return one row per worker count.

The serial result is computed once and reused as the correctness reference
for every parallel run.
"""
t1, t2, kwargs = WORKLOADS[name](scale)
print(f"\n=== {name} (scale={scale}) ===")
print(f"input shape: t1 has {len(t1)} items, t2 has {len(t2)} items")

serial_time, serial_result = _time(lambda: DeepDiff(t1, t2, **kwargs))
print(f"serial: {serial_time:.3f}s")

rows: List[Dict[str, Any]] = [{
"workload": name, "scale": scale,
"mode": "serial", "workers": 1,
"time_s": serial_time, "speedup": 1.0,
"ok": True,
}]

for workers in worker_counts:
parallel_time, parallel_result = _time(lambda: DeepDiff(
t1, t2,
multiprocessing=True,
multiprocessing_workers=workers,
multiprocessing_threshold=0,
**kwargs,
))
ok = parallel_result == serial_result
speedup = serial_time / parallel_time if parallel_time > 0 else float("inf")
marker = "" if ok else " !! RESULT MISMATCH !!"
print(f"parallel(workers={workers}): {parallel_time:.3f}s "
f"speedup={speedup:.2f}x{marker}")
rows.append({
"workload": name, "scale": scale,
"mode": "parallel", "workers": workers,
"time_s": parallel_time, "speedup": speedup,
"ok": ok,
})
return rows


def print_table(rows: List[Dict[str, Any]]) -> None:
"""Compact summary table at the end of the run."""
print("\n=== summary ===")
header = ("workload", "scale", "mode", "workers", "time_s", "speedup", "ok")
print("%-22s %6s %-9s %7s %10s %9s %4s" % header)
print("-" * 72)
for r in rows:
print("%-22s %6d %-9s %7d %10.3f %9.2f %4s" % (
r["workload"], r["scale"], r["mode"],
r["workers"], r["time_s"], r["speedup"],
"yes" if r["ok"] else "NO",
))


def main() -> int:
parser = argparse.ArgumentParser(description=__doc__,
formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument(
"--only", choices=list(WORKLOADS), action="append", default=None,
help="run only the named workload(s); may be repeated. Default: all.",
)
parser.add_argument(
"--workers", type=int, action="append", default=None,
help="explicit worker count to test; may be repeated. "
"Default: 2 and min(4, cpu_count).",
)
parser.add_argument(
"--scale", type=int, default=None,
help="override per-workload scale (number of items). Larger = more "
"wall time. Default: a per-workload value below.",
)
parser.add_argument(
"--quick", action="store_true",
help="use small scales for a fast sanity-check run.",
)
args = parser.parse_args()

workloads = args.only or list(WORKLOADS)
cpu = os.cpu_count() or 1
workers_list = args.workers or [2, min(4, cpu)]
# Deduplicate while preserving order — repeated --workers flags shouldn't
# cause duplicate rows.
workers_list = list(dict.fromkeys(workers_list))

# Default scales tuned so each row takes a few seconds serially. Override
# via --scale or --quick. These are starting points, not gospel.
default_scales = {
"paired_subtree": 200,
"distance_loop": 120,
"large_nested_dicts": 200,
}
quick_scales = {
"paired_subtree": 60,
"distance_loop": 40,
"large_nested_dicts": 60,
}
scales = quick_scales if args.quick else default_scales
if args.scale is not None:
scales = {name: args.scale for name in workloads}

print("DeepDiff multiprocessing benchmark")
print(f"cpu_count={cpu} workers tested={workers_list}")

all_rows: List[Dict[str, Any]] = []
for name in workloads:
all_rows.extend(run_one(name, scales[name], workers_list))

print_table(all_rows)

# Non-zero exit if any parallel run produced a different result than its
# serial reference — that's the one regression mode this script must catch.
if any(not r["ok"] for r in all_rows):
print("\nFAIL: at least one parallel run did not match its serial reference.")
return 1
return 0


if __name__ == "__main__":
sys.exit(main())