fix: normalize git URLs in npm_metadata_mismatch to avoid false positives#703
fix: normalize git URLs in npm_metadata_mismatch to avoid false positives#7034444J99 wants to merge 1 commit intoDataDog:mainfrom
Conversation
…ives The npm_metadata_mismatch heuristic compared dependency URLs with strict string equality. This caused false positives when the NPM manifest and package.json listed the same git repository with and without the trailing `.git` suffix (e.g. `git+https://…/repo.git` vs `git+https://…/repo`). Adds `_normalize_git_url()` to strip the `.git` suffix from git-scheme URLs before comparison, and 10 new tests covering the normalization logic, integration behavior, and edge cases. Fixes DataDog#634
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 54d5ecc421
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if not isinstance(value, str): | ||
| return value | ||
| if _GIT_URL_PATTERN.match(value): | ||
| return value.removesuffix(".git") |
There was a problem hiding this comment.
Normalize .git before URL fragment
Update _normalize_git_url to handle git URLs with a commit/tag suffix (for example git+https://host/org/repo.git#v1.2.3). The current implementation only strips .git when it is the very end of the full string, so equivalent dependency specs like .git#... vs #... still compare as different and continue to trigger false-positive mismatches in diff_at_key_dict. Parsing the URL and removing .git from the repository path (before fragment/query) would fix this common npm git dependency form.
Useful? React with 👍 / 👎.
|
Hi — checking in on this. The PR normalizes git URLs in the |
Summary
Fixes #634
The
npm_metadata_mismatchheuristic compares dependency values between the NPM registry manifest and the localpackage.jsonusing strict string equality. This causes false positives when both sources reference the same git repository but one includes the trailing.gitsuffix and the other does not:These are semantically equivalent — git resolves both to the same repository.
Changes
_normalize_git_url()innpm_metadata_mismatch.pythat strips the trailing.gitsuffix from git-scheme URLs (git://,git+https://,git+ssh://,git+file://) before comparisondiff_at_key_dict()so comparisons use normalized values while preserving the original values in diff outputTests
10 new tests added to
test_npm_metadata_mismatch.py:libsignalwith.gitsuffix mismatch)_normalize_git_url()coveringgit+https,git+ssh, baregit://, no-suffix passthrough, non-git URLs,None, and non-string valuesdiff_at_key_dicttests verifying equivalent git URLs produce no diff while genuinely different git URLs are still detectedAll 114 metadata tests pass (10 existing + 10 new for this heuristic, 94 others unchanged).