Skip to content

fix: normalize git URLs in npm_metadata_mismatch to avoid false positives#703

Open
4444J99 wants to merge 1 commit intoDataDog:mainfrom
4444J99:4444J99/fix-git-url-normalization
Open

fix: normalize git URLs in npm_metadata_mismatch to avoid false positives#703
4444J99 wants to merge 1 commit intoDataDog:mainfrom
4444J99:4444J99/fix-git-url-normalization

Conversation

@4444J99
Copy link
Copy Markdown

@4444J99 4444J99 commented Mar 28, 2026

Summary

Fixes #634

The npm_metadata_mismatch heuristic compares dependency values between the NPM registry manifest and the local package.json using strict string equality. This causes false positives when both sources reference the same git repository but one includes the trailing .git suffix and the other does not:

Manifest("git+https://github.com/whiskeysockets/libsignal-node.git")
package.json("git+https://github.com/whiskeysockets/libsignal-node")

These are semantically equivalent — git resolves both to the same repository.

Changes

  • Add _normalize_git_url() in npm_metadata_mismatch.py that strips the trailing .git suffix from git-scheme URLs (git://, git+https://, git+ssh://, git+file://) before comparison
  • Apply normalization in diff_at_key_dict() so comparisons use normalized values while preserving the original values in diff output
  • Non-git dependency values (semver ranges, tarball URLs, etc.) pass through unchanged

Tests

10 new tests added to test_npm_metadata_mismatch.py:

  • Integration test reproducing the exact scenario from npm_metadata_mismatch fails to detect the git scheme #634 (libsignal with .git suffix mismatch)
  • Unit tests for _normalize_git_url() covering git+https, git+ssh, bare git://, no-suffix passthrough, non-git URLs, None, and non-string values
  • diff_at_key_dict tests verifying equivalent git URLs produce no diff while genuinely different git URLs are still detected

All 114 metadata tests pass (10 existing + 10 new for this heuristic, 94 others unchanged).

…ives

The npm_metadata_mismatch heuristic compared dependency URLs with strict
string equality. This caused false positives when the NPM manifest and
package.json listed the same git repository with and without the trailing
`.git` suffix (e.g. `git+https://…/repo.git` vs `git+https://…/repo`).

Adds `_normalize_git_url()` to strip the `.git` suffix from git-scheme
URLs before comparison, and 10 new tests covering the normalization
logic, integration behavior, and edge cases.

Fixes DataDog#634
@4444J99 4444J99 requested a review from a team as a code owner March 28, 2026 15:13
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 54d5ecc421

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if not isinstance(value, str):
return value
if _GIT_URL_PATTERN.match(value):
return value.removesuffix(".git")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Normalize .git before URL fragment

Update _normalize_git_url to handle git URLs with a commit/tag suffix (for example git+https://host/org/repo.git#v1.2.3). The current implementation only strips .git when it is the very end of the full string, so equivalent dependency specs like .git#... vs #... still compare as different and continue to trigger false-positive mismatches in diff_at_key_dict. Parsing the URL and removing .git from the repository path (before fragment/query) would fix this common npm git dependency form.

Useful? React with 👍 / 👎.

@4444J99
Copy link
Copy Markdown
Author

4444J99 commented Apr 21, 2026

Hi — checking in on this. The PR normalizes git URLs in the npm_metadata_mismatch heuristic to avoid false positives when one source includes a trailing .git suffix and the other doesn't (exact scenario from #634). 10 new tests, all 114 metadata tests pass. Happy to rebase or address feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

npm_metadata_mismatch fails to detect the git scheme

1 participant