chore: reorganize dedupe code #14641
valentijnscholten wants to merge 12 commits into DefectDojo:dev
Expose match_batch_* and match_batch_of_findings for read-only matching. Support unsaved findings in location/endpoint comparison and _is_candidate_older. Refactor default_importer close_old_findings to use get_close_old_findings_queryset. Restore batch deduplication debug logging.
…m_db queries.
Replace per-finding refresh_from_db(false_p, risk_accepted, out_of_scope) with one values() query for all PKs and assign the results onto the instances, falling back to refresh_from_db when a row is missing.
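The batching pattern described in this commit could look roughly like the sketch below. It is not the code from the PR: `batch_refresh` and `fetch_rows` are hypothetical names, and `fetch_rows` stands in for a single Django query along the lines of `Finding.objects.filter(pk__in=pks).values("pk", *fields)`, so that the example runs without a database.

```python
from types import SimpleNamespace

FIELDS = ("false_p", "risk_accepted", "out_of_scope")


def batch_refresh(findings, fetch_rows, fields=FIELDS):
    """Refresh `fields` on all findings with one bulk lookup.

    `fetch_rows(pks, fields)` is a hypothetical stand-in for one values()
    query; it must return dicts containing "pk" plus each requested field.
    """
    pks = [f.pk for f in findings if f.pk is not None]
    rows = {row["pk"]: row for row in fetch_rows(pks, fields)}
    for finding in findings:
        if finding.pk is None:
            continue  # unsaved instance, nothing to refresh
        row = rows.get(finding.pk)
        if row is None:
            # Row missing from the bulk result: fall back to the
            # per-instance refresh that the batch query replaces.
            finding.refresh_from_db(fields=list(fields))
            continue
        for name in fields:
            setattr(finding, name, row[name])


# Demo with plain objects instead of model instances.
stale = SimpleNamespace(pk=1, false_p=False, risk_accepted=False, out_of_scope=False)
db_rows = [{"pk": 1, "false_p": True, "risk_accepted": False, "out_of_scope": False}]
batch_refresh([stale], lambda pks, fields: db_rows)
```

After the call, `stale.false_p` reflects the value from the bulk result, and only findings without a matching row pay for an individual query.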
…r for performance
Passing tags= directly to the Finding() constructor triggers expensive tagulous processing for every finding. Using finding.unsaved_tags instead bypasses this overhead and lets the import pipeline handle tags efficiently.
Affected parsers: jfrog_xray_unified, dependency_check, cargo_audit, anchore_grype, threat_composer.
Benchmark on 14,219 findings: 99s -> 7.97s (12x faster).
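As an illustration of the parser-side change, the sketch below sets tags through `unsaved_tags` rather than the constructor. The `Finding` class here is a minimal stand-in so the example runs without Django, and `build_finding` and the shape of `item` are hypothetical; the real parsers work with `dojo.models.Finding`.

```python
class Finding:
    """Stand-in for dojo.models.Finding (assumption: not the real model)."""

    def __init__(self, **kwargs):
        self.unsaved_tags = []
        for name, value in kwargs.items():
            setattr(self, name, value)


def build_finding(item):
    """Build a finding from one parsed report entry (hypothetical shape)."""
    # Avoided form: Finding(title=..., severity=..., tags=item["tags"])
    finding = Finding(title=item["title"], severity=item["severity"])
    # Store tags as a plain list; the import pipeline applies them later.
    finding.unsaved_tags = list(item.get("tags", []))
    return finding


example = build_finding(
    {"title": "Outdated crate", "severity": "High", "tags": ["rust", "suppressed"]}
)
```

Keeping the tags as a plain list on the instance defers tag processing to the importer instead of running it once per constructed finding.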
…ring
Update tests for dependency_check and jfrog_xray_unified parsers to match the actual list format returned by unsaved_tags, and fix the expected order of tags for the suppressed-without-notes case in dependency_check.
…dings
Tags from the report were being appended to matched findings via tags.add(), causing tags to accumulate across reimports instead of being left unchanged. This change aligns tag handling with how other finding fields are treated on reimport.
Closes DefectDojo#14606
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Conflicts have been resolved. A maintainer will review the pull request shortly.
This pull request introduces two high-severity potential cross-site scripting issues where code in dojo/tools/anchore_grype/parser.py and dojo/tools/cargo_audit/parser.py constructs Markdown/HTML-like strings by directly interpolating unescaped advisory and vulnerability fields (URLs, descriptions, categories, function names/versions, references), which could allow attacker-controlled input to inject malicious markup/JS when later rendered. These findings should be addressed by properly validating or escaping user-controlled data or sanitizing/escaping output before rendering.
🟠 Potential Cross-Site Scripting in
| Vulnerability | Potential Cross-Site Scripting |
|---|---|
| Description | The code builds markdown-like strings by directly interpolating variables (vuln_datasource, vuln_urls, rel_datasource, rel_urls) into finding_references without any escaping or sanitization. If any of these values can contain attacker-controlled input and are later rendered into an HTML context without escaping (or rendered as raw HTML), this allows injection of malicious markup/JS (XSS). |
django-DefectDojo/dojo/tools/anchore_grype/parser.py
Lines 144 to 147 in 1b18585
🟠 Potential Cross-Site Scripting in dojo/tools/cargo_audit/parser.py (drs_a8c81440)
| Vulnerability | Potential Cross-Site Scripting |
|---|---|
| Description | The patch builds Markdown/HTML-like strings by interpolating advisory fields (description, categories, affected function names/versions, references) directly into formatted text without any escaping or sanitization. If those advisory fields can contain attacker-controlled input and are later rendered into HTML with auto-escaping disabled (or converted from Markdown to HTML without sanitization), this can lead to XSS. |
django-DefectDojo/dojo/tools/cargo_audit/parser.py
Lines 83 to 86 in 1b18585
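One way to address both findings is to escape report-supplied values before interpolating them into the reference text. The sketch below assumes HTML-escaping at build time suits the rendering pipeline, which this conversation does not confirm; `safe_text`, `is_http_url`, and `build_references` are hypothetical helpers, not functions from either parser.

```python
import html
from urllib.parse import urlsplit


def safe_text(value):
    """Escape HTML metacharacters in a report-supplied value."""
    return html.escape(str(value), quote=True)


def is_http_url(url):
    """Accept only http(s) URLs; reject javascript:, data:, and malformed input."""
    try:
        return urlsplit(str(url).strip()).scheme.lower() in ("http", "https")
    except ValueError:
        return False


def build_references(datasource, urls):
    """Build a Markdown reference block from untrusted report fields."""
    lines = []
    if datasource:
        lines.append(f"**Datasource:** {safe_text(datasource)}")
    for url in urls:
        if is_http_url(url):
            lines.append(f"- {safe_text(url)}")
    return "\n".join(lines)


refs = build_references(
    "<script>x</script>",
    ["javascript:alert(1)", "https://example.com/a?b=1&c=2"],
)
```

If the application already sanitizes Markdown output at render time, escaping here may be redundant, in which case the findings could instead be reported as false positives.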
Comment to provide feedback on these findings.
Report false positive: @dryrunsecurity fp [FINDING ID] [FEEDBACK]
Report low-impact: @dryrunsecurity nit [FINDING ID] [FEEDBACK]
Example: @dryrunsecurity fp drs_90eda195 This code is not user-facing
All finding details can be found in the DryRun Security Dashboard.
Summary
Refactors deduplication code to allow for extensions in Pro.
It also has several other open PRs merged into it so that tests and performance measurements are representative.