
Guarddog v3: Custom correlation engine#706

Open
sobregosodd wants to merge 36 commits into main from s.obregoso/v3



sobregosodd (Contributor) commented Apr 6, 2026

This PR introduces GuardDog v3, which replaces GuardDog's independent-alert model with a risk correlation engine.

Highlights:

  • Risk score: Packages are now scored 0-10 based on attack chain completeness rather than individual pattern matches.
  • Capability + Threat = Risk: Findings only form risks when a code capability (e.g., network access) pairs with a threat indicator (e.g., a suspicious domain) in the same category.
  • MITRE ATT&CK mapping: Risks are mapped to the ATT&CK matrix, and the risk score is derived from which stages of the attack chain those risks cover.
  • Semgrep → YARA: All source code rules migrated to language-agnostic YARA rules (34 rules, 4 reusable .meta files).
  • Metadata in the risk model: Metadata rules now feed into risk scoring the same way, with their own severity grading and MITRE mapping.
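The pairing rule in the second bullet can be illustrated with a minimal sketch. The `Finding`/`Risk` classes, the `Kind` enum, and the category strings are hypothetical and not GuardDog's actual types; the sketch also ignores the standalone and cross-category risks that later commits in this PR add.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical model of findings; not GuardDog's actual classes.
class Kind(Enum):
    CAPABILITY = "capability"
    THREAT = "threat"


@dataclass(frozen=True)
class Finding:
    rule: str
    kind: Kind
    category: str  # hypothetical category label, e.g. "network" or "runtime"


@dataclass(frozen=True)
class Risk:
    category: str
    capability: str
    threat: str


def form_risks(findings: list[Finding]) -> list[Risk]:
    """Pair every capability with every threat that shares its category.

    A lone capability or a lone threat never produces a risk here.
    """
    risks: list[Risk] = []
    for cap in findings:
        if cap.kind is not Kind.CAPABILITY:
            continue
        for threat in findings:
            if threat.kind is Kind.THREAT and threat.category == cap.category:
                risks.append(Risk(cap.category, cap.rule, threat.rule))
    return risks
```

Under this model a network capability plus a suspicious-domain threat yields one risk, while an obfuscation threat with no matching runtime capability yields none.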

Other changes:

  • Metadata rules that are loosely related to maliciousness were dropped to reduce noise.

sobregosodd requested a review from a team as a code owner April 6, 2026 20:57
Copilot AI left a comment

Copilot reviewed 143 out of 144 changed files in this pull request and generated 4 comments.

Comment thread tests/analyzer/metadata/test_repository_integrity_mismatch.py
Comment thread guarddog/analyzer/sourcecode/threat-runtime-obfuscation.yar
Comment thread guarddog/analyzer/sourcecode/threat-runtime-obfuscation-unicode.yar
Comment thread guarddog/analyzer/sourcecode/threat-runtime-obfuscation-general.yar
tesnim5hamdouni previously approved these changes Apr 7, 2026

tesnim5hamdouni (Contributor) left a comment

letsgoooo

ikretz previously approved these changes Apr 7, 2026
Comment thread guarddog/analyzer/risk_engine.py
christophetd dismissed stale reviews from ikretz and tesnim5hamdouni via acf7b4e April 10, 2026 20:11
sobregosodd and others added 12 commits April 10, 2026 18:02
When scanning a local directory, metadata detectors (typosquatting,
deceptive author, compromised email, etc.) previously could not run
because no registry metadata was available. The new --metadata flag
accepts a path to a package metadata JSON file (matching the PyPI
JSON API or npm registry format), enabling the full detection
pipeline for local scans.

The recall benchmark worker now automatically passes package_info-*.json
files from the malicious-software-packages-dataset ZIPs to guarddog
via this flag, so metadata rules contribute to recall measurement.
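The commit does not say how the metadata file's format is recognized. Here is a minimal sketch, assuming detection by top-level keys: a PyPI JSON API response has an `info` object, while an npm registry document has a `versions` map. `detect_metadata_format` and `load_metadata` are hypothetical names, not GuardDog's actual code.

```python
import json
from pathlib import Path
from typing import Any


def detect_metadata_format(doc: dict[str, Any]) -> str:
    """Guess whether a metadata document is PyPI- or npm-shaped (hypothetical helper)."""
    # PyPI JSON API responses carry an "info" object with the package fields.
    if isinstance(doc.get("info"), dict):
        return "pypi"
    # npm registry documents carry a "versions" map keyed by version string.
    if isinstance(doc.get("versions"), dict):
        return "npm"
    raise ValueError("unrecognized package metadata format")


def load_metadata(path: str) -> tuple[str, dict[str, Any]]:
    """Read a JSON metadata file, as passed to --metadata, and tag its ecosystem."""
    doc = json.loads(Path(path).read_text(encoding="utf-8"))
    return detect_metadata_format(doc), doc
```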
cluster.py now identifies ZIPs with zero source files and records them
in cluster_index.json under "empty_packages". recall.py filters these
out during regenerate_samples so benchmarks don't waste budget on
packages that have no code to analyze.

Packages like litellm ship as a ZIP containing another ZIP. These are
not empty; they just need double extraction. Count nested archives
(.zip, .whl, .tar.gz) as having content.
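A sketch of the emptiness check described in the two commits above. It assumes an archive counts as non-empty if it holds at least one source file or one nested archive. The extension lists and the name `zip_has_content` are hypothetical; the real `cluster.py` may classify files differently.

```python
import io
import zipfile

# Hypothetical extension lists; cluster.py may use different ones.
SOURCE_EXTS = (".py", ".js", ".mjs", ".cjs", ".ts", ".sh")
NESTED_ARCHIVE_EXTS = (".zip", ".whl", ".tar.gz")


def zip_has_content(data: bytes) -> bool:
    """True if the ZIP holds a source file or a nested archive needing extraction."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            lower = name.lower()
            # Nested archives are not empty; they just need a second extraction pass.
            if lower.endswith(SOURCE_EXTS) or lower.endswith(NESTED_ARCHIVE_EXTS):
                return True
    return False
```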
Add 12 new YARA threat rules targeting common malware patterns that were
previously undetected: download-and-execute chains, chr/hex obfuscation,
PowerShell encoded commands, dynamic import+exec, reverse shells,
Telegram/Discord exfil, DNS exfil, npm preinstall hooks, dependency
confusion indicators, setup.py suspicious imports, and system info
exfiltration.

Tighten 6 existing rules to reduce false positives: threat-process-hooks
(inline meta rules, exclude prepare/prepack), threat-process-injection-dll
(remove overly broad .dll/.exe string matches), threat-runtime-system-info
(require 3+ calls instead of 1), threat-process-spawn-silent (require both
stdout+stderr suppressed), threat-runtime-obfuscation-general (raise hex
threshold to 50+, remove bracket notation), threat-runtime-obfuscation-base64exec
(tighten JS Buffer.from pattern, require explicit base64 encoding).

Update risk engine: add "setup" and "npm" to valid categories (findings in
these categories were previously dropped silently), make HIGH-specificity
threats form standalone risks, add cross-category risk formation, add a
specificity gate (scores backed only by LOW-specificity threats are capped
at 4.9 unless a MEDIUM+ specificity threat is present), and bump the
single-stage chain value from 0.3 to 0.4.

Benchmark results (threshold 5.0, 1000 benign + 745 malicious packages):

|          | Baseline | Final  | Change  |
|----------|----------|--------|---------|
| Recall   | 79.3%    | 87.0%  | +7.7pp  |
| Precision| 75.5%    | 80.2%  | +4.7pp  |
| F1       | 77.3%    | 83.5%  | +6.2pp  |
| MCC      | 0.600    | 0.704  | +0.104  |
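The specificity gate from the risk engine update can be sketched as a simple cap. This assumes a three-level ordered enum and a plain `min()`. `Specificity` and `apply_specificity_gate` are hypothetical names; only the 4.9 cap and the MEDIUM+ exception come from the commit message.

```python
from enum import IntEnum


# Hypothetical ordered levels; GuardDog's actual representation may differ.
class Specificity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


LOW_ONLY_CAP = 4.9  # just below the 5.0 detection threshold


def apply_specificity_gate(score: float, threat_specificities: list[Specificity]) -> float:
    """Cap the score when every contributing threat is LOW specificity."""
    if threat_specificities and max(threat_specificities) < Specificity.MEDIUM:
        return min(score, LOW_ONLY_CAP)
    # Any MEDIUM or HIGH threat lifts the cap.
    return score
```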
…lusters

Removed 95 packages that had no source files (empty placeholders,
dependency-confusion probes with no payload, nested-archive dataset bugs).
Backfilled 94 replacements from previously unrepresented clusters for better
diversity.

Recall on cleaned dataset: 88.8% (was 87.0% on dirty dataset with empty
packages dragging it down). PyPI recall 98.9%, compromised_lib 100%.

Add threat-runtime-obfuscation-log-suppress rule for console.log
suppression combined with hex arrays/fromCharCode (common npm malware
evasion). Extend threat-process-download-exec to catch Node.js
child_process + fetch patterns.

Recall now at 90.6% (threshold 5.0), up from 88.8%.
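Here is a rough Python approximation of what the log-suppress rule looks for. The real rule is written in YARA and its strings are not shown in this PR, so the regexes and the 8-element hex-array threshold below are assumptions.

```python
import re

# Assumed patterns; the actual YARA rule's strings are not shown in the PR.
# Silencing console.log, e.g. `console.log = function () {};` or `console.log = () => {};`
LOG_SUPPRESS = re.compile(
    r"console\.log\s*=\s*(?:function\s*\([^)]*\)\s*\{\s*\}|\([^)]*\)\s*=>\s*\{\s*\})"
)
FROM_CHAR_CODE = re.compile(r"String\.fromCharCode\s*\(")
# Array literal of at least 8 hex numbers, e.g. [0x68, 0x74, 0x74, ...]
HEX_ARRAY = re.compile(r"\[\s*(?:0x[0-9a-fA-F]{1,4}\s*,\s*){7,}0x[0-9a-fA-F]{1,4}")


def looks_like_log_suppress_obfuscation(source: str) -> bool:
    """Require log suppression AND an obfuscation marker, mirroring the rule's combination."""
    if not LOG_SUPPRESS.search(source):
        return False
    return bool(FROM_CHAR_CODE.search(source) or HEX_ARRAY.search(source))
```

Requiring both signals keeps ordinary uses of `String.fromCharCode` from matching on their own.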
extension_scanner.scan_local was missing the info parameter added to
the PackageScanner superclass, causing a mypy override error.

test_single_runtime_threat expected 8.6 but single-stage chain value
was changed from 0.3 to 0.4, making the correct score 8.8.

Replace per-package GitHub Contents API calls (~2 per package) with
bulk Git Trees API (~10 calls total). Fixes sampling failures from
rate limiting when resolving 1000+ packages.

Add ASCII pipeline diagram to evals/README.md showing the full
cluster -> sample -> scan -> report workflow.
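GitHub's Git Trees endpoint (`GET /repos/{owner}/{repo}/git/trees/{tree_sha}?recursive=1`) returns a whole file listing in one response and sets a `truncated` flag when the listing is too large. Below is a sketch of turning one parsed response into a ZIP index. `build_zip_index` is hypothetical, and the only assumption about the dataset's layout is that ZIPs are grouped by parent directory.

```python
from collections import defaultdict
from typing import Any


def build_zip_index(tree_response: dict[str, Any]) -> dict[str, list[str]]:
    """Map each directory to the ZIP blobs it contains (hypothetical helper).

    `tree_response` is the parsed JSON of one recursive Git Trees API call.
    """
    if tree_response.get("truncated"):
        # GitHub truncates very large trees; the caller would then walk subtrees.
        raise RuntimeError("tree listing truncated; fetch subtrees individually")
    index: dict[str, list[str]] = defaultdict(list)
    for entry in tree_response.get("tree", []):
        path = entry["path"]
        # Entries are either "blob" (file) or "tree" (directory); keep ZIP files only.
        if entry.get("type") == "blob" and path.endswith(".zip"):
            parent, _, _ = path.rpartition("/")
            index[parent].append(path)
    return dict(index)
```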
Build the ZIP index upfront and only sample from packages that
actually have ZIPs in the dataset. Fixes resolution failures when
the manifest lists packages that don't have archived samples yet.

Resampled: 1251 packages (251 pypi + 1000 npm), 1 per cluster,
max diversity.
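Sampling one package per cluster, restricted to packages that have ZIPs, can be sketched as follows. This assumes clusters are a mapping from cluster ID to package names and that the ZIP index has been reduced to a set of available names. `sample_one_per_cluster` is a hypothetical name, and the seeded RNG is an assumption made for reproducible benchmarks.

```python
import random


def sample_one_per_cluster(
    clusters: dict[str, list[str]],
    available: set[str],
    seed: int = 0,
) -> dict[str, str]:
    """Pick one package per cluster, considering only packages that have a ZIP.

    Clusters with no archived member are skipped instead of failing at resolution time.
    """
    rng = random.Random(seed)
    picks: dict[str, str] = {}
    # Sorted iteration keeps the result stable for a given seed.
    for cluster_id in sorted(clusters):
        candidates = sorted(p for p in clusters[cluster_id] if p in available)
        if candidates:
            picks[cluster_id] = rng.choice(candidates)
    return picks
```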
The dataset stores scoped npm packages with @ as separator
(@0xengine@meow) but the manifest uses / (@0xengine/meow).
Convert when parsing tree paths so scoped packages are found
during sampling. This adds ~1359 previously invisible packages
to the available pool.
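The name conversion in this commit is small enough to show directly. It is a sketch: `dataset_name_to_npm` is a hypothetical name, and only the `@0xengine@meow` to `@0xengine/meow` mapping comes from the commit message.

```python
def dataset_name_to_npm(name: str) -> str:
    """Convert '@scope@pkg' (dataset layout) to '@scope/pkg' (manifest layout).

    Unscoped names and malformed scoped names are returned unchanged.
    """
    if name.startswith("@"):
        # Split at the first '@' after the leading one.
        scope, sep, pkg = name[1:].partition("@")
        if sep and scope and pkg:
            return f"@{scope}/{pkg}"
    return name
```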
Contributor

Detection Benchmark Results

Benchmark comparing detection quality before (e2a21fd) and after the latest rule + risk engine changes. Evaluated on 1,225 malicious packages (1 per cluster, from the malicious-software-packages-dataset) and 1,000 top legitimate packages, at detection threshold 5.0.

Aggregate

| Metric | Before | After | Change |
|---|---|---|---|
| Recall | 88.5% | 90.2% | +1.7pp |
| Precision | 70.2% | 86.4% | +16.2pp |
| F1 | 78.3% | 88.3% | +10.0pp |
| MCC | 0.453 | 0.733 | +0.280 |
| FPs (benign score ≥ 5.0) | 467 | 174 | -293 |
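For reference, the four metrics above are standard functions of the confusion matrix at the 5.0 threshold. The sketch below uses the textbook definitions. The PR does not publish raw TP/FN/TN counts, so the example uses illustrative counts rather than the benchmark's. `detection_metrics` is a hypothetical helper, not the evals code.

```python
import math


def _ratio(num: float, den: float) -> float:
    # Treat an undefined ratio (zero denominator) as 0.0.
    return num / den if den else 0.0


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Recall, precision, F1 and MCC from confusion-matrix counts.

    tp/fn: malicious packages scored at/above vs below the threshold.
    fp/tn: benign packages scored at/above vs below the threshold.
    """
    recall = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * precision * recall, precision + recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, denom)
    return {"recall": recall, "precision": precision, "f1": f1, "mcc": mcc}
```

Unlike F1, MCC also uses true negatives, so it is sensitive to the benign FP count. That is why the large FP drop moves MCC much more than recall.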

By ecosystem

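| Metric | PyPI before | PyPI after | npm before | npm after |
|---|---|---|---|---|
| Recall | 79.8% | 85.7% | 90.7% | 91.3% |
| Precision | 38.3% | 61.2% | 85.9% | 94.9% |
| F1 | 51.8% | 71.4% | 88.2% | 93.0% |
| MCC | 0.163 | 0.567 | 0.630 | 0.801 |
| FPs | 319 | 125 | 148 | 49 |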

By category

| Category | Before | After |
|---|---|---|
| malicious_intent | 84.8% | 87.4% |
| compromised_lib | 95.1% | 95.1% |

What changed

12 new YARA threat rules targeting previously undetected patterns: download-and-execute chains, chr/hex obfuscation, PowerShell encoded commands, dynamic import+exec, reverse shells, Telegram/Discord exfil, DNS exfil, npm preinstall hooks, setup.py suspicious imports, system info exfiltration, and log suppression + obfuscation.

6 existing rules tightened to reduce the FP rate: threat-process-hooks (inlined), threat-process-injection-dll, threat-runtime-system-info, threat-process-spawn-silent, threat-runtime-obfuscation-general, threat-runtime-obfuscation-base64exec.

Risk engine improvements: added "setup"/"npm" categories (findings in these categories were previously dropped silently), standalone risk formation for HIGH-specificity threats, cross-category risk pairing, a specificity gate (scores backed only by LOW-specificity threats are capped below the threshold), and a higher single-stage chain value.

Eval infrastructure: cluster-aware sampling with empty-package filtering, scoped npm package support, bulk Git Trees API for ZIP resolution.


5 participants