feat: vulnerability scanning within git integration (IN-956) #3892
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability. Example:
Projects:
Please add a Jira issue key to your PR title.
related: linuxfoundation/insights#1725
...git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner_service.py
...es/apps/git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner.go
services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/main.go
...git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner_service.py
    conn = await asyncpg.connect(
        user=os.environ["INSIGHTS_DB_USERNAME"],
        password=os.environ["INSIGHTS_DB_PASSWORD"],
        database=os.environ["INSIGHTS_DB_DATABASE"],
        host=os.environ["INSIGHTS_DB_WRITE_HOST"],
        port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
    )
    try:
Connection errors raised by this call won't be caught. Shouldn't the connect call be inside the try block?
Suggested change:

    - conn = await asyncpg.connect(
    -     user=os.environ["INSIGHTS_DB_USERNAME"],
    -     password=os.environ["INSIGHTS_DB_PASSWORD"],
    -     database=os.environ["INSIGHTS_DB_DATABASE"],
    -     host=os.environ["INSIGHTS_DB_WRITE_HOST"],
    -     port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
    - )
    - try:
    + try:
    +     conn = await asyncpg.connect(
    +         user=os.environ["INSIGHTS_DB_USERNAME"],
    +         password=os.environ["INSIGHTS_DB_PASSWORD"],
    +         database=os.environ["INSIGHTS_DB_DATABASE"],
    +         host=os.environ["INSIGHTS_DB_WRITE_HOST"],
    +         port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
    +     )
...git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner_service.py
Pull request overview
Adds an automated vulnerability-scanning step to the git integration worker, implemented as a Go OSV-Scanner-based binary invoked from Python, with results persisted to the insights DB.
Changes:
- Run a new `VulnerabilityScannerService` on the first clone batch and record an execution via `OperationType.VULNERABILITY_SCAN`.
- Introduce a new Go-based `vulnerability-scanner` module/binary (OSV Scanner SDK) plus Docker build plumbing.
- Extend `run_shell_command` to propagate return codes and optionally stream stderr.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| services/apps/git_integration/src/crowdgit/worker/repository_worker.py | Invokes vulnerability scan on first clone batch. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner_service.py | Python wrapper for scanner subprocess + execution tracking + stale scan cleanup. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner.go | Core Go scanning logic + normalization + DB persistence. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/types.go | Shared response / DB model types for scanner. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/main.go | CLI entrypoint + JSON stdout formatting. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/go.mod | Go module definition for scanner. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/go.sum | Go dependency lockfile for scanner. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/db.go | Insights DB connection + upsert/resolve strategy + scan tracking. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/config.go | Reads target path + insights DB env configuration. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/README.md | Design/behavior documentation for scanner component. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/.gitignore | Ignores local Go build artifacts. |
| services/apps/git_integration/src/crowdgit/services/utils.py | Adds stderr streaming + returncode propagation in run_shell_command. |
| services/apps/git_integration/src/crowdgit/services/__init__.py | Exposes VulnerabilityScannerService. |
| services/apps/git_integration/src/crowdgit/server.py | Wires scanner service into app lifecycle / worker init. |
| services/apps/git_integration/src/crowdgit/errors.py | Adds returncode field to CommandExecutionError. |
| services/apps/git_integration/src/crowdgit/enums.py | Adds OperationType.VULNERABILITY_SCAN. |
| scripts/services/docker/Dockerfile.git_integration | Builds + ships vulnerability-scanner binary in the image. |
| backend/.env.dist.local | Adds local insights DB env vars. |
| backend/.env.dist.composed | Adds composed insights DB host env var. |
services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/README.md
...git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner_service.py
Outdated
    conn = await asyncpg.connect(
        user=os.environ["INSIGHTS_DB_USERNAME"],
        password=os.environ["INSIGHTS_DB_PASSWORD"],
        database=os.environ["INSIGHTS_DB_DATABASE"],
        host=os.environ["INSIGHTS_DB_WRITE_HOST"],
        port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
    config.InsightsDatabase.User = os.Getenv("INSIGHTS_DB_USERNAME")
    config.InsightsDatabase.Password = os.Getenv("INSIGHTS_DB_PASSWORD")
    config.InsightsDatabase.DBName = os.Getenv("INSIGHTS_DB_DATABASE")
    config.InsightsDatabase.Host = os.Getenv("INSIGHTS_DB_WRITE_HOST")
    if portStr := os.Getenv("INSIGHTS_DB_PORT"); portStr != "" {
        if port, err := strconv.Atoi(portStr); err == nil {
            config.InsightsDatabase.Port = port
        }
    }
    config.InsightsDatabase.SSLMode = os.Getenv("INSIGHTS_DB_SSLMODE")
    if poolMaxStr := os.Getenv("INSIGHTS_DB_POOL_MAX"); poolMaxStr != "" {
        if poolMax, err := strconv.Atoi(poolMaxStr); err == nil {
            config.InsightsDatabase.PoolMax = poolMax
        }
    }
5f955ac to 9b7b58e
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 6 total unresolved issues (including 4 from previous reviews).
services/apps/merge_suggestions_worker/src/memberSimilarityCalculator.ts
Signed-off-by: anil <epipav@gmail.com>
9b7b58e to 706ee5c
Signed-off-by: anil <epipav@gmail.com>
Signed-off-by: anil <epipav@gmail.com>

Adds automated vulnerability scanning for all git repositories using the Google OSV Scanner SDK. Runs on the first clone batch per repo and persists results directly to the insights database.
Architecture
Go binary wrapped in Python — OSV Scanner is a Go library with no Python bindings. We embed it as an SDK dependency and call it programmatically, following the same subprocess + JSON stdout pattern as the software-value service.
The binary exits with code 0 and communicates errors through the JSON payload, so the Python subprocess machinery never misinterprets a non-zero exit as a crash.
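A minimal sketch of the subprocess + JSON stdout pattern, for readers unfamiliar with it. This is not the service's actual wrapper: `run_json_tool` and the `status`/`error` field names are assumptions made for illustration, and the real code goes through `run_shell_command` rather than `subprocess.run`.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List, Optional


def run_json_tool(argv: List[str], timeout_s: Optional[float] = None) -> Dict[str, Any]:
    """Run a tool that reports its outcome as a single JSON object on stdout."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    if proc.returncode != 0:
        # Handled errors arrive in the payload, so a non-zero exit means a real crash.
        return {"status": "crash", "returncode": proc.returncode, "stderr": proc.stderr}
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        return {"status": "invalid_output", "error": str(exc)}
    if not isinstance(payload, dict):
        return {"status": "invalid_output", "error": "expected a JSON object"}
    return payload


if __name__ == "__main__":
    # Stand-in for the scanner binary: a child Python process that prints JSON.
    fake_tool = "import json; print(json.dumps({'status': 'failure', 'error': 'boom'}))"
    print(run_json_tool([sys.executable, "-c", fake_tool]))
```

Keeping handled failures inside the payload lets the caller distinguish "the scan ran and reported a problem" from "the process died", which matters for the OOM handling described under the design decisions.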
Design decisions
- Vulnerability identity: `(repo_url, vulnerability_id, package_name, source_path)`. The same CVE can appear in multiple packages and lockfiles.
- ID classification: the primary ID and its aliases are sorted into `cve_ids`, `ghsa_ids`, and `other_ids` arrays by prefix.
- Severity: derived from the CVSS numeric score using standard thresholds (CRITICAL/HIGH/MEDIUM/LOW).
- Status tracking: `OPEN` (no fix known), `FIX_AVAILABLE` (patch exists), `RESOLVED` (no longer detected).
- Database strategy: upsert + mark-resolved (not delete + insert), which preserves the full history of when each vulnerability was first detected, last seen, and resolved.
- Transitive scanning: resolves the full dependency graph by default; falls back to direct-only on timeout (3 min) for first scans; subsequent scans reuse the previous mode.
- OOM handling: on any scanner crash, marks stale `running` scan records as `failure`; on OOM specifically (SIGKILL), retries with `--no-transitive` to skip the most memory-intensive part.
- Scan tracking: every invocation creates a `vulnerability_scans` row (`running` → `success`/`failure`/`no_packages_found`) with duration, counts, and errors.
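The ID classification and severity rules above can be sketched as follows. The actual normalization lives in the Go scanner and is not shown in this thread, so this Python version is illustrative only: the function names are hypothetical, the thresholds are the CVSS v3 qualitative rating bands, and returning `None` for a missing or zero score is an assumption.

```python
from typing import Dict, Iterable, List, Optional


def classify_ids(primary_id: str, aliases: Iterable[str]) -> Dict[str, List[str]]:
    """Sort the primary ID and its aliases into buckets by prefix, without duplicates."""
    buckets: Dict[str, List[str]] = {"cve_ids": [], "ghsa_ids": [], "other_ids": []}
    seen = set()
    for vuln_id in [primary_id, *aliases]:
        if vuln_id in seen:
            continue
        seen.add(vuln_id)
        upper = vuln_id.upper()
        if upper.startswith("CVE-"):
            buckets["cve_ids"].append(vuln_id)
        elif upper.startswith("GHSA-"):
            buckets["ghsa_ids"].append(vuln_id)
        else:
            buckets["other_ids"].append(vuln_id)
    return buckets


def severity_from_cvss(score: Optional[float]) -> Optional[str]:
    """Map a CVSS base score to a label using the CVSS v3 qualitative bands."""
    if score is None or score <= 0.0:
        return None  # assumption: no label when the score is missing or zero
    if score >= 9.0:
        return "CRITICAL"
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    return "LOW"
```

For example, a finding with a GHSA primary ID and one CVE alias ends up with one entry in `ghsa_ids` and one in `cve_ids`, and a score of 7.5 maps to `HIGH`.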
Note
Medium Risk
Introduces a new repo-scanning stage that executes a Go binary and writes/upserts vulnerability data into the insights database, plus new Tinybird datasources/pipes. Risk is moderate due to new DB write paths, subprocess execution, and timeout/OOM retry behavior affecting repo processing throughput and data correctness.
Overview
Adds automated vulnerability scanning to the git integration worker: the first clone batch now runs a new `VulnerabilityScannerService` that shells out to a bundled `vulnerability-scanner` Go binary, parses JSON status, retries on timeout/OOM with `--no-transitive`, and records a `ServiceExecution` entry.

Introduces the new Go scanner service (built into the Docker image) that runs OSV Scanner SDK with a 3-minute timeout, tracks scan runs in `vulnerability_scans`, and upserts findings into `vulnerabilities` using a mark-resolved then upsert strategy (including severity/ID normalization and transitive-scan mode persistence).

Adds insights DB env vars to env templates and adds Tinybird `vulnerabilities`/`vulnerability_scans` datasources plus pipes for summary, list, and breakdowns by severity/ecosystem.

Written by Cursor Bugbot for commit a5930b0. This will update automatically on new commits.
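The OOM retry mentioned in the description can be sketched as follows. This is a simplified illustration, not the service's code: `scan_with_oom_retry` and `run_scanner` are hypothetical names, the runner is injected so the sketch needs no real binary, and treating both `-9` and `137` as "killed by SIGKILL" is an assumption about how the return code surfaces (directly from the child process versus through a shell).

```python
from typing import Callable, List, Tuple

SIGKILL = 9
# A child killed by a signal is reported as -signal by Python's subprocess
# module on POSIX, and as 128 + signal when a shell relays the exit status.
OOM_RETURN_CODES = {-SIGKILL, 128 + SIGKILL}


def scan_with_oom_retry(
    run_scanner: Callable[[List[str]], int],
    repo_path: str,
) -> Tuple[int, bool]:
    """Run the scanner once; if it was SIGKILLed, retry once without transitive scanning.

    Returns (returncode, used_transitive_mode).
    """
    args = [repo_path]
    code = run_scanner(args)
    if code in OOM_RETURN_CODES:
        # Skip dependency-graph resolution, the most memory-intensive step.
        code = run_scanner(args + ["--no-transitive"])
        return code, False
    return code, True
```

The retry happens at most once, so a repository that is still killed in direct-only mode surfaces as a failure instead of looping.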