feat: vulnerability scanning within git integration (IN-956) #3892

Merged
epipav merged 4 commits into main from feat/git-osv-vulnerabilities on Mar 20, 2026

epipav (Collaborator) commented Mar 3, 2026

Adds automated vulnerability scanning for all git repositories using the Google OSV Scanner SDK. Runs on the first clone batch per repo and persists results directly to the insights database.

Architecture
Go binary wrapped in Python: OSV Scanner is a Go library with no Python bindings, so a small Go binary embeds it as an SDK dependency and calls it programmatically. The Python worker invokes that binary following the same subprocess + JSON stdout pattern as the software-value service.

The binary exits with code 0 and communicates errors through the JSON payload, so the Python subprocess machinery never misinterprets a non-zero exit as a crash.
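
The subprocess + JSON stdout contract described above could look roughly like the following on the Python side. This is a minimal sketch, not the PR's actual wrapper: the function names and the `status` and `error_message` payload fields are assumptions made for illustration, and the default timeout is arbitrary.

```python
import json
import subprocess
import sys


def run_scanner(cmd: list[str], timeout_s: float = 240.0) -> dict:
    """Run a scanner-like command and return its JSON payload as a dict."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    if proc.returncode != 0:
        # Under the exit-code-0 contract, a non-zero code means a real crash
        # (for example a signal), not an ordinary scan error.
        raise RuntimeError(f"scanner crashed with return code {proc.returncode}")
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError("scanner did not print valid JSON") from exc
    if not isinstance(payload, dict):
        raise RuntimeError("scanner JSON payload is not an object")
    return payload


def is_success(payload: dict) -> bool:
    # "status" is a hypothetical field name; errors travel in the payload.
    return payload.get("status") == "success"


if __name__ == "__main__":
    # Stand-in for the Go binary: a Python one-liner that reports a failure
    # through JSON while still exiting with code 0.
    fake = [
        sys.executable,
        "-c",
        "import json; print(json.dumps({'status': 'failure', 'error_message': 'db down'}))",
    ]
    result = run_scanner(fake)
    print(is_success(result), result.get("error_message"))
```

Keeping crashes (non-zero exit) separate from reported failures (JSON payload) lets the caller apply different retry policies to each.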

Design decisions
Vulnerability identity: (repo_url, vulnerability_id, package_name, source_path) — same CVE can appear in multiple packages and lockfiles

ID classification: primary ID + aliases sorted into cve_ids, ghsa_ids, other_ids arrays by prefix

Severity: derived from CVSS numeric score using standard thresholds (CRITICAL/HIGH/MEDIUM/LOW)

Status tracking: OPEN (no fix known), FIX_AVAILABLE (patch exists), RESOLVED (no longer detected)

Database strategy: upsert + mark-resolved (not delete + insert) — preserves full history of when vulnerabilities were first detected, last seen, and resolved

Transitive scanning: resolves full dependency graph by default; falls back to direct-only on timeout (3min) for first scans; subsequent scans reuse the previous mode

OOM handling: on any scanner crash, marks stale running scan records as failure; on OOM specifically (SIGKILL), retries with --no-transitive to skip the most memory-intensive part

Scan tracking: every invocation creates a vulnerability_scans row (running → success/failure/no_packages_found) with duration, counts, and errors
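
The ID classification and severity rules listed above can be sketched as follows. The PR implements this logic in Go; Python is used here only for brevity, and the function names are hypothetical. The thresholds are the standard CVSS v3 qualitative ranges; mapping a missing or zero score to `None` is an assumption, since the description lists only four levels. The IDs in the usage note are made up.

```python
from typing import Iterable, Optional


def classify_ids(primary_id: str, aliases: Iterable[str]) -> dict:
    """Sort the primary ID and its aliases into buckets by prefix."""
    buckets = {"cve_ids": [], "ghsa_ids": [], "other_ids": []}
    seen = set()
    for raw in [primary_id, *aliases]:
        vuln_id = raw.strip()
        if not vuln_id or vuln_id in seen:
            continue  # skip blanks and duplicates
        seen.add(vuln_id)
        upper = vuln_id.upper()
        if upper.startswith("CVE-"):
            buckets["cve_ids"].append(vuln_id)
        elif upper.startswith("GHSA-"):
            buckets["ghsa_ids"].append(vuln_id)
        else:
            buckets["other_ids"].append(vuln_id)
    return buckets


def severity_from_cvss(score: Optional[float]) -> Optional[str]:
    """Map a CVSS base score to a severity label (CVSS v3 ranges)."""
    if score is None or score <= 0.0:
        return None  # assumption: no score means no severity label
    if score >= 9.0:
        return "CRITICAL"
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    return "LOW"
```

For example, `classify_ids("GHSA-aaaa-bbbb-cccc", ["CVE-2024-0001", "PYSEC-2024-1"])` places one ID in each bucket, and `severity_from_cvss(7.0)` falls on the lower bound of HIGH.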


Note

Medium Risk
Introduces a new repo-scanning stage that executes a Go binary and writes/upserts vulnerability data into the insights database, plus new Tinybird datasources/pipes. Risk is moderate due to new DB write paths, subprocess execution, and timeout/OOM retry behavior affecting repo processing throughput and data correctness.

Overview
Adds automated vulnerability scanning to the git integration worker: the first clone batch now runs a new VulnerabilityScannerService that shells out to a bundled vulnerability-scanner Go binary, parses JSON status, retries on timeout/OOM with --no-transitive, and records a ServiceExecution entry.
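
A rough sketch of the retry-with-`--no-transitive` policy mentioned above, assuming the wrapper can observe either a timeout or a SIGKILL return code. The `run` callable and the function names are hypothetical; only the `--no-transitive` flag comes from the PR description.

```python
import subprocess
from typing import Callable, Optional, Sequence

NO_TRANSITIVE_FLAG = "--no-transitive"
SIGKILL = 9  # signal number typically seen when the OOM killer ends a process


def was_sigkilled(returncode: int) -> bool:
    # subprocess reports "killed by signal N" as -N on POSIX; a shell in
    # between would report 128 + N instead.
    return returncode in (-SIGKILL, 128 + SIGKILL)


def scan_with_fallback(
    run: Callable[[Sequence[str]], int], base_cmd: Sequence[str]
) -> tuple[int, bool]:
    """Return (returncode, used_transitive) after at most one retry."""
    code: Optional[int]
    try:
        code = run(base_cmd)
    except subprocess.TimeoutExpired:
        code = None  # treat a timeout like an OOM kill: retry in direct-only mode
    if code is not None and not was_sigkilled(code):
        return code, True
    # A second timeout or crash propagates to the caller unchanged.
    return run([*base_cmd, NO_TRANSITIVE_FLAG]), False
```

Returning the mode that finally ran makes it possible to persist it, so later scans can reuse it rather than rediscover the limit.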

Introduces the new Go scanner service (built into the Docker image) that runs OSV Scanner SDK with a 3-minute timeout, tracks scan runs in vulnerability_scans, and upserts findings into vulnerabilities using a mark-resolved then upsert strategy (including severity/ID normalization and transitive-scan mode persistence).
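
The mark-resolved then upsert strategy can be illustrated with an in-memory SQLite database standing in for the insights Postgres database. This is a sketch under stated assumptions: the column names (`first_seen`, `last_seen`, `resolved_at`), the `persist_scan` helper, and the exact SQL are hypothetical, and the PR's Go implementation may differ. The unique key mirrors the vulnerability identity given in the PR description.

```python
import sqlite3

SCHEMA = """
CREATE TABLE vulnerabilities (
    repo_url TEXT NOT NULL,
    vulnerability_id TEXT NOT NULL,
    package_name TEXT NOT NULL,
    source_path TEXT NOT NULL,
    status TEXT NOT NULL,
    first_seen TEXT NOT NULL,
    last_seen TEXT NOT NULL,
    resolved_at TEXT,
    UNIQUE (repo_url, vulnerability_id, package_name, source_path)
)
"""

UPSERT = """
INSERT INTO vulnerabilities
    (repo_url, vulnerability_id, package_name, source_path,
     status, first_seen, last_seen)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (repo_url, vulnerability_id, package_name, source_path)
DO UPDATE SET status = excluded.status,
              last_seen = excluded.last_seen,
              resolved_at = NULL
"""


def persist_scan(conn, repo_url, findings, now):
    """findings: iterable of (vulnerability_id, package_name, source_path, has_fix)."""
    with conn:  # single transaction: readers never see a half-applied scan
        # Step 1: assume every open finding for this repo is gone.
        conn.execute(
            "UPDATE vulnerabilities SET status = 'RESOLVED', resolved_at = ? "
            "WHERE repo_url = ? AND status != 'RESOLVED'",
            (now, repo_url),
        )
        # Step 2: re-open (or insert) whatever the current scan still reports.
        # first_seen is only written on insert, so detection history survives.
        for vuln_id, package, path, has_fix in findings:
            status = "FIX_AVAILABLE" if has_fix else "OPEN"
            conn.execute(UPSERT, (repo_url, vuln_id, package, path, status, now, now))
```

Compared with delete + insert, rows that disappear from a scan stay in the table with a resolution timestamp, which is what preserves the history the description refers to.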

Adds insights DB env vars to env templates and adds Tinybird vulnerabilities/vulnerability_scans datasources plus pipes for summary, list, and breakdowns by severity/ecosystem.

Written by Cursor Bugbot for commit a5930b0. This will update automatically on new commits.

github-actions bot (Contributor) commented Mar 3, 2026

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

@epipav epipav marked this pull request as draft March 3, 2026 16:44
@epipav epipav marked this pull request as ready for review March 5, 2026 14:48
@epipav epipav requested a review from mbani01 March 5, 2026 14:52

epipav (Collaborator, Author) commented Mar 5, 2026

related: linuxfoundation/insights#1725

@epipav epipav changed the title from "feat: vulnerability scanning within git integration" to "feat: vulnerability scanning within git integration (IN-956)" on Mar 5, 2026

Comment on lines +102 to +109
    conn = await asyncpg.connect(
        user=os.environ["INSIGHTS_DB_USERNAME"],
        password=os.environ["INSIGHTS_DB_PASSWORD"],
        database=os.environ["INSIGHTS_DB_DATABASE"],
        host=os.environ["INSIGHTS_DB_WRITE_HOST"],
        port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
    )
    try:

Connection errors raised here won't be caught. Shouldn't we move the connect call inside the try block?

Suggested change
-   conn = await asyncpg.connect(
-       user=os.environ["INSIGHTS_DB_USERNAME"],
-       password=os.environ["INSIGHTS_DB_PASSWORD"],
-       database=os.environ["INSIGHTS_DB_DATABASE"],
-       host=os.environ["INSIGHTS_DB_WRITE_HOST"],
-       port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
-   )
-   try:
+   try:
+       conn = await asyncpg.connect(
+           user=os.environ["INSIGHTS_DB_USERNAME"],
+           password=os.environ["INSIGHTS_DB_PASSWORD"],
+           database=os.environ["INSIGHTS_DB_DATABASE"],
+           host=os.environ["INSIGHTS_DB_WRITE_HOST"],
+           port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
+       )
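
One detail worth keeping in mind with this suggestion: if the block ends in a `finally` that closes the connection (an assumption, since the rest of the function is not shown here), `conn` must be bound before the `try`, otherwise a failed connect leads to an unbound-name error during cleanup. A minimal sketch of that shape, with a hypothetical `connect` callable standing in for `asyncpg.connect(...)` and a hypothetical `work` coroutine for the queries:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def run_with_connection(connect, work):
    """Open a connection, run `work(conn)`, and always clean up."""
    conn = None  # bound up front so `finally` never sees an undefined name
    try:
        conn = await connect()
        return await work(conn)
    except Exception:
        # Connection and query errors both land here and are logged.
        logger.exception("insights DB operation failed")
        return None
    finally:
        if conn is not None:
            await conn.close()
```

Whether to swallow the error and return `None` or re-raise depends on how the caller treats a failed cleanup; the sketch picks the former only to show that the connect error is now caught.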

mbani01 previously approved these changes Mar 6, 2026
Copilot AI review requested due to automatic review settings March 18, 2026 09:56

Copilot AI (Contributor) left a comment

Pull request overview

Adds an automated vulnerability-scanning step to the git integration worker, implemented as a Go OSV-Scanner-based binary invoked from Python, with results persisted to the insights DB.

Changes:

  • Run a new VulnerabilityScannerService on the first clone batch and record an execution via OperationType.VULNERABILITY_SCAN.
  • Introduce a new Go-based vulnerability-scanner module/binary (OSV Scanner SDK) plus Docker build plumbing.
  • Extend run_shell_command to propagate return codes and optionally stream stderr.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| services/apps/git_integration/src/crowdgit/worker/repository_worker.py | Invokes vulnerability scan on first clone batch. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner_service.py | Python wrapper for scanner subprocess + execution tracking + stale scan cleanup. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner.go | Core Go scanning logic + normalization + DB persistence. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/types.go | Shared response / DB model types for scanner. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/main.go | CLI entrypoint + JSON stdout formatting. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/go.mod | Go module definition for scanner. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/go.sum | Go dependency lockfile for scanner. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/db.go | Insights DB connection + upsert/resolve strategy + scan tracking. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/config.go | Reads target path + insights DB env configuration. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/README.md | Design/behavior documentation for scanner component. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/.gitignore | Ignores local Go build artifacts. |
| services/apps/git_integration/src/crowdgit/services/utils.py | Adds stderr streaming + returncode propagation in run_shell_command. |
| services/apps/git_integration/src/crowdgit/services/__init__.py | Exposes VulnerabilityScannerService. |
| services/apps/git_integration/src/crowdgit/server.py | Wires scanner service into app lifecycle / worker init. |
| services/apps/git_integration/src/crowdgit/errors.py | Adds returncode field to CommandExecutionError. |
| services/apps/git_integration/src/crowdgit/enums.py | Adds OperationType.VULNERABILITY_SCAN. |
| scripts/services/docker/Dockerfile.git_integration | Builds + ships vulnerability-scanner binary in the image. |
| backend/.env.dist.local | Adds local insights DB env vars. |
| backend/.env.dist.composed | Adds composed insights DB host env var. |

Comment on lines +102 to +107
    conn = await asyncpg.connect(
        user=os.environ["INSIGHTS_DB_USERNAME"],
        password=os.environ["INSIGHTS_DB_PASSWORD"],
        database=os.environ["INSIGHTS_DB_DATABASE"],
        host=os.environ["INSIGHTS_DB_WRITE_HOST"],
        port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),

Comment on lines +38 to +52
    config.InsightsDatabase.User = os.Getenv("INSIGHTS_DB_USERNAME")
    config.InsightsDatabase.Password = os.Getenv("INSIGHTS_DB_PASSWORD")
    config.InsightsDatabase.DBName = os.Getenv("INSIGHTS_DB_DATABASE")
    config.InsightsDatabase.Host = os.Getenv("INSIGHTS_DB_WRITE_HOST")
    if portStr := os.Getenv("INSIGHTS_DB_PORT"); portStr != "" {
        if port, err := strconv.Atoi(portStr); err == nil {
            config.InsightsDatabase.Port = port
        }
    }
    config.InsightsDatabase.SSLMode = os.Getenv("INSIGHTS_DB_SSLMODE")
    if poolMaxStr := os.Getenv("INSIGHTS_DB_POOL_MAX"); poolMaxStr != "" {
        if poolMax, err := strconv.Atoi(poolMaxStr); err == nil {
            config.InsightsDatabase.PoolMax = poolMax
        }
    }

@CLAassistant
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@epipav epipav force-pushed the feat/git-osv-vulnerabilities branch from 5f955ac to 9b7b58e Compare March 19, 2026 12:37

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 6 total unresolved issues (including 4 from previous reviews).

Signed-off-by: anil <epipav@gmail.com>
@epipav epipav force-pushed the feat/git-osv-vulnerabilities branch from 9b7b58e to 706ee5c Compare March 19, 2026 12:54
@epipav epipav merged commit d2e767e into main Mar 20, 2026
10 checks passed
@epipav epipav deleted the feat/git-osv-vulnerabilities branch March 20, 2026 09:40
4 participants