feat: fetch and store repo license via licensee IN-1105 #4095
gaspergrom wants to merge 7 commits into main
Pull request overview
This PR adds repository license detection to the git integration pipeline and persists the detected SPDX identifier into the main public.repositories table, enabling downstream consumers to query repository license metadata.
Changes:
- Adds a `license` column to `public.repositories` (with rollback migration).
- Extends the git integration Docker image to install the `licensee` gem and its libgit2 build/runtime dependencies.
- Introduces `LicenseService` (invokes `licensee detect --json`) and wires it into the repository worker's first-batch processing, persisting results via a new CRUD helper.
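The detection step described above can be sketched roughly as follows. This is not the PR's `LicenseService`; it is a minimal illustration that assumes `licensee detect --json` prints an object with a top-level `licenses` array whose entries carry an `spdx_id` field. The function names `parse_spdx_id` and `detect_license` are hypothetical.

```python
import asyncio
import json
from typing import Optional


def parse_spdx_id(raw_json: str) -> Optional[str]:
    """Return the SPDX ID of the first detected license, or None.

    Assumed output shape: {"licenses": [{"spdx_id": "MIT", ...}], ...}
    """
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for entry in data.get("licenses") or []:
        if isinstance(entry, dict):
            spdx_id = entry.get("spdx_id")
            if isinstance(spdx_id, str) and spdx_id:
                return spdx_id
    return None


async def detect_license(repo_path: str) -> Optional[str]:
    """Run `licensee detect --json <repo_path>` and extract the SPDX ID."""
    try:
        proc = await asyncio.create_subprocess_exec(
            "licensee", "detect", "--json", repo_path,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
    except FileNotFoundError:
        # The licensee binary is not installed in this environment.
        return None
    stdout, _stderr = await proc.communicate()
    # The exit status is deliberately not relied on here; stdout is parsed
    # either way and anything unparseable yields None.
    return parse_spdx_id(stdout.decode("utf-8", errors="replace"))
```

Keeping the JSON parsing in a pure function separate from the subprocess call makes the parsing logic testable without the Ruby gem installed.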
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| backend/src/database/migrations/V1778154987__addLicenseToRepositories.sql | Adds license column to public.repositories. |
| backend/src/database/migrations/U1778154987__addLicenseToRepositories.sql | Drops license column on rollback. |
| scripts/services/docker/Dockerfile.git_integration | Installs Ruby + licensee and required libgit2/toolchain deps in the git integration image. |
| services/apps/git_integration/src/crowdgit/services/license/license_service.py | New async service to execute licensee and parse SPDX from JSON output. |
| services/apps/git_integration/src/crowdgit/services/license/__init__.py | Exports LicenseService from the license service module. |
| services/apps/git_integration/src/crowdgit/services/__init__.py | Re-exports LicenseService at the services package level. |
| services/apps/git_integration/src/crowdgit/worker/repository_worker.py | Runs license detection on first clone batch and writes the result to DB. |
| services/apps/git_integration/src/crowdgit/database/crud.py | Adds update_repository_license helper to persist SPDX ID. |
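For orientation, a CRUD helper like the one listed for `crud.py` might look like the sketch below. The real helper's signature and database client are not shown on this page, so everything here is an assumption: the connection is modeled as any object with an asyncpg-style `execute(query, *args)` coroutine, and the primary-key column is assumed to be `id`.

```python
from typing import Optional, Protocol


class SupportsExecute(Protocol):
    """Anything exposing an asyncpg-style `execute(query, *args)` coroutine."""

    async def execute(self, query: str, *args: object) -> object: ...


# Assumption: the table's primary-key column is named `id`.
UPDATE_LICENSE_SQL = "UPDATE public.repositories SET license = $1 WHERE id = $2"


async def update_repository_license(
    conn: SupportsExecute, repository_id: str, spdx_id: Optional[str]
) -> None:
    """Persist the detected SPDX ID (or NULL) for one repository."""
    # Parameters are passed separately so the driver handles quoting.
    await conn.execute(UPDATE_LICENSE_SQL, spdx_id, repository_id)
```

Passing `None` writes `NULL`, which matches a nullable column, though whether the PR clears the column in that case is not stated.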
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
…N-1105 Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Force-pushed from b02ba60 to 58d4968.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit e51b77c.

Summary
- Adds a `license` column (VARCHAR(255)) to `public.repositories` via a new migration
- Installs the `licensee` Ruby gem (v9.15.3, the last version compatible with Ruby 2.7 on Debian Bullseye) in the git integration Docker image, along with `libgit2` build and runtime deps required by the `rugged` gem
- Adds `LicenseService`, which runs `licensee detect --json <repo_path>` and extracts the SPDX identifier from the JSON output
- Persists the detected SPDX identifier (e.g. `MIT`, `Apache-2.0`, `BSD-3-Clause`) to `public.repositories.license` via a new `update_repository_license` CRUD helper

Changes
- `backend/src/database/migrations/V1778154987__addLicenseToRepositories.sql`: add `license` column
- `backend/src/database/migrations/U1778154987__addLicenseToRepositories.sql`: undo migration
- `scripts/services/docker/Dockerfile.git_integration`: install licensee v9.15.3 + libgit2 deps
- `services/apps/git_integration/src/crowdgit/services/license/license_service.py`: new service
- `services/apps/git_integration/src/crowdgit/services/license/__init__.py`: module init
- `services/apps/git_integration/src/crowdgit/services/__init__.py`: export LicenseService
- `services/apps/git_integration/src/crowdgit/worker/repository_worker.py`: wire service
- `services/apps/git_integration/src/crowdgit/database/crud.py`: add update_repository_license

Note
Medium Risk
Adds a new DB column and wires an external `licensee` binary into the git integration processing path, which could affect worker runtime behavior/image size and introduce new failure/performance modes during first-batch processing.

Overview
Adds repository license persistence by introducing a nullable `public.repositories.license` column (with forward/undo migrations) and plumbing it through the data-access layer queries.

Extends the git-integration worker to install and run the Ruby `licensee` gem inside its Docker image, adds a new `LicenseService` that extracts an SPDX identifier from `licensee detect --json`, and updates the repository worker to detect/update the license on the first clone batch via a new `update_repository_license` CRUD helper.

Separately removes redundant try/catch wrappers in members enrichment activity helpers without changing behavior.
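One way to contain the failure and performance modes noted under the risk assessment is to bound the first-batch detection step with a timeout and treat any error as "no license detected". The sketch below is not taken from the PR; `maybe_store_license`, its parameters, and the 60-second default are hypothetical, and the detector and store functions are passed in as callables so the example stays self-contained.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)


async def maybe_store_license(
    is_first_batch: bool,
    repo_path: str,
    repository_id: str,
    detect: Callable[[str], Awaitable[Optional[str]]],
    store: Callable[[str, Optional[str]], Awaitable[None]],
    timeout_seconds: float = 60.0,
) -> Optional[str]:
    """Detect and persist the license on the first batch only.

    Detection failures are logged and swallowed so they cannot abort
    the rest of repository processing.
    """
    if not is_first_batch:
        return None
    try:
        spdx_id = await asyncio.wait_for(detect(repo_path), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        logger.warning("License detection timed out for %s", repo_path)
        return None
    except Exception:
        logger.exception("License detection failed for %s", repo_path)
        return None
    # Assumption: only a positive detection is written to the database.
    if spdx_id is not None:
        await store(repository_id, spdx_id)
    return spdx_id
```

Whether the actual worker swallows detection errors or lets them propagate is not visible on this page; the sketch only shows one defensive option.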