Refactor: link metadata via SPDX submodule + automation #48
This repo now owns only keyword definitions per algorithm id; metadata
(name, OID, cryptoClass, key sizes, references) lives in the SPDX
cryptographic-algorithm-list, joined via the algorithm id.
Changes:
- Add SPDX repo as Git submodule at external/spdx-crypto/ (pinned).
- Migrate definitions_crypto_algorithms/definitions/<cat>/<id>.yaml to
flat keywords/<id>.yaml (preserves history via git mv).
- Normalize CMAC -> cmac to match the canonical SPDX id.
- Delete duplicated metadata in definitions_crypto_algorithms/algorithms/
and the now-empty definitions/ tree.
- Replace spec_crypto_algorithms with a slim spec/ describing the
keyword-only YAML format.
- Add scripts/data_io.py, scripts/validate_consistency.py and
scripts/build_linked_dataset.py. The build script is the sole writer
of dist/.
- Add dist/crypto_algorithms_keywords.json (keywords by id),
dist/crypto_algorithms_linked.json (SPDX metadata + keywords for
linked ids only), dist/spdx_snapshot.json (diff baseline) and
dist/inconsistency_report.json (validator output).
- Stub keywords/<id>.yaml for ids present in SPDX but not yet here:
X509, blakex, chacha, diffiehellman, shax.
- Adapt utilities/crypto_detect.py to the new layout (loads keywords/
and reads metadata from the SPDX submodule).
- Add pytest fixtures + unit tests for both scripts.
- Rewrite README and utilities/README to document the new model.
Validator state: 211 keyword files, 127 SPDX entries, 127 linked,
0 case collisions, 0 body-id mismatches, 84 keywords-only, 0 SPDX-only.
Inconsistencies are reported in dist/inconsistency_report.json.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
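The validator categories listed in this commit message (linked, keywords-only, SPDX-only, case collisions) come down to set arithmetic over the two id sets. The following is a minimal sketch of that idea, not the code in scripts/validate_consistency.py: the function name `classify_ids`, the report keys, and the sample ids are all hypothetical, and it assumes a case collision means two distinct ids that are equal after lowercasing.

```python
from collections import defaultdict


def classify_ids(keyword_ids, spdx_ids):
    """Partition ids into the categories a consistency report might list."""
    keyword_ids, spdx_ids = set(keyword_ids), set(spdx_ids)

    # Ids that differ only by letter case (e.g. "CMAC" vs "cmac") would
    # silently fail an exact-match join, so they are surfaced separately.
    by_lower = defaultdict(set)
    for algo_id in keyword_ids | spdx_ids:
        by_lower[algo_id.lower()].add(algo_id)
    case_collisions = sorted(
        sorted(group) for group in by_lower.values() if len(group) > 1
    )

    return {
        "linked": sorted(keyword_ids & spdx_ids),
        "keywords_only": sorted(keyword_ids - spdx_ids),
        "spdx_only": sorted(spdx_ids - keyword_ids),
        "case_collisions": case_collisions,
    }


# Illustrative ids only; "my-custom-cipher" is made up for this sketch.
report = classify_ids(
    keyword_ids=["aes", "CMAC", "md5", "my-custom-cipher"],
    spdx_ids=["aes", "cmac", "md5", "X509"],
)
print(report)
```

In this sample the un-normalized CMAC shows up both as a case collision and as an unmatched id on each side, which is the kind of situation the CMAC -> cmac rename removes.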
Adds the tooling and CI workflows required to keep keywords/, dist/ and
the downstream platform_deployment in sync as either side evolves.
Scripts:
- scripts/detect_spdx_changes.py — diffs the current SPDX submodule
state against dist/spdx_snapshot.json; emits {added, modified,
removed} JSON. Exits 1 on any change so workflows can branch.
- scripts/generate_sql.py — emits INSERT/UPDATE SQL files for the
crypto_algorithms_metadata table (conservative schema: id, name,
oid, crypto_class, common_key_size, specified_key_size, reference).
One file per change plus a combined <ts>__all.sql.
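To make the {added, modified, removed} contract concrete, here is a small sketch of a snapshot diff. It assumes both the snapshot and the current SPDX state can be loaded as dicts keyed by algorithm id, and that each bucket is a sorted list of ids; `diff_snapshots`, `exit_code` and the sample entries are hypothetical and not taken from scripts/detect_spdx_changes.py.

```python
import json


def diff_snapshots(previous, current):
    """Classify ids by comparing two {id: metadata} mappings."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    modified = sorted(
        algo_id
        for algo_id in current.keys() & previous.keys()
        if current[algo_id] != previous[algo_id]
    )
    return {"added": added, "modified": modified, "removed": removed}


def exit_code(changes):
    """Return 1 if any bucket is non-empty, else 0 (for use with sys.exit)."""
    return 1 if any(changes.values()) else 0


# Placeholder entries; real SPDX metadata has more fields.
previous = {
    "algo-a": {"name": "A"},
    "algo-b": {"name": "B"},
    "algo-old": {"name": "Old"},
}
current = {
    "algo-a": {"name": "A"},
    "algo-b": {"name": "B (revised)"},
    "algo-new": {"name": "New"},
}

changes = diff_snapshots(previous, current)
print(json.dumps(changes, indent=2))
print("exit code:", exit_code(changes))
```

A non-zero exit on change lets a workflow step decide whether the later regeneration and PR steps need to run at all.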
Workflows (under .github/workflows/):
- keywords-changed.yml — runs on push to main under keywords/**;
runs tests, validator and builder; opens a PR if dist/ changed.
- spdx-changed.yml — runs daily and on workflow_dispatch; supports
spdx_repo_url and spdx_ref inputs for testing against a fork;
detects upstream changes, creates keyword stubs for added ids,
regenerates dist/, generates SQL, opens a single PR with all
changes.
- propagate-to-platform-deployment.yml — fires on push to main
touching dist/ or sql/; copies generated artifacts into
scanoss/platform_deployment and opens a PR there. Requires
PLATFORM_DEPLOYMENT_TOKEN secret.
Tests:
- tests/test_detect_spdx_changes.py — diff classification + CLI exit codes.
- tests/test_generate_sql.py — SQL quoting, INSERT/UPDATE shape,
multi-file emission, no-change idempotency.
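As an illustration of what "SQL quoting" and "INSERT/UPDATE shape" could cover, the sketch below renders one row as literal SQL for the crypto_algorithms_metadata columns named in this commit. The helper names (`sql_literal`, `insert_sql`, `update_sql`) and the sample row are hypothetical, and the quoting shown (doubling single quotes, NULL for missing values) is the standard SQL string-literal rule; the actual scripts/generate_sql.py may handle dialect details differently.

```python
TABLE = "crypto_algorithms_metadata"
COLUMNS = [
    "id", "name", "oid", "crypto_class",
    "common_key_size", "specified_key_size", "reference",
]


def sql_literal(value):
    """Render a Python value as a SQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    # Standard SQL escapes a single quote inside a string by doubling it.
    return "'" + str(value).replace("'", "''") + "'"


def insert_sql(row):
    values = ", ".join(sql_literal(row.get(col)) for col in COLUMNS)
    return f"INSERT INTO {TABLE} ({', '.join(COLUMNS)}) VALUES ({values});"


def update_sql(row):
    assignments = ", ".join(
        f"{col} = {sql_literal(row.get(col))}" for col in COLUMNS if col != "id"
    )
    return f"UPDATE {TABLE} SET {assignments} WHERE id = {sql_literal(row['id'])};"


# Placeholder row; values are invented to exercise quoting and NULLs.
row = {
    "id": "example-algo",
    "name": "Example 'quoted' name",
    "oid": None,
    "crypto_class": "hash",
    "common_key_size": 128,
    "specified_key_size": None,
    "reference": "https://example.org/spec",
}

print(insert_sql(row))
print(update_sql(row))
```

Because the output is a file of statements rather than a live query, literals have to be rendered explicitly; code that talks to a database directly would normally use parameterized queries instead.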
Docs:
- docs_crypto_algorithms/automation_testing.md — manual setup
(secret, fork sync), local dry-run procedure, workflow_dispatch
testing against scanoss-qg/crypto-algorithms, and cleanup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review skipped (CodeRabbit): too many files. This PR contains 296 files, which is 146 over the limit of 150.
Summary
Refactor of this repo so it owns only keyword definitions per algorithm id, while metadata (name, OID, cryptoClass, key sizes, references) is sourced from spdx/cryptographic-algorithm-list as a Git submodule and joined by id. Adds the automation (scripts + 3 workflows) that keeps generated artifacts and downstream platform_deployment in sync.
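The join by id can be pictured as follows. This is a rough sketch assuming the keyword side is available as {id: [keywords]} and the SPDX side as {id: metadata}, producing an {id: {spdx, keywords}} mapping for ids present on both sides; `build_linked` and the sample data are hypothetical and do not reproduce scripts/build_linked_dataset.py.

```python
def build_linked(keywords_by_id, spdx_by_id):
    """Join keyword lists with SPDX metadata on exact id match."""
    return {
        algo_id: {"spdx": spdx_by_id[algo_id], "keywords": keywords}
        for algo_id, keywords in sorted(keywords_by_id.items())
        if algo_id in spdx_by_id
    }


# Illustrative data: keywords and metadata values are placeholders.
keywords_by_id = {
    "cmac": ["cmac_init", "cmac_update"],
    "local-only": ["local_only_cipher"],
}
spdx_by_id = {
    "cmac": {"name": "placeholder name", "cryptoClass": "placeholder class"},
}

linked = build_linked(keywords_by_id, spdx_by_id)
print(linked)
```

Ids without an SPDX counterpart ("local-only" here) stay in the keyword mapping but are left out of the linked result.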
What changed
Phase A: refactor (commit 4bd6099)
- Add SPDX repo as Git submodule at external/spdx-crypto/ (pinned commit, no fork).
- Migrate definitions to flat keywords/<id>.yaml with git mv (history preserved).
- CMAC.yaml normalized to cmac.yaml to match the canonical SPDX id (per rule "ID conflicts → SPDX wins").
- Stub keywords/<id>.yaml created for the 5 SPDX-only ids (X509, blakex, chacha, diffiehellman, shax).
- Delete duplicated metadata in definitions_crypto_algorithms/algorithms/ and the now-obsolete definitions/ tree.
- Replace spec_crypto_algorithms/ with a slim spec/ describing the keyword-only YAML format.
- Add generated artifacts in dist/:
  - crypto_algorithms_keywords.json: {id: [keywords]} for every algorithm.
  - crypto_algorithms_linked.json: {id: {spdx, keywords}} for ids present in both sides.
  - spdx_snapshot.json: baseline used by the SPDX-change detector.
  - inconsistency_report.json: validator output.
- Adapt utilities/crypto_detect.py to the new layout (loads keywords/ and metadata from the SPDX submodule).
- Rewrite README.md and utilities/README.md.
Phase B: automation (commit 3144407)
- scripts/detect_spdx_changes.py: diff submodule vs dist/spdx_snapshot.json.
- scripts/generate_sql.py: INSERT/UPDATE SQL for crypto_algorithms_metadata (conservative columns).
- .github/workflows/keywords-changed.yml: on keyword changes, regenerate dist/ and open PR.
- .github/workflows/spdx-changed.yml: daily cron + workflow_dispatch with spdx_repo_url override; detects upstream changes, creates stubs for additions, regenerates dist, generates SQL, opens one PR per run.
- .github/workflows/propagate-to-platform-deployment.yml: on push touching dist/ or sql/, opens PR in scanoss/platform_deployment (target is editable via env in the workflow).
- docs_crypto_algorithms/automation_testing.md: manual setup, local dry-run, workflow_dispatch testing against the fork, cleanup.
Validator state on this branch
The 84 keyword-only ids are reported in dist/inconsistency_report.json and remain in crypto_algorithms_keywords.json but not in crypto_algorithms_linked.json.
Required manual setup before workflows can run
- Secret PLATFORM_DEPLOYMENT_TOKEN: fine-grained PAT with Contents: r/w and Pull requests: r/w on scanoss/platform_deployment.
- Fork sync: gh repo sync scanoss-qg/crypto-algorithms --source spdx/cryptographic-algorithm-list. Used as synthetic upstream for workflow_dispatch dry-runs.
Test plan
- git submodule update --init --recursive after checkout.
- pytest tests/: 17/17 passing locally.
- python3 scripts/validate_consistency.py: exit 0, 0 hard errors.
- python3 scripts/build_linked_dataset.py: no diff in dist/.
- Run utilities/crypto_detect.py on a sample file: metadata resolved from SPDX.
- Configure the PLATFORM_DEPLOYMENT_TOKEN secret.
- Trigger spdx-changed.yml via workflow_dispatch with spdx_repo_url=https://github.com/scanoss-qg/crypto-algorithms.git after injecting a synthetic change in that fork; verify the bot PR appears.
- Push a keywords/ change to main (after merge); verify keywords-changed.yml opens a regen PR.
- Verify propagate-to-platform-deployment.yml opens a downstream PR.
🤖 Generated with Claude Code