chore: Add version bump automation script #2519

Merged
BrendanWalsh merged 5 commits into master from smamindl/add-version-bump-script on Apr 1, 2026

@smamindl (Collaborator) commented Mar 24, 2026

Summary

Adds scripts/bump-version.py — a release automation tool that safely updates version strings across the repo during version bumps, and scripts/test_bump_version.py — a 175-test suite covering unit, integration, fuzz, adversarial, round-trip, snapshot, and historical replay tests.

Problem

Version bumps currently require manual find-and-replace across the repo, which is error-prone:

  • Risk of corrupting similar versions (e.g., 1.0.115 when bumping 1.0.11)
  • Risk of modifying versioned_docs/ (historical snapshots)
  • Risk of changing unrelated versions in package.json (npm deps)
  • Risk of changing a dependency that happens to share the same version number
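
The first risk in this list is easy to reproduce. The sketch below contrasts a plain substring replacement with a boundary-aware regex; the function names and the exact lookaround pattern are illustrative assumptions, not the code in scripts/bump-version.py.

```python
import re


def naive_bump(text: str, old: str, new: str) -> str:
    # Plain substring replacement: also rewrites any longer version
    # that merely starts with `old` (1.0.115 contains 1.0.11).
    return text.replace(old, new)


def bounded_bump(text: str, old: str, new: str) -> str:
    # Assumed pattern: reject a match that is preceded by a digit or dot,
    # or followed by another digit or a ".digit" continuation.
    pattern = re.compile(r"(?<![\d.])" + re.escape(old) + r"(?!\d|\.\d)")
    # A callable replacement avoids backslash processing in `new`.
    return pattern.sub(lambda m: new, text)


sample = "synapseml 1.0.11 and other-lib 1.0.115"
print(naive_bump(sample, "1.0.11", "1.0.12"))    # corrupts 1.0.115
print(bounded_bump(sample, "1.0.11", "1.0.12"))  # leaves 1.0.115 alone
```

Boundary matching only addresses partial matches. It cannot tell a SynapseML version from an unrelated dependency that happens to carry the same number.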

Solution

A context-anchored substitution system that replaces version strings only when they appear within a known SynapseML-specific context. A bare version like 1.1.0 is never blindly replaced — every replacement must be anchored by surrounding text that proves it refers to SynapseML (e.g., synapseml_2.12:1.1.0, SYNAPSEML_VERSION=1.1.0).

If a version reference has no SynapseML anchor, the script refuses to run rather than guessing.
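
A minimal sketch of the idea, assuming just two anchors modeled on the examples above (the real script has 20 patterns, which are not reproduced here). The function name `bump_anchored` and the regexes are hypothetical.

```python
import re


def bump_anchored(text: str, old: str, new: str) -> str:
    v = re.escape(old)
    tail = r"(?!\d|\.\d)"  # do not match a prefix of a longer version
    bare = re.compile(r"(?<![\d.])" + v + tail)
    # Hypothetical anchors: group 1 is the text that proves the
    # version belongs to SynapseML.
    anchors = [
        re.compile(r"(synapseml_\d+\.\d+:)" + v + tail),
        re.compile(r"(SYNAPSEML_VERSION=)" + v + tail),
    ]

    # Offsets where an anchored version string starts.
    anchored = {
        m.start() + len(m.group(1))
        for pat in anchors
        for m in pat.finditer(text)
    }
    # Any bare occurrence that no anchor accounts for is a hard failure.
    unanchored = [m.start() for m in bare.finditer(text) if m.start() not in anchored]
    if unanchored:
        raise ValueError(f"unanchored reference(s) to {old} at offsets {unanchored}")

    for pat in anchors:
        text = pat.sub(lambda m: m.group(1) + new, text)
    return text
```

With this shape, `bump_anchored("SYNAPSEML_VERSION=1.1.0\n", "1.1.0", "1.1.1")` rewrites the line, while input that also contains something like `spark-foo 1.1.0` raises instead of guessing.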

Usage

# Preview what would change (auto-detects current version):
python scripts/bump-version.py --to 1.1.3 --dry-run

# Apply bump:
python scripts/bump-version.py --to 1.1.3
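
According to the feature list, the `--to` value must be in X.Y.Z format and greater than the current version. A sketch of such a check follows; `parse_version` and `check_bump` are hypothetical names, and how the real script treats a leading `v` is not shown in this PR, so this sketch simply rejects it.

```python
import re

VERSION_RE = re.compile(r"([0-9]+)\.([0-9]+)\.([0-9]+)")


def parse_version(text: str):
    # fullmatch rejects prefixes, suffixes, and two-part versions.
    m = VERSION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"expected X.Y.Z, got {text!r}")
    return (int(m.group(1)), int(m.group(2)), int(m.group(3)))


def check_bump(current: str, target: str) -> None:
    # Compare as integer tuples: as strings, "1.0.9" sorts after "1.0.10".
    if parse_version(target) <= parse_version(current):
        raise ValueError(f"{target} is not greater than {current}")
```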

Features

  • 20 context patterns (14 self-anchored, 4 line-anchored, 2 file-anchored) ensure only SynapseML versions are replaced
  • Auto-detects current version from docusaurus.config.js
  • Enforces X.Y.Z format and requires the new version to be greater than the current one
  • Two-pass architecture: read all → validate → write all (prevents partial-bump state)
  • Post-condition self-verification: re-reads files after write, confirms zero old version remaining
  • Broad sweep: warns about old version in unscanned/denylisted files (for CI logs)
  • Expected-files manifest: validates all 16 historically-bumped files are hit
  • Docusaurus integration: runs docs:version when website/docs/ exists
  • Unanchored refs → hard fail: any version reference without SynapseML context causes exit code 1
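
The two-pass flow, the expected-files manifest, and the post-condition check can be sketched together. This is an illustration under assumptions: `bump_files` and its parameters are hypothetical, and a boundary-aware regex stands in for the context-anchored patterns.

```python
import re
from pathlib import Path


def bump_files(paths, old, new, expected=(), dry_run=False):
    pattern = re.compile(rb"(?<![\d.])" + re.escape(old.encode()) + rb"(?!\d|\.\d)")
    replacement = new.encode()

    # Pass 1: read every file and compute its new content in memory.
    planned = {}
    for path in map(Path, paths):
        data = path.read_bytes()
        updated = pattern.sub(lambda m: replacement, data)
        if updated != data:
            planned[path] = updated

    # Validate before touching the disk: every manifest file must be hit.
    missing = [p for p in map(Path, expected) if p not in planned]
    if missing:
        raise RuntimeError(f"expected files not changed: {missing}")
    if dry_run:
        return list(planned)

    # Pass 2: write everything only after validation has succeeded.
    for path, data in planned.items():
        path.write_bytes(data)

    # Post-condition: re-read from disk and confirm the old version is gone.
    leftovers = [p for p in planned if pattern.search(p.read_bytes())]
    if leftovers:
        raise RuntimeError(f"old version still present in: {leftovers}")
    return list(planned)
```

Because nothing is written until validation passes, a failed manifest check leaves the working tree untouched rather than half-bumped.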

Test Suite (175 tests)

| Category | Tests | Coverage |
|---|---|---|
| Bare regex | 18 | Boundary matching, CRLF, edge positions |
| Template regex | 20 | All 14 self-anchored patterns |
| Context anchoring | 28 | 18 must-anchor + 6 must-NOT-anchor |
| apply() | 5 | Offset correctness, CRLF preservation |
| File filtering | 40 | All denylists, extensions, extensionless files |
| Integration (E2E) | 7 | CLI dry-run/live, exit codes, v-prefix |
| Fuzz (hypothesis) | 8 | ~1300 random inputs, collisions, idempotency |
| Adversarial | 15 | Spark version collision, unicode, 10KB lines |
| Round-trip | 4 | Bump A→B→A = byte-identical |
| Snapshot regression | 4 | Live repo coverage floor |
| Historical replay | 10 | All 10 bumps from v1.0.9→v1.1.2 replayed against git history |
| Post-condition | 2 | Self-verification fires on live, skips on dry-run |
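
As an illustration of the round-trip category, the property can be stated in a few lines. This is a sketch, not the PR's test code: `bump` and `round_trip_ok` are hypothetical helpers, and a boundary-aware bytes regex stands in for the real substitution logic.

```python
import re


def bump(data: bytes, old: str, new: str) -> bytes:
    pattern = re.compile(rb"(?<![\d.])" + re.escape(old.encode()) + rb"(?!\d|\.\d)")
    replacement = new.encode()
    return pattern.sub(lambda m: replacement, data)


def round_trip_ok(data: bytes, a: str, b: str) -> bool:
    # Bump A -> B, then B -> A, and require the exact original bytes back.
    return bump(bump(data, a, b), b, a) == data


# CRLF line endings and a look-alike version (1.1.25) must survive intact.
assert round_trip_ok(b"SYNAPSEML_VERSION=1.1.2\r\nother=1.1.25\r\n", "1.1.2", "1.1.3")
```

In this sketch the property only holds when the target version is absent from the input: if `1.1.3` already appears somewhere, the reverse bump rewrites it too and the bytes no longer match.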

Copilot AI review requested due to automatic review settings March 24, 2026 01:16
@github-actions

Hey @smamindl 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.

We use semantic commit messages to streamline the release process.
Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix.
This helps us to create release messages and credit you for your hard work!

Examples of commit messages with semantic prefixes:

  • fix: Fix LightGBM crashes with empty partitions
  • feat: Make HTTP on Spark back-offs configurable
  • docs: Update Spark Serving usage
  • build: Add codecov support
  • perf: improve LightGBM memory usage
  • refactor: make python code generation rely on classes
  • style: Remove nulls from CNTKModel
  • test: Add test coverage for CNTKModel

To test your commit locally, please follow our guide on building from source.
Check out the developer guide for additional guidance on testing your change.

github-actions Bot commented Mar 24, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA dc8e98c.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

Copilot AI (Contributor) left a comment

Pull request overview

Adds a repo-wide version bump automation script (scripts/bump-version.py) intended to safely update SynapseML version strings across the repository while avoiding partial matches and protected paths (e.g., versioned_docs/, package.json).

Changes:

  • Introduces a Python CLI tool to search/replace exact version strings using a boundary-aware regex.
  • Adds denylist-based directory/file skipping, plus dry-run and verbose reporting modes.
  • Implements pre-run “regex safety checks” to validate matching behavior against edge cases.

Comment thread scripts/bump-version.py Outdated
Comment thread scripts/bump-version.py Outdated
Comment on lines +116 to +147
    try:
        content = file_path.read_text(encoding="utf-8")
    except (UnicodeDecodeError, PermissionError):
        return 0, []

    matches = list(regex.finditer(content))
    if not matches:
        return 0, []

    changes = []
    lines = content.split("\n")
    seen_lines = set()
    for match in matches:
        line_num = content[:match.start()].count("\n") + 1
        if line_num not in seen_lines:
            seen_lines.add(line_num)
            line_text = lines[line_num - 1].strip()
            # Truncate long lines, showing context around the version
            if len(line_text) > 120:
                idx = line_text.find(old_version)
                if idx >= 0:
                    start = max(0, idx - 40)
                    end = min(len(line_text), idx + len(old_version) + 40)
                    line_text = "..." + line_text[start:end] + "..."
            changes.append(f" L{line_num}: {line_text}")

    count = len(matches)

    if not dry_run:
        new_content = regex.sub(new_version, content)
        file_path.write_text(new_content, encoding="utf-8")

Copilot AI Mar 24, 2026

Path.read_text() / write_text() in text mode will normalize line endings (e.g., CRLF -> LF) when rewriting files, which can create large noisy diffs unrelated to the version bump. Consider preserving original newlines (e.g., opening with newline='' and writing back with the same newline style, or operating on bytes while keeping the original line endings) so the script only changes the version string.
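
One way to follow the first suggestion is to open the file with `newline=""`, which turns off newline translation on both read and write. The helper name below is hypothetical, and a plain `str.replace` stands in for the script's regex.

```python
def replace_preserving_newlines(path, old: str, new: str) -> int:
    # newline="" on read: "\r\n" is returned as-is, not collapsed to "\n".
    with open(path, "r", encoding="utf-8", newline="") as f:
        content = f.read()

    count = content.count(old)
    if count:
        # newline="" on write: "\n" is not translated to the platform default.
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(content.replace(old, new))
    return count
```

The other option named in the comment, working on bytes with `read_bytes()` / `write_bytes()`, avoids decoding altogether and behaves the same way for line endings.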

Comment thread scripts/bump-version.py Outdated
@BrendanWalsh BrendanWalsh force-pushed the smamindl/add-version-bump-script branch from 82f5f3d to 6e7a37b Compare April 1, 2026 05:21
smamindl and others added 5 commits April 1, 2026 05:21
Adds scripts/bump-version.py for safely updating version strings across
the repo during releases. Uses word-boundary-aware regex to prevent
partial matches (e.g., won't corrupt 1.0.115 when bumping 1.0.11).

Features:
- Dry-run mode to preview changes
- Denylist for versioned_docs/, package.json, etc.
- Regex safety verification before applying
- Summary report of all changes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Use read_bytes/write_bytes instead of read_text/write_text to avoid
CRLF -> LF normalization that would create noisy diffs unrelated to
the version bump.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Redesigned version bump from bare regex to context-anchored substitution.
Every replacement is proven SynapseML-related by surrounding text —
safe for fully unattended automated releases.

Features:
- 20 context patterns (self/line/file-anchored)
- Auto-detects current version from docusaurus.config.js
- Enforces X.Y.Z format, version must increase
- Two-pass: read all → validate → write all
- Post-condition self-verification (re-reads files after write)
- Broad sweep warns about old version in unscanned files
- Runs docusaurus docs:version when website/docs/ exists
- 175 tests: unit, integration, fuzz, adversarial, round-trip,
  snapshot regression, and 10 historical bump replays

Usage: python scripts/bump-version.py --to 1.1.3 [--dry-run]

Co-authored-by: smamindl <106691906+smamindl@users.noreply.github.com>
@BrendanWalsh BrendanWalsh force-pushed the smamindl/add-version-bump-script branch from 6e7a37b to dc8e98c Compare April 1, 2026 05:21
@BrendanWalsh BrendanWalsh merged commit 832a4f6 into master Apr 1, 2026
10 checks passed
@BrendanWalsh BrendanWalsh deleted the smamindl/add-version-bump-script branch April 1, 2026 05:42