release notes: add ai generator for improvements and bug fixes #22850
qiancai wants to merge 4 commits into `pingcap:master` from `qiancai:release-notes-ai-generator`.
Commits (all by qiancai):

- `4f2147a`: support generating release notes by ai
- `4583453`: improve the filter logic and move the prompt to an independent file
- `195da0b`: update the naming rule of the release note file
- `259870d`: add the usage descriptions for the scripts
# Release notes generator

`scripts/release_notes_generate_ai.py` generates English TiDB release notes from the PRs and issues in a specified Excel file.

## What it does

**Scope filtering:**

- Filters out PRs and issues that are not in the target release scope. For example, it filters out PRs that were merged before the previous patch release.
- Moves issues that already appeared in earlier notes from the same major.minor series to a separate worksheet.

**Duplicate handling:**

- Marks release notes that are already published in other series as `(dup)` and reuses the release notes for the same issue.

**Release note generation:**

- Generates English release notes using AI based on the release note draft provided in the PR, the PR description and code changes, and the issue description.
- Maps components in the workbook to the corresponding release note components.

**File output in Markdown:**

- Generates the release note file for the target release according to the release note template file.
- Adds the improvements and bug fixes of each component to the corresponding sections of the release note file.
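The duplicate-handling rule can be sketched as a small lookup. Note that the data shapes, function name, and the exact placement of the `(dup)` marker here are illustrative assumptions, not the script's actual implementation:

```python
def reuse_published_note(issue_id: str, draft_note: str, published_notes: dict[str, str]) -> str:
    """Return the note to use for an issue, reusing wording already
    published in another release series when available.

    published_notes maps an issue id to the note text that was already
    published in another series. The "(dup)" marker placement is assumed.
    """
    if issue_id in published_notes:
        # Reuse the previously published wording and flag it as a duplicate.
        return "(dup) " + published_notes[issue_id]
    return draft_note


# Hypothetical data for illustration only:
published = {"tidb#12345": "- Fix the issue that an example query could fail"}
print(reuse_published_note("tidb#12345", "- New draft note", published))
print(reuse_published_note("tidb#99999", "- New draft note", published))
```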
## Prerequisites

- Install Python dependencies:

    ```bash
    python3 -m pip install -r scripts/release_notes_ai/requirements.txt
    ```

- Prepare a GitHub token with access to the public repositories and set it in the `GITHUB_TOKEN` environment variable:

    ```bash
    export GITHUB_TOKEN=<your-github-token>
    ```

- Install and log in to Codex CLI. The default `--ai-command` uses `codex exec`, so the installed Codex CLI must support `exec`, `--sandbox read-only`, `--ephemeral`, `--output-schema`, `--output-last-message`, and `-m <model>`.
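As a convenience, you can confirm the token is visible to Python before running the script. This helper is illustrative and not part of the script itself:

```python
import os


def require_github_token(env=None):
    # Fail fast when GITHUB_TOKEN is missing, mirroring the prerequisite above.
    env = os.environ if env is None else env
    token = env.get("GITHUB_TOKEN")
    if not token:
        raise SystemExit("GITHUB_TOKEN is not set; export it before running the generator")
    return token
```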
## Typical usage

```bash
python3 scripts/release_notes_generate_ai.py \
  --version 8.5.7 \
  --excel /path/to/release-note-excel.xlsx \
  --releases-dir releases
```
## Option descriptions

| Option | Required | Default value | Usage example | Description |
| --- | --- | --- | --- | --- |
| `--version <tidb-version>` | Yes | None | `--version 8.5.7` | Target TiDB version. This value is used for scope filtering, existing release-note lookup, generated Markdown front matter, and the default output file name. |
| `--excel <workbook-path>` | Yes | None | `--excel /path/to/release-note-excel.xlsx` | Path to the source release note Excel file. The source workbook is not overwritten. The processed workbook is written to `<original-name>_processed.xlsx`. |
| `--releases-dir <releases-dir>` | Yes | None | `--releases-dir releases` | Path to the existing English release notes directory. The script scans this directory for historical release notes and writes the generated Markdown under this directory unless `--output-release-file` is specified. |
| `--sheet <sheet-name>` | No | `pr_for_release_note` | `--sheet pr_for_release_note` | Workbook sheet to process. |
| `--ai-command <command>` | No | `codex --ask-for-approval never exec --sandbox read-only --ephemeral` | `--ai-command "codex --ask-for-approval never exec --sandbox read-only --ephemeral"` | Command used to invoke the AI generator. The prompt is passed through standard input. When the command is `codex exec`, the script also passes `--output-schema` and `--output-last-message`. |
| `--ai-model <model>` | No | `gpt-5.4` | `--ai-model gpt-5.4` | Model name passed to `codex exec` with `-m`. |
| `--involve-ai-generation <ON-or-OFF>` | No | `ON` | `--involve-ai-generation OFF` | Whether to generate non-duplicate release notes with AI. Use `ON` to invoke AI, or `OFF` to use the source `formated_release_note` values. |
| `--output-release-file <markdown-file>` | No | Conditional | `--output-release-file /path/to/release-8.5.7.md` | Write the generated Markdown to a custom path. By default, the output under `--releases-dir` is `release-<version>-updated-by-ai.md` if `release-<version>.md` already exists; otherwise, it is `release-<version>.md`. |
| `--ai-timeout <seconds>` | No | `600` | `--ai-timeout 600` | Timeout in seconds for each AI command invocation. |
| `--ai-workers <count>` | No | `3` | `--ai-workers 3` | Number of concurrent AI command invocations. |
| `--github-workers <count>` | No | `8` | `--github-workers 8` | Number of concurrent GitHub API prefetch workers. |
| `--author-workers <count>` | No | `3` | `--author-workers 3` | Number of concurrent workers used to resolve bot-authored cherry-pick PR authors. |
| `--checkpoint-interval <count>` | No | `1` | `--checkpoint-interval 1` | Save the processed workbook after every N completed AI rows. Use `0` to disable checkpoint saves. |
| `--force-regenerate` | No | Disabled | `--force-regenerate` | Clear existing AI-generated notes in the processed workbook and generate all non-duplicate rows again. |
| `--release-date <release-date>` | No | `TBD` | `--release-date "August 14, 2025"` | Release date text for the generated Markdown header. |
| `--skip-scope-preprocess` | No | Disabled | `--skip-scope-preprocess` | Skip moving not-in-scope PR rows to the `PRs_not_in_scope` sheet. |
| `--scope-base-branch-start-date <YYYY-MM-DD>` | No | Estimated from release history | `--scope-base-branch-start-date 2025-01-01` | Override the estimated release-m.n branch start date for x.y.0 scope preprocessing. The value must use the `YYYY-MM-DD` format. |
## Generated files

- The source Excel file passed to `--excel` is not overwritten.
- The processed Excel file is written to `<original-name>_processed.xlsx` next to the source workbook.
- The generated Markdown file is written to `--output-release-file` when that option is specified.
- If `--output-release-file` is omitted and `release-<version>.md` already exists under `--releases-dir`, the generated Markdown file is written to `release-<version>-updated-by-ai.md`.
- If `--output-release-file` is omitted and `release-<version>.md` does not exist under `--releases-dir`, the generated Markdown file is written to `release-<version>.md`.
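The output-file naming rule described above can be sketched as follows. The function name is illustrative; the script's internal implementation may differ:

```python
from __future__ import annotations

from pathlib import Path


def default_output_path(releases_dir: str, version: str,
                        output_release_file: str | None = None) -> Path:
    """Sketch of the documented output-file naming rule."""
    # An explicit --output-release-file always wins.
    if output_release_file:
        return Path(output_release_file)
    releases_dir_path = Path(releases_dir)
    default = releases_dir_path / f"release-{version}.md"
    # If the default file already exists, write an AI-updated copy instead.
    if default.exists():
        return releases_dir_path / f"release-{version}-updated-by-ai.md"
    return default
```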
```python
from __future__ import annotations

import dataclasses
from functools import lru_cache
import json
import os
import shlex
import shutil
import subprocess
import tempfile
import textwrap
from pathlib import Path
from typing import Any

from .constants import (
    BUG_FIXES_REFERENCE,
    GENERATION_PROMPT_TEMPLATE,
    IMPROVEMENTS_REFERENCE,
)
from .models import GeneratedNote, RowContext


class AIClient:
    def __init__(self, command: str, model: str | None, timeout: int):
        self.command = shlex.split(command)
        self.model = model
        self.timeout = timeout

    def generate(self, prompt: str, expected_links: list[str], contributors: list[str]) -> GeneratedNote:
        result, errors = self._run_and_validate(prompt, expected_links, contributors)
        if result:
            return result

        repair_prompt = build_repair_prompt(prompt, errors)
        result, repair_errors = self._run_and_validate(repair_prompt, expected_links, contributors)
        if result:
            return result
        raise ValueError("; ".join(repair_errors))

    def _run_and_validate(
        self, prompt: str, expected_links: list[str], contributors: list[str]
    ) -> tuple[GeneratedNote | None, list[str]]:
        output = self._run(prompt)
        try:
            data = extract_json_object(output)
        except ValueError as exc:
            return None, [str(exc)]
        return validate_ai_response(data, expected_links, contributors)

    def _run(self, prompt: str) -> str:
        command = list(self.command)
        if not command:
            raise ValueError("AI command is empty. Pass a command with --ai-command.")
        if not is_executable_available(command[0]):
            raise FileNotFoundError(
                f"AI command executable not found: {command[0]!r}. "
                "Install it or pass a custom command with --ai-command."
            )

        with tempfile.TemporaryDirectory() as temp_dir:
            output_path: Path | None = None
            if self._is_codex_exec(command):
                if self.model:
                    command.extend(["-m", self.model])
                temp_path = Path(temp_dir)
                schema_path = temp_path / "ai-output-schema.json"
                output_path = temp_path / "ai-output.txt"
                schema_path.write_text(json.dumps(ai_output_schema()), encoding="utf-8")
                output_path.touch()
                command.extend(["--output-schema", str(schema_path)])
                command.extend(["--output-last-message", str(output_path)])

            completed = subprocess.run(
                command,
                input=prompt,
                text=True,
                capture_output=True,
                timeout=self.timeout,
                check=False,
            )
            if completed.returncode != 0:
                raise RuntimeError(
                    "AI command failed with exit code "
                    f"{completed.returncode}: {summarize_process_output(completed)}"
                )
            if output_path and output_path.exists():
                last_message = output_path.read_text(encoding="utf-8").strip()
                if last_message:
                    return last_message
            return completed.stdout.strip()

    @staticmethod
    def _is_codex_exec(command: list[str]) -> bool:
        if not command:
            return False
        executable = Path(command[0]).name
        return executable == "codex" and "exec" in command[1:]


def is_executable_available(executable: str) -> bool:
    if os.sep in executable or (os.altsep and os.altsep in executable):
        return Path(executable).exists()
    return shutil.which(executable) is not None


def ai_output_schema() -> dict[str, Any]:
    return {
        "type": "object",
        "additionalProperties": False,
        "required": ["type", "release_note", "needs_review", "reason"],
        "properties": {
            "type": {"type": "string", "enum": ["improvement", "bug_fix"]},
            "release_note": {"type": "string"},
            "needs_review": {"type": "boolean"},
            "reason": {"type": "string"},
        },
    }


def summarize_process_output(completed: subprocess.CompletedProcess[str]) -> str:
    parts = []
    if completed.stderr.strip():
        parts.append("stderr:\n" + tail_output(completed.stderr))
    if completed.stdout.strip():
        parts.append("stdout:\n" + tail_output(completed.stdout))
    return "\n\n".join(parts) or "no output"


def tail_output(text: str, max_lines: int = 40, max_chars: int = 4000) -> str:
    tail = "\n".join(text.strip().splitlines()[-max_lines:])
    if len(tail) > max_chars:
        tail = "...[truncated]\n" + tail[-max_chars:]
    return tail


def build_generation_prompt(
    row_context: RowContext,
    expected_links: list[str],
    contributors: list[str],
) -> str:
    prompt_template = load_prompt_template(GENERATION_PROMPT_TEMPLATE)
    improvements_reference = load_reference_file(IMPROVEMENTS_REFERENCE)
    bug_fixes_reference = load_reference_file(BUG_FIXES_REFERENCE)
    context = {
        "row_number": row_context.row_number,
        "component": row_context.component,
        "raw_component_from_excel": row_context.raw_component,
        "issue_type_from_excel": row_context.issue_type,
        "pr_title_from_excel": row_context.pr_title,
        "formatted_release_note_from_excel": row_context.formatted_release_note,
        "expected_links": expected_links,
        "contributors": contributors,
        "issues": [dataclasses.asdict(issue) for issue in row_context.issues],
        "pull_requests": [dataclasses.asdict(pull) for pull in row_context.pulls],
    }
    return render_prompt_template(
        prompt_template,
        {
            "EXPECTED_LINKS": json.dumps(expected_links, ensure_ascii=False, indent=2),
            "CONTRIBUTORS": json.dumps(contributors, ensure_ascii=False, indent=2),
            "ROW_CONTEXT": json.dumps(context, ensure_ascii=False, indent=2),
            "IMPROVEMENTS_REFERENCE": improvements_reference,
            "BUG_FIXES_REFERENCE": bug_fixes_reference,
        },
    )


def build_repair_prompt(original_prompt: str, errors: list[str]) -> str:
    return textwrap.dedent(
        f"""
        Your previous answer did not satisfy the required JSON schema or release-note rules.

        Validation errors:
        {json.dumps(errors, ensure_ascii=False, indent=2)}

        Rewrite the answer. Return only the corrected JSON object.

        Original task:
        {original_prompt}
        """
    ).strip()


def render_prompt_template(template: str, values: dict[str, str]) -> str:
    for key, value in values.items():
        template = template.replace(f"{{{{{key}}}}}", value)
    return template.strip()


@lru_cache(maxsize=None)
def load_prompt_template(path: Path) -> str:
    try:
        return strip_prompt_template_heading(path.read_text(encoding="utf-8"))
    except FileNotFoundError as exc:
        raise FileNotFoundError(
            f"Cannot find release-note prompt template: {path}. "
            "Make sure scripts/release_notes_ai/prompts/generation.md exists."
        ) from exc


def strip_prompt_template_heading(template: str) -> str:
    lines = template.splitlines()
    if lines and lines[0].startswith("# "):
        lines = lines[1:]
    if lines and not lines[0].strip():
        lines = lines[1:]
    return "\n".join(lines)


@lru_cache(maxsize=None)
def load_reference_file(path: Path) -> str:
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError as exc:
        raise FileNotFoundError(
            f"Cannot find release-note reference file: {path}. "
            "Make sure the repo-local write-review-translate-release-notes skill is present."
        ) from exc


def extract_json_object(output: str) -> dict[str, Any]:
    output = output.strip()
    if not output:
        raise ValueError("AI command returned no output")
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        candidates = extract_json_object_candidates(output)
        if not candidates:
            raise ValueError("AI output did not contain a JSON object") from None
        required_keys = {"type", "release_note", "needs_review", "reason"}
        data = next(
            (candidate for candidate in candidates if required_keys <= candidate.keys()),
            candidates[0],
        )
    if not isinstance(data, dict):
        raise ValueError("AI output JSON is not an object")
    return data


def extract_json_object_candidates(output: str) -> list[dict[str, Any]]:
    decoder = json.JSONDecoder()
    candidates: list[dict[str, Any]] = []
    for index, char in enumerate(output):
        if char != "{":
            continue
        try:
            data, _end = decoder.raw_decode(output[index:])
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            candidates.append(data)
    return candidates


def validate_ai_response(
    data: dict[str, Any],
    expected_links: list[str],
    contributors: list[str],
) -> tuple[GeneratedNote | None, list[str]]:
    errors: list[str] = []
    note_type = data.get("type")
    release_note = data.get("release_note")
    needs_review = data.get("needs_review")
    reason = data.get("reason")

    if note_type not in {"improvement", "bug_fix"}:
        errors.append('type must be "improvement" or "bug_fix"')
    if not isinstance(release_note, str) or not release_note.startswith("- "):
        errors.append('release_note must be a string that starts with "- "')
    if isinstance(release_note, str) and release_note.rstrip().endswith("."):
        errors.append("release_note must not end with a period")
    if not isinstance(needs_review, bool):
        errors.append("needs_review must be a boolean")
    if not isinstance(reason, str):
        errors.append("reason must be a string")

    if isinstance(release_note, str):
        for link in expected_links:
            if link and link not in release_note:
                errors.append(f"release_note is missing expected link: {link}")
        for contributor in contributors:
            expected = f"@[{contributor}](https://github.com/{contributor})"
            if contributor and expected not in release_note:
                errors.append(f"release_note is missing contributor: {contributor}")

    if errors:
        return None, errors
    return (
        GeneratedNote(
            note_type=str(note_type),
            release_note=str(release_note).strip(),
            needs_review=bool(needs_review),
            reason=str(reason).strip(),
        ),
        [],
    )
```
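For illustration, here is a standalone response object that satisfies the surface rules `validate_ai_response` enforces. The issue link and contributor handle are made up:

```python
# A hypothetical AI response conforming to ai_output_schema(); the issue
# number, link, and contributor are illustrative, not real references.
response = {
    "type": "bug_fix",
    "release_note": (
        "- Fix the issue that an example query could return wrong results "
        "[#12345](https://github.com/pingcap/tidb/issues/12345) "
        "@[someuser](https://github.com/someuser)"
    ),
    "needs_review": False,
    "reason": "The draft note already matched the bug-fix phrasing rules",
}

# The same surface checks the validator applies:
assert response["type"] in {"improvement", "bug_fix"}
assert response["release_note"].startswith("- ")
assert not response["release_note"].rstrip().endswith(".")
assert isinstance(response["needs_review"], bool)
assert isinstance(response["reason"], str)
```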
> **Review comment:** For large text outputs, `splitlines()` can consume significant memory. Consider using a more memory-efficient approach if the output size is expected to be very large, although this is likely acceptable for release note generation.
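One way to address that memory concern, sketched under the assumption that only the trailing lines matter, is to stream lines through a bounded `deque` instead of materializing the full `splitlines()` list. Note that iterating `io.StringIO` splits only on `"\n"`, whereas `splitlines()` also handles `"\r"` and Unicode line breaks, so behavior differs slightly on exotic input:

```python
import io
from collections import deque


def tail_output_low_memory(text: str, max_lines: int = 40, max_chars: int = 4000) -> str:
    # Keep only the last max_lines lines while iterating line by line,
    # instead of building the full list that splitlines() returns.
    lines = deque((line.rstrip("\n") for line in io.StringIO(text.strip())), maxlen=max_lines)
    tail = "\n".join(lines)
    if len(tail) > max_chars:
        tail = "...[truncated]\n" + tail[-max_chars:]
    return tail
```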