peribolos: Add Repository Fork Management Support#583
peribolos: Add Repository Fork Management Support#583hoxhaeris wants to merge 3 commits intokubernetes-sigs:mainfrom
peribolos: Add Repository Fork Management Support#583Conversation
…gnment This refactoring separates GitHub Repos API fields into a dedicated RepoMetadata struct, following the same pattern as TeamMetadata/Team and Metadata/Config. Changes: - Create RepoMetadata struct with fields that map directly to GitHub's Update Repository API - Embed RepoMetadata in Repo struct - Keep Peribolos-specific fields (Previously, OnCreate) and fields managed by other APIs (Collaborators) in the Repo struct - Update all struct literals in main.go and tests to use the new embedded struct syntax This change is backwards compatible - the YAML/JSON serialization remains unchanged due to Go's struct embedding behavior.
✅ Deploy Preview for k8s-prow ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
Hi @hoxhaeris. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/cc @petr-muller |
peribolos: Add Repository Fork Management Support
|
/hold |
|
I have reviewed this PR and would like to recommend it for addition to the codebase. The changes aligns well with existing patterns, have adequate test coverage, and are protected by the |
|
hi, the PR seems well written indeed, just wondering why we need the flag |
Mostly to have more granular control (although most users would want them both, |
petr-muller
left a comment
There was a problem hiding this comment.
Just want to clarify some things, so I'm approving with a hold.
/hold
my-fork:
fork_from: kubernetes/kubernetes
default_branch_only: true
description: Our fork of **Kubernetes**
Is this snippet configuring a my-org/my-fork to exist, or my-org/kubernetes? IIUC it is the former, right?
| func boolPtr(b bool) *bool { | ||
| return &b | ||
| } | ||
|
|
||
| func strPtr(s string) *string { | ||
| return &s | ||
| } |
There was a problem hiding this comment.
Don't bother respinning the PR for this but there's generic ptr.To now: https://pkg.go.dev/k8s.io/utils/ptr#To (also useful in other codebases that import k8s utils ;) )
There was a problem hiding this comment.
Since I had to respin the PR for a few other fixes anyway, I went ahead and made this change as well. Done in 04ef97729
| // Note: GitHub may name the fork differently if there's a naming conflict | ||
| if createdName != repoName { | ||
| repoLogger.WithField("created_name", createdName).Warn("fork was created with a different name than expected") | ||
| } else { | ||
| repoLogger.Info("fork created successfully") | ||
| } |
There was a problem hiding this comment.
I'm not sure if I correctly understand how this is idempotent. We config a fork to exist in our subject org, after first run it is created under an arbitrary name. How do the subsequent runs avoid trying to recreate a fork if the existing one has an arbitrary name not present in the config?
There was a problem hiding this comment.
Ah, good point, this is a bug.
There was a problem hiding this comment.
The current code assumes the fork will be named exactly as specified in the config, and only looks up existing repos by that config name.
There was a problem hiding this comment.
I also need to implement a wait loop here. CreateForkInOrg returns a 202 Accepted immediately because fork creation is asynchronous on GitHub. Currently, we proceed to the next step without verifying the repository exists/is ready. This creates a race condition where subsequent configuration steps (like applying permissions) may fail with a 404 if the fork isn't fully ready yet.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hoxhaeris, petr-muller, Prucek The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I'm leaning towards keeping it, maybe make it inherit whatever |
|
/reopen dammit why are the buttons so close to each other |
|
@petr-muller: Reopened this PR. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
I'm fine either way as well. The middle ground sounds good -- having Users who want repos managed but not forks can still do I'll update the implementation to follow this approach. |
|
New changes are detected. LGTM label has been removed. |
cc @petr-muller The config key is useful for YAML uniqueness - for example, if we already have a repo named kubernetes and want to fork kubernetes/kubernetes, we can't use duplicate keys in YAML. Current behavior (admittedly weird):
Planned fix: I'm leaning towards supporting non-restrictive metadata changes on forks (e.g., description, has_issues, has_wiki) since those are valid use cases. However, restrictive fields like private should be rejected or warned about, since GitHub doesn't allow making public forks private (although that might not be true for Enterprise, and that is an argument to actually allow all metadata modification, and let the tool fail if the modification is restricted, since trying to anticipate what GitHub will reject is fragile). I'll also need to fix the current issue where metadata is only applied when the config key matches the actual fork name - it should work regardless of what GitHub names the fork. Wanted to get your opinion: does supporting non-restrictive metadata on forks make sense, or should we keep forks minimal (fork_from, default_branch_only, collaborators only)? We can also add support for the name parameter in GitHub's fork API, which would let users enforce a specific fork name via the config key (I am also leaning towards this, more explicit, follows the current behaviour of normal repos).
|
|
/test pull-prow-unit-test |
Thanks for pointing this out, that's an interesting angle. If we can support non-restrictive metadata in some way then I'd prefer to do so. Seems like a valid use case to me but also not something that we need to get 100% right here in the initial PR, so I do not worry about that too much.
This is what my original "Is this snippet configuring a my-org/my-fork to exist, or my-org/kubernetes?" comment was concerned about. I think that if possible, the declarative peribolos config should be as explicit as possible, ideally specifying 'I want a fork of upstream/repo and I want it named I wonder if there's a more convenient config surface for this, like making The I have not yet reviewed the code increment, hope I get to it this week. |
| // configureForks creates repository forks from upstream repositories as specified in the config. | ||
| // This function only creates forks - it does not delete existing forks that are not in the config. | ||
| // Returns a mapping of config repo names to actual GitHub repo names (for forks that were renamed). | ||
| func configureForks(client forkClient, orgName string, orgConfig org.Config) (map[string]string, error) { |
There was a problem hiding this comment.
fork detection by upstream is expensive and potentially incorrect.
The function fetches every repo in the org and calls GetRepo on each to find which ones are forks and what their parent is. This is O(n) API calls where n = number of repos in the org. For large orgs (100s of repos) this will be very slow and may hit GitHub rate limits. A better approach: only GetRepo on repos whose names match config entries, or check IsForked/Fork field from the GetRepos list response
https://docs.github.com/en/rest/repos/repos?apiVersion=2022-11-28#list-organization-repositories
There was a problem hiding this comment.
Thanks for flagging this. I've reworked the detection logic so it no longer scans all org repos.
The current implementation uses GetRepos (single list call) to build a name lookup map, then for each configured fork it only checks at most 2 candidates: the config key name and the upstream repo name. The Fork field from the list response is checked before any GetRepo call, so non-fork repos never trigger an API call.
Total cost: 1 GetRepos + at most 2 GetRepo calls per configured fork, not O(n).
| deadline := time.Now().Add(timeout) | ||
| logger := logrus.WithFields(logrus.Fields{"org": org, "repo": repo}) | ||
|
|
||
| for time.Now().Before(deadline) { |
There was a problem hiding this comment.
unbounded retry on transient errors
The function returns an error only on timeout, but doesn't distinguish between "fork not ready yet" (404) and other transient API errors. If there's a persistent API error, it will silently retry for 5 minutes before failing. It should surface non-404 errors immediately.
There was a problem hiding this comment.
Good catch, thanks. waitForFork now distinguishes 404 from other errors.
Add the ability to manage repository forks declaratively via org config. Repositories with a `fork_from: owner/repo` field are forked into the org using the GitHub fork API (CreateForkInOrg). Key behaviors: - Fork detection checks candidate repo names (config key and upstream repo name) rather than scanning all org forks, keeping API calls at O(config entries) instead of O(org repos). - Forks created asynchronously by GitHub are polled until available, with immediate failure on non-404 errors. - A --fix-forks flag (inheriting from --fix-repos) gates fork creation. - Fork name mappings are passed to configureRepos and configureCollaborators so downstream operations use the actual GitHub repo name when it differs from the config key.
97fc84a to
eb8113a
Compare
|
/unhold |
Reject configs where multiple repo entries fork from the same upstream, since GitHub only allows one fork per upstream per org. Without this validation, both entries silently mapped to the same fork with non-deterministic metadata application.
Hey @petr-muller, apologies for the slow follow-up on this. And thank you for the feedback. On the The current design (forks under repos:
kubernetes:
fork_from: "kubernetes/kubernetes"the repo name is A separate
Also worth noting: we now support overriding metadata on forks (description, has_issues, private, etc.). After the fork is created, That said, I see the appeal of the upstream-as-key approach for discoverability, it puts the "what are we forking" front and center. Happy to discuss further or rework if you feel strongly about it. |
Summary
This PR adds support for managing GitHub repository forks in Peribolos. It allows users to declaratively configure forks alongside standard repositories, handling creation, metadata configuration, and edge cases like naming conflicts.
Key Features
fork_from.nameAPI parameter.Configuration Examples
Basic Fork
Create a fork using the config key as the fork name.
Customized Fork (Hard Fork / Mirror)
Create a fork with a custom name, description, and specific settings (e.g., disabling issues).
Usage
Fork management is enabled automatically when you fix repositories.
Behavior
fork_fromformatMetadata Inheritance
Forks inherit metadata from the upstream repository:
Implementation Details
nameparameter, enforcing the exact fork name. If a repository with this name already exists, the operation fails with an error.configureForkspolls GitHub every 10 seconds until the fork is fully created (up to 5 minutes) before proceeding. Only retries on 404 (fork not ready); other errors fail immediately.RepoMetadatastruct) to align data structures with the GitHub API.API Reference