Skip to content

peribolos: Add Repository Fork Management Support#583

Open
hoxhaeris wants to merge 3 commits intokubernetes-sigs:mainfrom
hoxhaeris:feature/peribolos-fork-management
Open

peribolos: Add Repository Fork Management Support#583
hoxhaeris wants to merge 3 commits intokubernetes-sigs:mainfrom
hoxhaeris:feature/peribolos-fork-management

Conversation

@hoxhaeris
Copy link
Copy Markdown
Contributor

@hoxhaeris hoxhaeris commented Jan 5, 2026

Summary

This PR adds support for managing GitHub repository forks in Peribolos. It allows users to declaratively configure forks alongside standard repositories, handling creation, metadata configuration, and edge cases like naming conflicts.

Key Features

  • Declarative Fork Management: Define forks simply by specifying fork_from.
  • Full Metadata Support: Configure description, homepage, privacy, and other settings for forks just like regular repos.
  • Fork Name Enforcement: The config key becomes the fork name via GitHub's name API parameter.
  • Race Condition Handling: Intelligently waits for asynchronous fork creation on GitHub to complete before attempting to apply settings.

Configuration Examples

Basic Fork

Create a fork using the config key as the fork name.

orgs:
  my-org:
    repos:
      # Will create my-org/kubernetes as a fork of kubernetes/kubernetes
      kubernetes:
        fork_from: kubernetes/kubernetes

Customized Fork (Hard Fork / Mirror)

Create a fork with a custom name, description, and specific settings (e.g., disabling issues).

orgs:
  my-org:
    repos:
      # Create 'internal-k8s' as a fork of 'kubernetes/kubernetes'
      internal-k8s:
        fork_from: kubernetes/kubernetes
        default_branch_only: true

        # Metadata is applied immediately after creation
        description: "Internal mirror of Kubernetes - DO NOT OPEN PRS HERE"
        homepage: "https://internal.docs/k8s"

        # Configure repo settings
        has_issues: false  # Disable issues to force users upstream
        has_projects: false
        has_wiki: false
        allow_merge_commit: true
        allow_squash_merge: false

        # Configure access
        collaborators:
          ci-bot: admin

Usage

Fork management is enabled automatically when you fix repositories.

# Standard run - manages org, teams, and repos (including forks)
peribolos --config-path org.yaml --fix-org --fix-teams --fix-repos

# Target only forks specifically (if needed)
peribolos --config-path org.yaml --fix-forks

# Explicitly exclude forks
peribolos --config-path org.yaml --fix-org --fix-teams --fix-repos --fix-forks=false

Behavior

Scenario Behavior
Repo doesn't exist Creates fork from upstream, waits for readiness, then applies metadata
Repo exists as correct fork Updates metadata (idempotent)
Repo exists but not a fork Error
Repo exists as fork of different upstream Error (safe failure)
Multiple config entries fork from the same upstream Error (GitHub only allows one fork per upstream per org)
Invalid fork_from format Error
Upstream repo doesn't exist Error
Some forks fail, others succeed Logs errors for failed forks, continues with successful ones

Note on "repo already exists but is not a fork": This can only happen if someone creates a repository manually on GitHub with the same name as a fork defined in your config. It cannot occur if all repositories are managed through Peribolos, since YAML does not allow duplicate keys. To resolve, either rename/delete the manually-created repository, or choose a different name for the fork in your config.

Metadata Inheritance

Forks inherit metadata from the upstream repository:

  • Fields explicitly set in config override the inherited/current value
  • Fields omitted (nil) retain the fork's current value
  • Metadata restrictions vary by GitHub edition; restricted changes result in API errors

Implementation Details

  • Ordering: Fork configuration runs before repository configuration to ensure forks exist before metadata is applied.
  • Fork Naming: The config key is passed to GitHub's fork API name parameter, enforcing the exact fork name. If a repository with this name already exists, the operation fails with an error.
  • Partial Failure Resilience: If some forks fail (e.g., upstream not found), successfully created forks still proceed through metadata and collaborator configuration.
  • Race Condition Handling: configureForks polls GitHub every 10 seconds until the fork is fully created (up to 5 minutes) before proceeding. Only retries on 404 (fork not ready); other errors fail immediately.
  • Refactoring: Includes internal refactoring (RepoMetadata struct) to align data structures with the GitHub API.

API Reference

…gnment

This refactoring separates GitHub Repos API fields into a dedicated
RepoMetadata struct, following the same pattern as TeamMetadata/Team
and Metadata/Config.

Changes:
- Create RepoMetadata struct with fields that map directly to GitHub's
  Update Repository API
- Embed RepoMetadata in Repo struct
- Keep Peribolos-specific fields (Previously, OnCreate) and fields
  managed by other APIs (Collaborators) in the Repo struct
- Update all struct literals in main.go and tests to use the new
  embedded struct syntax

This change is backwards compatible - the YAML/JSON serialization
remains unchanged due to Go's struct embedding behavior.
@netlify
Copy link
Copy Markdown

netlify Bot commented Jan 5, 2026

Deploy Preview for k8s-prow ready!

Name Link
🔨 Latest commit 298aa76
🔍 Latest deploy log https://app.netlify.com/projects/k8s-prow/deploys/69e5370136a1770008c97e37
😎 Deploy Preview https://deploy-preview-583--k8s-prow.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot k8s-ci-robot added the area/peribolos Issues or PRs related to prow's peribolos component label Jan 5, 2026
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 5, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Hi @hoxhaeris. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 5, 2026
@hoxhaeris
Copy link
Copy Markdown
Contributor Author

/cc @petr-muller

@hoxhaeris hoxhaeris changed the title [peribolos] Add Repository Fork Management Support peribolos: Add Repository Fork Management Support Jan 5, 2026
Copy link
Copy Markdown
Member

@Prucek Prucek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 9, 2026
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 9, 2026
@hoxhaeris
Copy link
Copy Markdown
Contributor Author

/hold
I want to watch the rollout.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 9, 2026
@bradmwilliams
Copy link
Copy Markdown
Contributor

I have reviewed this PR and would like to recommend it for addition to the codebase. The changes aligns well with existing patterns, have adequate test coverage, and are protected by the --fix-forks flag.
Thanks for your consideration!

@matthyx
Copy link
Copy Markdown
Contributor

matthyx commented Jan 14, 2026

hi, the PR seems well written indeed, just wondering why we need the flag --fix-forks

@hoxhaeris
Copy link
Copy Markdown
Contributor Author

hi, the PR seems well written indeed, just wondering why we need the flag --fix-forks

Mostly to have more granular control (although most users would want them both, --fix-forks and --fix-repos). A fork is still a repo, and removing the flag would offer a simpler design. Also, since this is a new feature, gating it made sense at that time, but thinking about it more carefully, I believe we can drop the flag.

Copy link
Copy Markdown
Contributor

@petr-muller petr-muller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to clarify some things, so I'm approving with a hold.
/hold

    my-fork:
      fork_from: kubernetes/kubernetes
      default_branch_only: true
      description: Our fork of **Kubernetes**

Is this snippet configuring a my-org/my-fork to exist, or my-org/kubernetes? IIUC it is the former, right?

Comment thread cmd/peribolos/main_test.go Outdated
Comment on lines +4321 to +4327
func boolPtr(b bool) *bool {
return &b
}

func strPtr(s string) *string {
return &s
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't bother respinning the PR for this but there's generic ptr.To now: https://pkg.go.dev/k8s.io/utils/ptr#To (also useful in other codebases that import k8s utils ;) )

Copy link
Copy Markdown
Contributor Author

@hoxhaeris hoxhaeris Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I had to respin the PR for a few other fixes anyway, I went ahead and made this change as well. Done in 04ef97729

Comment thread cmd/peribolos/main.go Outdated
Comment on lines +1253 to +1258
// Note: GitHub may name the fork differently if there's a naming conflict
if createdName != repoName {
repoLogger.WithField("created_name", createdName).Warn("fork was created with a different name than expected")
} else {
repoLogger.Info("fork created successfully")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I correctly understand how this is idempotent. We config a fork to exist in our subject org, after first run it is created under an arbitrary name. How do the subsequent runs avoid trying to recreate a fork if the existing one has an arbitrary name not present in the config?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point, this is a bug.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code assumes the fork will be named exactly as specified in the config, and only looks up existing repos by that config name.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also need to implement a wait loop here. CreateForkInOrg returns a 202 Accepted immediately because fork creation is asynchronous on GitHub. Currently, we proceed to the next step without verifying the repository exists/is ready. This creates a race condition where subsequent configuration steps (like applying permissions) may fail with a 404 if the fork isn't fully ready yet.

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hoxhaeris, petr-muller, Prucek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 14, 2026
@petr-muller
Copy link
Copy Markdown
Contributor

Also, since this is a new feature, gating it made sense at that time, but thinking about it more carefully, I believe we can drop the flag.

I'm leaning towards keeping it, maybe make it inherit whatever --fix-repos is unless --fix-forks is explicitly passed? It's a pattern in peribolos and I think new larger reconciled GH entities with sufficiently custom logic benefit from a flag. But I do not really have strong opinion on this, either case is fine.

@petr-muller
Copy link
Copy Markdown
Contributor

/reopen

dammit why are the buttons so close to each other

@k8s-ci-robot k8s-ci-robot reopened this Jan 14, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@petr-muller: Reopened this PR.

Details

In response to this:

/reopen

dammit why are the buttons so close to each other

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hoxhaeris
Copy link
Copy Markdown
Contributor Author

I'm leaning towards keeping it, maybe make it inherit whatever --fix-repos is unless --fix-forks is explicitly passed? It's a pattern in peribolos and I think new larger reconciled GH entities with sufficiently custom logic benefit from a flag. But I do not really have strong opinion on this, either case is fine.

I'm fine either way as well. The middle ground sounds good -- having --fix-forks inherit from --fix-repos unless explicitly passed. This keeps the pattern consistent while providing sensible defaults.

Users who want repos managed but not forks can still do --fix-repos --fix-forks=false.

I'll update the implementation to follow this approach.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 15, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 15, 2026
@hoxhaeris
Copy link
Copy Markdown
Contributor Author

hoxhaeris commented Jan 15, 2026

Just want to clarify some things, so I'm approving with a hold. /hold

    my-fork:
      fork_from: kubernetes/kubernetes
      default_branch_only: true
      description: Our fork of **Kubernetes**

Is this snippet configuring a my-org/my-fork to exist, or my-org/kubernetes? IIUC it is the former, right?

cc @petr-muller
The config key (my-fork) is a YAML identifier, but GitHub determines the actual fork name. By default, GitHub names forks after the upstream repo, so this would create my-org/kubernetes (if a repo with the same name does not exist on my-org, in that case it would likely be kubernetes-1), not my-org/my-fork.

The config key is useful for YAML uniqueness - for example, if we already have a repo named kubernetes and want to fork kubernetes/kubernetes, we can't use duplicate keys in YAML.

Current behavior (admittedly weird):

  • Fork is created with whatever name GitHub assigns (typically the upstream repo name)
  • Collaborators are applied correctly (we track the actual name internally)
  • Metadata fields (description, private, etc.) are only applied if the config key happens to match the actual fork name - otherwise they're silently ignored
  • Even when metadata is applied, some fields will fail due to GitHub restrictions (e.g., "public forks can't be made private")

Planned fix: I'm leaning towards supporting non-restrictive metadata changes on forks (e.g., description, has_issues, has_wiki) since those are valid use cases. However, restrictive fields like private should be rejected or warned about, since GitHub doesn't allow making public forks private (although that might not be true for Enterprise, and that is an argument to actually allow all metadata modification, and let the tool fail if the modification is restricted, since trying to anticipate what GitHub will reject is fragile).

I'll also need to fix the current issue where metadata is only applied when the config key matches the actual fork name - it should work regardless of what GitHub names the fork.

Wanted to get your opinion: does supporting non-restrictive metadata on forks make sense, or should we keep forks minimal (fork_from, default_branch_only, collaborators only)?

We can also add support for the name parameter in GitHub's fork API, which would let users enforce a specific fork name via the config key (I am also leaning towards this, more explicit, follows the current behaviour of normal repos).

(Note: I’ve already implemented what I mentioned above, but I kept the commits split in case we need to revert any.)

@hoxhaeris hoxhaeris requested a review from petr-muller January 15, 2026 19:04
@hoxhaeris
Copy link
Copy Markdown
Contributor Author

/test pull-prow-unit-test
/test pull-prow-unit-test-race-detector-nonblocking

@petr-muller
Copy link
Copy Markdown
Contributor

Wanted to get your opinion: does supporting non-restrictive metadata on forks make sense, or should we keep forks minimal (fork_from, default_branch_only, collaborators only)?

Thanks for pointing this out, that's an interesting angle. If we can support non-restrictive metadata in some way then I'd prefer to do so. Seems like a valid use case to me but also not something that we need to get 100% right here in the initial PR, so I do not worry about that too much.

We can also add support for the name parameter in GitHub's fork API, which would let users enforce a specific fork name via the config key (I am also leaning towards this, more explicit, follows the current behaviour of normal repos).

This is what my original "Is this snippet configuring a my-org/my-fork to exist, or my-org/kubernetes?" comment was concerned about. I think that if possible, the declarative peribolos config should be as explicit as possible, ideally specifying 'I want a fork of upstream/repo and I want it named repo'. The state where we'd only specify "I want a fork" and the fork would be named in an arbitrary, GitHub-side selected way is IMO worse than requiring the config author to make a decision (or fail if they rely on the default which cannot be fulfilled)

I wonder if there's a more convenient config surface for this, like making fork_from a key:

forks:
  kubernetes/kubernetes:
    default_branch_only: true
    description: Our fork of **Kubernetes**
    optional_custom_name: k8s

The optional_custom_name would be optional, but if it was not used, we'd assume the same (kubernetes) name client-side and therefore always use the specific name in the API call (basically never rely on GH-side name selection).

I have not yet reviewed the code increment, hope I get to it this week.

Comment thread cmd/peribolos/main.go
// configureForks creates repository forks from upstream repositories as specified in the config.
// This function only creates forks - it does not delete existing forks that are not in the config.
// Returns a mapping of config repo names to actual GitHub repo names (for forks that were renamed).
func configureForks(client forkClient, orgName string, orgConfig org.Config) (map[string]string, error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fork detection by upstream is expensive and potentially incorrect.

The function fetches every repo in the org and calls GetRepo on each to find which ones are forks and what their parent is. This is O(n) API calls where n = number of repos in the org. For large orgs (100s of repos) this will be very slow and may hit GitHub rate limits. A better approach: only GetRepo on repos whose names match config entries, or check IsForked/Fork field from the GetRepos list response

https://docs.github.com/en/rest/repos/repos?apiVersion=2022-11-28#list-organization-repositories

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for flagging this. I've reworked the detection logic so it no longer scans all org repos.

The current implementation uses GetRepos (single list call) to build a name lookup map, then for each configured fork it only checks at most 2 candidates: the config key name and the upstream repo name. The Fork field from the list response is checked before any GetRepo call, so non-fork repos never trigger an API call.

Total cost: 1 GetRepos + at most 2 GetRepo calls per configured fork, not O(n).

Comment thread cmd/peribolos/main.go
deadline := time.Now().Add(timeout)
logger := logrus.WithFields(logrus.Fields{"org": org, "repo": repo})

for time.Now().Before(deadline) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unbounded retry on transient errors

The function returns an error only on timeout, but doesn't distinguish between "fork not ready yet" (404) and other transient API errors. If there's a persistent API error, it will silently retry for 5 minutes before failing. It should surface non-404 errors immediately.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks. waitForFork now distinguishes 404 from other errors.

Add the ability to manage repository forks declaratively via org config.
Repositories with a `fork_from: owner/repo` field are forked into the
org using the GitHub fork API (CreateForkInOrg).

Key behaviors:
- Fork detection checks candidate repo names (config key and upstream
  repo name) rather than scanning all org forks, keeping API calls at
  O(config entries) instead of O(org repos).
- Forks created asynchronously by GitHub are polled until available,
  with immediate failure on non-404 errors.
- A --fix-forks flag (inheriting from --fix-repos) gates fork creation.
- Fork name mappings are passed to configureRepos and
  configureCollaborators so downstream operations use the actual
  GitHub repo name when it differs from the config key.
@hoxhaeris hoxhaeris force-pushed the feature/peribolos-fork-management branch from 97fc84a to eb8113a Compare April 19, 2026 19:22
@hoxhaeris
Copy link
Copy Markdown
Contributor Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 19, 2026
Reject configs where multiple repo entries fork from the same upstream,
since GitHub only allows one fork per upstream per org. Without this
validation, both entries silently mapped to the same fork with
non-deterministic metadata application.
@hoxhaeris
Copy link
Copy Markdown
Contributor Author

Wanted to get your opinion: does supporting non-restrictive metadata on forks make sense, or should we keep forks minimal (fork_from, default_branch_only, collaborators only)?

Thanks for pointing this out, that's an interesting angle. If we can support non-restrictive metadata in some way then I'd prefer to do so. Seems like a valid use case to me but also not something that we need to get 100% right here in the initial PR, so I do not worry about that too much.

We can also add support for the name parameter in GitHub's fork API, which would let users enforce a specific fork name via the config key (I am also leaning towards this, more explicit, follows the current behaviour of normal repos).

This is what my original "Is this snippet configuring a my-org/my-fork to exist, or my-org/kubernetes?" comment was concerned about. I think that if possible, the declarative peribolos config should be as explicit as possible, ideally specifying 'I want a fork of upstream/repo and I want it named repo'. The state where we'd only specify "I want a fork" and the fork would be named in an arbitrary, GitHub-side selected way is IMO worse than requiring the config author to make a decision (or fail if they rely on the default which cannot be fulfilled)

I wonder if there's a more convenient config surface for this, like making fork_from a key:

forks:
  kubernetes/kubernetes:
    default_branch_only: true
    description: Our fork of **Kubernetes**
    optional_custom_name: k8s

The optional_custom_name would be optional, but if it was not used, we'd assume the same (kubernetes) name client-side and therefore always use the specific name in the API call (basically never rely on GH-side name selection).

I have not yet reviewed the code increment, hope I get to it this week.

Hey @petr-muller, apologies for the slow follow-up on this. And thank you for the feedback.

On the fork_from-as-key idea: I don't have a strong preference and I can go either way. That said, here's what I've been thinking about the trade-offs:

The current design (forks under repos:) has the config key serve as the explicit fork name. When you write:

repos:
  kubernetes:
    fork_from: "kubernetes/kubernetes"

the repo name is kubernetes, that's what gets passed to GitHub's fork API name parameter. There's nothing implicit or GitHub-selected about the naming now.

A separate forks: section with upstream-as-key is nice for readability, but it introduces some friction:

  • Team.Repos references repos by name. A separate forks: section means team configs need to cross-reference fork names, which adds indirection.
  • Fork repos need the same metadata and collaborator config as regular repos. Keeping them under repos: means all of that works without duplication.
  • If optional_custom_name is omitted, the name is derived from the upstream key, which is actually more implicit than having the user type the desired name as the config key.

Also worth noting: we now support overriding metadata on forks (description, has_issues, private, etc.). After the fork is created, configureRepos picks it up, computes a delta against its current state, and applies any metadata fields that are explicitly set in the config. Fields left unset (nil) keep their current value (inherited from upstream). If a change isn't supported by GitHub (e.g., making a public fork private on github.com), the API will return an error and peribolos will report it, so invalid config fails explicitly rather than silently.

That said, I see the appeal of the upstream-as-key approach for discoverability, it puts the "what are we forking" front and center. Happy to discuss further or rework if you feel strongly about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/peribolos Issues or PRs related to prow's peribolos component cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants