Skip to content

OCPSTRAT-1986: Enhancement: API-driven Azure topology and Swift private connectivity for HyperShift#1985

Merged
openshift-merge-bot[bot] merged 4 commits into
openshift:masterfrom
muraee:hypershift-azure-topology-api-driven
May 12, 2026
Merged

OCPSTRAT-1986: Enhancement: API-driven Azure topology and Swift private connectivity for HyperShift#1985
openshift-merge-bot[bot] merged 4 commits into
openshift:masterfrom
muraee:hypershift-azure-topology-api-driven

Conversation

@muraee
Copy link
Copy Markdown
Contributor

@muraee muraee commented Apr 27, 2026

Summary

  • Replace hardcoded IsAroHCP() / env-var checks with API-driven per-cluster decisions using Topology and Private fields on HostedCluster/HostedControlPlane
  • Extend AzurePrivateType enum with Swift variant and introduce AzureSwiftSpec to replace the SwiftPodNetworkInstanceAnnotation annotation
  • Enable ARO HCP clusters to express fully private intent (Topology=Private), causing shared ingress to drop all public endpoint exposure

Test plan

  • Unit tests for all topology/private-type combinations in visibility, router, infra, and shared ingress controllers
  • Integration tests for day-2 topology transitions (PublicAndPrivate ↔ Private)
  • E2E tests verifying private clusters have no public DNS/routes
  • Serialization compatibility tests for N-1/N+1 scenarios

🤖 Generated with Claude Code

@openshift-ci openshift-ci Bot requested review from csrwng and enxebre April 27, 2026 12:18
…yperShift

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@muraee muraee force-pushed the hypershift-azure-topology-api-driven branch from 00b0101 to f7111cd Compare April 27, 2026 12:21
@muraee muraee changed the title Enhancement: API-driven Azure topology and private connectivity for HyperShift Enhancement: API-driven Azure topology and Swift private connectivity for HyperShift Apr 27, 2026
@muraee muraee changed the title Enhancement: API-driven Azure topology and Swift private connectivity for HyperShift OCPSTRAT-1986: Enhancement: API-driven Azure topology and Swift private connectivity for HyperShift Apr 27, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Apr 27, 2026

@muraee: This pull request references OCPSTRAT-1986 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the feature to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead.

Details

In response to this:

Summary

  • Replace hardcoded IsAroHCP() / env-var checks with API-driven per-cluster decisions using Topology and Private fields on HostedCluster/HostedControlPlane
  • Extend AzurePrivateType enum with Swift variant and introduce AzureSwiftSpec to replace the SwiftPodNetworkInstanceAnnotation annotation
  • Enable ARO HCP clusters to express fully private intent (Topology=Private), causing shared ingress to drop all public endpoint exposure

Test plan

  • Unit tests for all topology/private-type combinations in visibility, router, infra, and shared ingress controllers
  • Integration tests for day-2 topology transitions (PublicAndPrivate ↔ Private)
  • E2E tests verifying private clusters have no public DNS/routes
  • Serialization compatibility tests for N-1/N+1 scenarios

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 27, 2026
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
`omitzero`; old code ignores unknown fields during
deserialization. The `Type` value "Swift" is preserved as
a string.
- **N-1 round-trip risk:** Old code re-serializing an object
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't an API version be the way to mitigate this? Old client writing old API version doesn't have their intent squash fields they don't know about

Copy link
Copy Markdown
Contributor Author

@muraee muraee Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what you mean here? HyperShift currently has a single API version (v1beta1) with no conversion webhooks — are you suggesting this enhancement should introduce a new API version?

Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
- Clarify "private-only" scope (control-plane only, not openshift-ingress)
- Expand scope to include all per-cluster isAroHCP() call sites (CNO, CCM, storage, registry)
- Add exact CEL validation rules as kubebuilder markers on types
- Add AzurePlatformSpec-level CEL rule for topology→private.type requirement
- Change validation from operator-side to CEL admission
- Add PublicEndpointExposed status condition with full API spec
- Expand Day-2 transition section with HO→HCP propagation, reverse flow, and transition scope
- Add CPO rollout considerations including backport scenario
- Add Phase 2 verification query and Phase 3 CPO version prerequisite
- Fix annotation precedence inconsistency (API field takes precedence, annotation is fallback)
- Fix CRD schema ordering to consistently say "before" (not "before or simultaneously")
- Update graduation criteria for expanded scope

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
type AzureSwiftSpec struct {
// +required
// +kubebuilder:validation:MinLength=1
PodNetworkInstance string `json:"podNetworkInstance"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would changing this on a live cluster break Swift networking? Should it be immutable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing it doesn't permanently break the cluster — the router pod restarts and gets new IPs from the new pod network instance. If the new one is valid, connectivity resumes after the restart. It causes a temporary disruption but is recoverable, so we're keeping it mutable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's easier to make it immutable later, if we decide to.

Copy link
Copy Markdown
Member

@enxebre enxebre May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's easier to make it immutable later, if we decide to.

This would be a breaking change, while relaxing from immutable to mutable is allowed. Let's remove ambiguity and decide now what it needs to be.
@bennerv is there a known case for this to be mutable? let's make it immutable otherwise.

Please include CEL for this here, make sure to include prevention for unsetting at the parent level. We can also include the field docs already here for clarity

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — making it immutable now. Added a CEL rule on AzureSwiftSpec (self.podNetworkInstance == oldSelf.podNetworkInstance), field documentation explaining why it must not change, and a note that combined with Type immutability on AzurePrivateSpec, once Private.Type=Swift is set with a PodNetworkInstance, neither can be changed.

@bryan-cox
Copy link
Copy Markdown
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 5, 2026
version carrying the updated CPO. Until then, the N-1 CPO
continues to function via the legacy annotation fallback.
If clusters cannot be upgraded to the new minor stream in a
reasonable timeframe, a CPO backport to the previous minor
Copy link
Copy Markdown
Member

@enxebre enxebre May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we please specify concrete ocp versions and timelines we need to support in ARO?

Also would it be possible we express intent in ARO-RP/Maestro for new HCs to be created without the annotation and no-op on existing so they remain on the old path until they upgrade and we don't care?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phase 1 is developed in 4.23 and backported to 4.22. Phase 3 cleanup proceeds once all clusters are running CPO 4.22+.

For the migration strategy: yes, for new HCs ARO-RP/Maestro sets the API fields from day one and skips the annotation. For existing HCs, ARO-RP/Maestro can set the API fields at any time — the old CPO ignores them and continues using the annotation. Once the cluster upgrades to 4.22+ the new CPO starts consuming the API fields. No active coordinated migration needed.

Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
@muraee muraee force-pushed the hypershift-azure-topology-api-driven branch from e264200 to 766513c Compare May 6, 2026 16:53
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 6, 2026
@muraee muraee force-pushed the hypershift-azure-topology-api-driven branch 2 times, most recently from bfa8ff7 to 4524029 Compare May 11, 2026 17:00
cluster's HO is upgraded first (new API types and helpers).
Phase 2 can set the API fields on all clusters at any time —
the old CPO ignores unknown fields and continues using the
annotation fallback. For new clusters, ARO-RP/Maestro sets
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new clusters can be created against an old version.
I suggest we tell ARO-RP to do a version check to only set the annotation if the HC version is < 4.22

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — new clusters can target old OCP versions. Updated: ARO-RP/Maestro should check the HC's target OCP version. For clusters targeting 4.22+, set only the API fields (no annotation). For clusters targeting < 4.22, continue setting the annotation for the old CPO.

`Private` or `PublicAndPrivate`, and forbidden otherwise:

```
!has(self.topology) || ((self.topology == 'Private' || self.topology == 'PublicAndPrivate') ? has(self.private) : !has(self.private))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this let you set self.private when self.topology is empty?

Copy link
Copy Markdown
Contributor Author

@muraee muraee May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — if topology is empty/unset, !has(self.topology) short-circuits to true and the rule imposes no constraint on private. So you could set private with an empty topology on creation. This is an existing rule though, not introduced by this enhancement.

@enxebre
Copy link
Copy Markdown
Member

enxebre commented May 12, 2026

can we fix the line length?

- Call out CEL immutability rule change as deliberate validation loosening
- Clarify UseSharedIngressHCP/HC naming distinct from isAroHCP() in main.go
- Add envtests for CEL rule validation to integration test plan

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@muraee muraee force-pushed the hypershift-azure-topology-api-driven branch from 4524029 to f22aec7 Compare May 12, 2026 09:32
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@muraee muraee force-pushed the hypershift-azure-topology-api-driven branch from 629713f to 95826c5 Compare May 12, 2026 10:35
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 12, 2026

@muraee: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@enxebre
Copy link
Copy Markdown
Member

enxebre commented May 12, 2026

/approve

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 12, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2026
@bryan-cox
Copy link
Copy Markdown
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 12, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit 4f67eee into openshift:master May 12, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants