WIP: OCPSTRAT-3036: Rebase 1.36#2653
…pool-status KEP-5677: Add ResourcePoolStatusRequest API for DRA resource availability visibility
… in validation_resourceslice_test.go
KEP-5491: DRA: List Types for Attributes [Alpha]
The fast-delete pod status tests currently require the intentionally failing "fail" container to report exit code 1. In CI, some runtimes occasionally report exit code 2 with reason=Error even though the tested invariant still holds: the container failed and the blocked workload container never started.

The latest dims/test-k8s failure on master showed exactly that state: the pod remained Failed, Initialized=False, the blocked container reported started=false, and only the failing init container drifted from exit 1 to exit 2. This matches kubernetes/kubernetes issue 135713 and the related pending-container history in PR 131605.

Accept exit code 2 in this verifier so the test continues to assert the behavior it is meant to cover instead of a lower-layer exit-code detail.

Fixes issue 135713

Tested:
- hack/verify-gofmt.sh
- hack/verify-test-code.sh
- hack/verify-typecheck.sh ./test/e2e/node/...
- go test ./test/e2e/node -run TestNonExistent -count=1

Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
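The relaxed check in this commit message can be sketched roughly as follows. This is a minimal illustration only, not the verifier in test/e2e/node; the names `acceptableFailExitCodes` and `failedAsExpected` are hypothetical.

```go
package main

import "fmt"

// acceptableFailExitCodes is a hypothetical allow-list: exit code 1 is the
// intended failure, and exit code 2 is the drift some runtimes report in CI.
var acceptableFailExitCodes = map[int32]bool{1: true, 2: true}

// failedAsExpected reports whether a terminated container's exit code still
// counts as "the container failed" for the purposes of the test.
func failedAsExpected(exitCode int32) bool {
	return acceptableFailExitCodes[exitCode]
}

func main() {
	for _, code := range []int32{0, 1, 2, 137} {
		fmt.Printf("exit code %d accepted: %v\n", code, failedAsExpected(code))
	}
}
```

The check stays narrow on purpose: a zero exit code or a signal-style code such as 137 is still rejected, so the test keeps asserting that the container failed on its own.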
Replace plain bool with sync/atomic.Bool for the useStreaming field in remoteRuntimeService and remoteImageService to eliminate a data race when multiple goroutines concurrently read/write the field during Unimplemented fallback. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
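A minimal sketch of the pattern, assuming a simplified client: `runtimeClient` and its `list` method are hypothetical stand-ins for the real services, and only `sync/atomic.Bool` and the `useStreaming` field name come from the commit message.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runtimeClient is a hypothetical stand-in for a CRI client that prefers a
// streaming RPC and falls back to a unary one.
type runtimeClient struct {
	// useStreaming is read and written from several goroutines, so it is an
	// atomic.Bool rather than a plain bool.
	useStreaming atomic.Bool
}

// list simulates one call. When the server does not support streaming
// (modelled here by a flag instead of a gRPC Unimplemented status), the
// client records that and uses the unary path from then on.
func (c *runtimeClient) list(serverSupportsStreaming bool) string {
	if c.useStreaming.Load() {
		if serverSupportsStreaming {
			return "streaming"
		}
		c.useStreaming.Store(false)
	}
	return "unary"
}

func main() {
	c := &runtimeClient{}
	c.useStreaming.Store(true)

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.list(false) // concurrent fallback, no data race
		}()
	}
	wg.Wait()
	fmt.Println("useStreaming after fallback:", c.useStreaming.Load())
}
```

With a plain `bool` field, the same program would be flagged by `go run -race`, because the goroutines read and write the field without synchronization.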
…logquery-lock-defualt [FeatureGate] Promote NodeLogQuery to GA in v1.36 and lock default to `true`
…-flake Set WithSerial on HPA tests that conflict api registration
…able-tolerance-e2e-deterministic-cpu-load fix: [sig-autoscaling] flaky HPAConfigurableTolerance e2e should scale up but should not scale down
…-pod-status-exit-2 test/e2e/node: tolerate exit code 2 in pod status flake
gRPC defaults to the DNS resolver for bare targets passed to NewClient. For CRI socket endpoints, GetAddressAndDialer returns a socket path plus a custom dialer, but handing the bare path to grpc.NewClient still lets gRPC resolve the target first. That breaks unix socket clients with errors like "name resolver error: produced zero addresses" before the custom dialer ever sees the raw path. Use the passthrough resolver for socket-style addresses so the runtime and image clients hand the original endpoint directly to the custom dialer. Add a regression test for unix sockets, Windows named pipes, and TCP addresses. Precedent: https://github.com/etcd-io/etcd/blob/v3.3.27/clientv3/client.go#L266-L270 https://github.com/grpc/grpc-go/blob/v1.72.2/dialoptions.go#L448-L451 Signed-off-by: Davanum Srinivas <davanum@gmail.com>
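The target rewrite can be sketched as below. This is an assumption-laden illustration rather than the cri-client code: `passthroughTarget` is a hypothetical helper, and deciding which addresses count as socket-style is left out. The only gRPC fact relied on is that a target of the form `passthrough:///<endpoint>` skips name resolution and hands `<endpoint>` to the dialer unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// passthroughScheme is the gRPC target prefix that selects the passthrough
// resolver, which performs no name resolution.
const passthroughScheme = "passthrough:///"

// passthroughTarget (hypothetical helper) wraps a raw socket address so that
// gRPC passes it straight to a custom dialer instead of resolving it via DNS.
func passthroughTarget(addr string) string {
	if strings.HasPrefix(addr, passthroughScheme) {
		return addr // already wrapped
	}
	return passthroughScheme + addr
}

func main() {
	// Example addresses are illustrative only.
	for _, addr := range []string{
		"/run/containerd/containerd.sock",
		`\\.\pipe\containerd-containerd`,
	} {
		fmt.Println(passthroughTarget(addr))
	}
}
```

The resulting string would be the target passed to `grpc.NewClient`, alongside the custom dialer option that opens the socket or named pipe.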
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
…nt-unix-socket-dialing cri-client: use passthrough resolver for socket endpoints
KEP-5729: DRA: ResourceClaim Support for Workloads
…v1alpha2 Add workload aware preemption
…g-fixes [InPlacePodLevelResourcesVerticalScaling] Plr ippr kubelet bug fixes
[InPlacePodLevelResourcesVerticalScaling] Ippr flaky test
[PodLevelResources] Graduate InPlacePodLevelResourcesVerticalScaling feature to beta
cri-client: use atomic.Bool for useStreaming to fix data race
Fix restartable init container startup race
UPSTREAM: <carry>: Add plugin for storage performant security policy Add featuregate for performantsecuritypolicy for storage UPSTREAM: <carry>: Feature gates must now declare dependencies, even if there are none.
Signed-off-by: Harshal Patil <12152047+harche@users.noreply.github.com>
Analysis of flakes from the k8s suite has shown consistent examples of otherwise well-behaved tests failing due to timeouts caused by temporary load on controllers during parallel testing. Increasing these timeouts will reduce flakes.
MutableCSINodeAllocatableCount is now enabled in the default feature set, the tests should succeed just fine.
…imary clusters Detect cluster's primary IP family by querying kubernetes.default service ClusterIP instead of using HasIPv4/HasIPv6 flags. The previous logic incorrectly returned ipv4 for dual-stack v6-primary clusters because both HasIPv4 and HasIPv6 were true. This matches the upstream approach in test/e2e/e2e.go and fixes DNS tests that were querying for A records instead of AAAA records in v6-primary environments.
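A rough sketch of the idea, assuming the ClusterIP of the kubernetes.default service has already been fetched as a string: `primaryIPFamily` is a hypothetical helper, not the function in the test code.

```go
package main

import (
	"fmt"
	"net/netip"
)

// primaryIPFamily (hypothetical helper) derives the cluster's primary IP
// family from a single service ClusterIP. In a dual-stack cluster the first
// ClusterIP of kubernetes.default belongs to the primary family, so this
// avoids the ambiguity of "has IPv4" and "has IPv6" both being true.
func primaryIPFamily(clusterIP string) (string, error) {
	addr, err := netip.ParseAddr(clusterIP)
	if err != nil {
		return "", fmt.Errorf("parse ClusterIP %q: %w", clusterIP, err)
	}
	if addr.Is4() || addr.Is4In6() {
		return "ipv4", nil
	}
	return "ipv6", nil
}

func main() {
	// Example addresses are illustrative only.
	for _, ip := range []string{"172.30.0.1", "fd02::1"} {
		family, err := primaryIPFamily(ip)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", ip, family)
	}
}
```

A DNS test could then choose between A and AAAA queries based on the returned family.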
After openshift/origin#30786 added ibmcloud to the provider switch in openshift-tests, the provider name is now correctly passed through to k8s-tests-ext. However, k8s-tests-ext only registers upstream Kubernetes providers (aws, azure, gce, kubemark, openstack, vsphere) via the test/e2e/providers.go import. OpenShift-specific providers like ibmcloud are not registered, causing framework.AfterReadingAllFlags to call SetupProviderConfig which fails with "Unknown provider" and Exit(1), crashing every test process. This registers all OpenShift-specific cloud providers (baremetal, ovirt, kubevirt, alibabacloud, nutanix, ibmcloud, external) as NullProviders in k8s-tests-ext. These providers don't require special setup for upstream kube e2e tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
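The failure mode and the fix can be modelled with a small self-contained registry. Everything here (`registry`, `registerProvider`, `setupProvider`, `nullProvider`) is a hypothetical stand-in for the e2e framework's provider registration, written only to show why an unregistered name fails and how registering no-op providers avoids it.

```go
package main

import "fmt"

// nullProvider stands in for a provider that needs no special setup.
type nullProvider struct{}

// factory builds a provider instance on demand.
type factory func() (any, error)

// registry maps provider names to factories (stand-in for the framework's
// internal registry).
var registry = map[string]factory{}

func registerProvider(name string, f factory) { registry[name] = f }

// setupProvider fails for names that were never registered, which is the
// behaviour that crashed test processes for "ibmcloud".
func setupProvider(name string) (any, error) {
	f, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown provider %q", name)
	}
	return f()
}

// registerOpenShiftProviders registers the OpenShift-specific provider names
// listed in the commit message as no-op providers.
func registerOpenShiftProviders() {
	names := []string{"baremetal", "ovirt", "kubevirt", "alibabacloud", "nutanix", "ibmcloud", "external"}
	for _, name := range names {
		registerProvider(name, func() (any, error) { return nullProvider{}, nil })
	}
}

func main() {
	_, err := setupProvider("ibmcloud")
	fmt.Println("before registration:", err)

	registerOpenShiftProviders()
	_, err = setupProvider("ibmcloud")
	fmt.Println("after registration:", err)
}
```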
…e2e test Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The WatchList test “[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] should be requested by metadatainformer when WatchListClient is enabled” works by fetching an expected (initial) state of secrets, starting an informer, and polling until context timeout for the informer to converge to that expected state. If any other secret in the namespace changes while the test is running, they never converge, and the test times out. This change limits the secrets we’re listing to just the ones relevant to the test.
Signed-off-by: jubittajohn <jujohn@redhat.com>
To be squashed with the following commit later: "UPSTREAM: <carry>: Add OpenShift tooling, images, configs and docs" Signed-off-by: jubittajohn <jujohn@redhat.com>
…er_manager_linux_test.go Squash into: UPSTREAM: <carry>: disable load balancing on created cgroups when managed is enabled
…s in flagz_test.go and statusz_test.go
Squash into: UPSTREAM: <carry>: apiserver: add system_client=kube-{apiserver,cm,s} to apiserver_request_total
Signed-off-by: Shaza Aldawamneh <shaza.aldawamneh@hotmail.com>
…e when claims.email is used in username expression Signed-off-by: Shaza Aldawamneh <shaza.aldawamneh@hotmail.com>
…acheGC is enabled Squash into UPSTREAM: <carry>: create termination events
…s the gc integration test issue
Could squash into UPSTREAM: <carry>: emit event when readyz goes true
Squash into: UPSTREAM: <carry>: add management support to kubelet
The kuberc subcommand is not yet registered in oc. Tests will be re-enabled after oc is bumped to 1.36. To be squashed with the commit "UPSTREAM: <carry>: Add OpenShift tooling, images, configs and docs" before the 1.36 rebase bump merges. Signed-off-by: jubittajohn <jujohn@redhat.com>
… driver when not enabled The upstream csi-hostpath-plugin.yaml manifest now includes a csi-snapshot-metadata sidecar container and volume (added in k/k#130918). Upstream PR k/k#137057 added conditional stripping of these when CapSnapshotMetadata is not enabled, but only for the upstream hostpathCSIDriver. The OpenShift-specific groupSnapshotHostpathCSIDriver was never updated, causing the driver pod to fail with "secret csi-snapshot-metadata-server-certs not found" and all csi-hostpath-groupsnapshot tests to fail in techpreview jobs. Signed-off-by: jubittajohn <jujohn@redhat.com>
Signed-off-by: jubittajohn <jujohn@redhat.com>
Signed-off-by: jubittajohn <jujohn@redhat.com>