test: add openshift e2e smoke test by GrigoryPervakov · Pull Request #202 · ClickHouse/clickhouse-operator

GrigoryPervakov · 2026-05-27T19:49:30Z

Why

Need to verify OpenShift compatibility

What

Add a special OpenShift e2e test against the pre-provided OKD cluster

Bounce catalog-operator + olm-operator deployments in the workflow's health-check step. The libvirt snapshot revert leaves OLM informers in a state where new CatalogSources are never reconciled (status remains empty for 5+ minutes). Forcing fresh pod startups before the test runs clears that stale state.
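The bounce step could look roughly like the sketch below, which only builds the kubectl invocations and does not run them. The namespace `openshift-operator-lifecycle-manager`, the `app=<name>` pod labels, and the 300s timeout are assumptions for illustration, not values taken from the workflow. Pods are deleted rather than rolled out, in line with the later commit note that kube-controller-manager won't process a rollout.

```python
# Assumed values: namespace, pod label scheme and timeout are illustrative only.
OLM_NAMESPACE = "openshift-operator-lifecycle-manager"
OLM_DEPLOYMENTS = ("catalog-operator", "olm-operator")  # names from the PR text


def bounce_commands(namespace=OLM_NAMESPACE, timeout="300s"):
    """Return kubectl argv lists that delete the OLM pods, then wait for recovery."""
    commands = []
    # Delete the pods first so each Deployment has to start a fresh one.
    for name in OLM_DEPLOYMENTS:
        commands.append(
            ["kubectl", "-n", namespace, "delete", "pod", "-l", f"app={name}"]
        )
    # Then block until each Deployment reports Available again.
    for name in OLM_DEPLOYMENTS:
        commands.append(
            [
                "kubectl", "-n", namespace, "wait",
                "--for=condition=Available",
                f"deployment/{name}",
                f"--timeout={timeout}",
            ]
        )
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order. A real health check may need a stricter readiness signal than the Available condition, since that condition can still be true for a moment after the pods are deleted.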

OpenShift's namespace controller auto-fills uid-range, supplemental-groups and sa.scc.mcs together on namespace creation. Setting only two of three skips the backfill and the catalog registry pod is rejected by SCC admission with 'unable to find annotation openshift.io/sa.scc.mcs'. Let the controller assign all three.
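A minimal sketch of the "all three or none" rule, operating on a Namespace manifest held as a plain dict. The helper names are hypothetical. The annotation keys are the standard `openshift.io/sa.scc.*` ones, of which only `openshift.io/sa.scc.mcs` is quoted verbatim in the error above.

```python
import copy

SCC_ANNOTATIONS = (
    "openshift.io/sa.scc.uid-range",
    "openshift.io/sa.scc.supplemental-groups",
    "openshift.io/sa.scc.mcs",
)


def missing_scc_annotations(namespace):
    """Return the SCC annotations missing from a partially annotated Namespace.

    An empty list means the manifest is safe: either none of the three
    annotations is set (the controller fills them in) or all three are set.
    """
    annotations = (namespace.get("metadata") or {}).get("annotations") or {}
    present = [key for key in SCC_ANNOTATIONS if key in annotations]
    if not present or len(present) == len(SCC_ANNOTATIONS):
        return []
    return [key for key in SCC_ANNOTATIONS if key not in annotations]


def without_scc_annotations(namespace):
    """Return a copy of the manifest with all three SCC annotations removed."""
    result = copy.deepcopy(namespace)
    annotations = (result.get("metadata") or {}).get("annotations")
    if annotations:
        for key in SCC_ANNOTATIONS:
            annotations.pop(key, None)
    return result
```

Stripping the annotations before creating the namespace matches the fix described above: the controller then assigns all three values itself.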

The FBC catalog at ghcr.io/clickhouse/clickhouse-operator-catalog:latest is built with GenerateMajorChannels=true, so the only channel is 'stable-v0', not 'stable'. Subscription with channel: stable was failing resolution with 'no operators found in channel stable of package clickhouse-operator'.
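The channel mismatch can be illustrated with a small sketch. The `<prefix>-v<major>` naming rule is inferred from the 'stable-v0' name above, and the helper names, the CatalogSource name, and its namespace are placeholders rather than values from this repository.

```python
def major_channel(prefix, version):
    """Derive a per-major channel name, e.g. ("stable", "0.0.5") -> "stable-v0".

    Assumes plain MAJOR.MINOR.PATCH versions with an optional leading "v".
    """
    major = int(version.removeprefix("v").split(".", 1)[0])
    return f"{prefix}-v{major}"


def subscription(
    channel,
    package="clickhouse-operator",            # package name from the error message
    namespace="clickhouse-operator-olm",      # install namespace named in this PR
    source="clickhouse-operator-catalog",     # hypothetical CatalogSource name
    source_namespace="clickhouse-operator-olm",  # hypothetical
):
    """Build an OLM Subscription manifest as a plain dict."""
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": package, "namespace": namespace},
        "spec": {
            "channel": channel,
            "name": package,
            "source": source,
            "sourceNamespace": source_namespace,
        },
    }
```

With this, `subscription(major_channel("stable", "0.0.5"))` requests `stable-v0`, whereas a hard-coded `stable` names a channel the catalog does not contain.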

The operator now installs into clickhouse-operator-olm and watches a separate clickhouse-operator-test namespace. Workload CRs (KeeperCluster, ClickHouseCluster) land in the test namespace via testDeployment. Diagnostic dumps now cover both namespaces.

The public catalog ships v0.0.5, which lacks the operator-side emptyDir fallback (introduced in dab3ace, awaiting release). Inject emptyDir at the data path via PodTemplate.Volumes and ContainerTemplate.VolumeMounts so keeper and clickhouse pods can write under restricted SCC. Drop the override once the catalog ships >= v0.0.6.
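As a rough sketch of the temporary override and its removal condition: the helper names, the camelCase keys `podTemplate` / `containerTemplate`, the volume name, and the mount path in the usage note are assumptions for illustration. Only the PodTemplate.Volumes and ContainerTemplate.VolumeMounts field names and the v0.0.6 threshold come from the description above.

```python
def parse_version(version):
    """Parse a plain MAJOR.MINOR.PATCH string (optional leading "v") into a tuple."""
    return tuple(int(part) for part in version.removeprefix("v").split("."))


def needs_emptydir_override(catalog_version):
    """True while the catalog ships a release older than v0.0.6."""
    return parse_version(catalog_version) < (0, 0, 6)


def emptydir_override(volume_name, mount_path):
    """Build a spec fragment that mounts an emptyDir at the given path.

    The key names mirror PodTemplate.Volumes and ContainerTemplate.VolumeMounts;
    their exact serialized form is an assumption.
    """
    return {
        "podTemplate": {
            "volumes": [{"name": volume_name, "emptyDir": {}}],
        },
        "containerTemplate": {
            "volumeMounts": [{"name": volume_name, "mountPath": mount_path}],
        },
    }
```

A test helper could merge `emptydir_override("data", "/var/lib/clickhouse")` into the CR spec only while `needs_emptydir_override(...)` is true for the installed version, which makes the override drop out once the catalog reaches v0.0.6. The path here is a placeholder.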

The runner image no longer ships okd-revert.sh. The long-lived OKD cluster is wiped per run via okd-cleanup.sh (namespace + OLM-object delete; the cluster stays running and keeps kubelet certs fresh) and is destroyed and reinstalled weekly via okd-rebuild.sh.

- openshift-compatibility{,-pr}.yaml: drop the snapshot-revert step and the control-plane-probe step (it was a post-revert KCM-recovery probe and is no longer applicable). Call okd-cleanup.sh; it does its own CO health check at the tail, so the inline 'Verify cluster health' step is redundant.
- openshift-runner-rebuild.yaml: new weekly cron + workflow_dispatch on the self-hosted runner; it shares the 'openshift-compatibility' concurrency group so it can't race a test job. It calls okd-rebuild.sh with --preserve-env=GITHUB_ACTIONS so the script's runner-service-stop guard kicks in.

The catalog at :latest now carries both stable-v0 (release bundles) and fast-v0 (per-commit main builds) since PR #211 landed. Pointing the OpenShift e2e at fast-v0 means the test exercises the actual current main rather than the last release, which was the original intent of the OpenShift compatibility check.

okd-cleanup.sh + okd-rebuild.sh are committed in the runner repo but aren't on the live runner image yet — they need a Packer build + roll. Until then the snapshot-revert path is still the only thing on the runner; restore the workflow call so the fast-v0 channel switch can actually run. Switch back to okd-cleanup.sh after the runner image is refreshed.

…rolls" This reverts commit 6bb7a0e.

The OpenShift compatibility e2e runs against a fresh OKD cluster where docker.io/clickhouse/clickhouse-server:26.3 (~600MB) and docker.io/clickhouse/clickhouse-keeper:26.3 are cold-pulled at pod-creation time. A 5-minute kubectl wait for ClickHouseCluster Ready expires before the version-probe Job's image-pull completes. Kind-based e2e shards pre-load images via 'kind load docker-image' so the cold-pull case doesn't apply there; the bumped timeout is just a ceiling, not a slowdown — happy-path tests still finish in seconds.
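The "ceiling, not a slowdown" point follows from how a readiness wait behaves: it returns as soon as the condition holds, and the timeout only bounds the worst case. A minimal sketch with a hypothetical `wait_until` helper (not the test suite's actual wait code):

```python
import time


def wait_until(is_ready, timeout_s, poll_s=1.0):
    """Poll is_ready() until it returns True or timeout_s seconds elapse.

    Returns True as soon as the condition holds, so a larger timeout
    never delays the happy path. Returns False once the deadline passes.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A call such as `wait_until(cluster_is_ready, timeout_s=1800)` costs a warm Kind cluster only as long as the cluster takes to become ready, while giving a cold-pulling OKD cluster room to finish. Both `cluster_is_ready` and the 1800-second value are placeholders.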

Previously the OpenShift compat e2e was a separate workflow that ran after each main commit (workflow_run trigger). PRs got coverage via a temporary openshift-compatibility-pr.yaml that mirrored the main-branch shape, but a green main-branch compat run was no guarantee that the PR's actual diff worked against the freshly published fast catalog. New shape:

- ci.yaml gains an 'openshift-compat' job that needs all the regular Operator CI gates (lint, bundle, build_and_test, fuzz_specs, helm-test, compat-e2e-test, e2e-test, check-crd-compat) and runs only when they're green. It is same-repo-only: fork PRs cannot reach the self-hosted runner.
- continue-on-error: true; the job is intentionally NOT a dependency of ci-success-check. The OKD self-hosted runner can be down for cluster maintenance (weekly rebuild) without blocking PR merges.
- openshift-compatibility.yaml is now workflow_dispatch-only, for manual reruns against the runner without re-driving the full Operator CI matrix.
- openshift-compatibility-pr.yaml is deleted (its own comment said to drop it once the harness was verified).
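The gating rules above can be restated as a small decision sketch. This is not the workflow itself: the function names and the "success" result strings are illustrative, and it assumes ci-success-check depends on exactly the listed gates.

```python
REQUIRED_GATES = (
    "lint", "bundle", "build_and_test", "fuzz_specs",
    "helm-test", "compat-e2e-test", "e2e-test", "check-crd-compat",
)


def should_run_openshift_compat(gate_results, head_repo, base_repo):
    """Run only when every regular gate is green and the PR is same-repo."""
    all_green = all(gate_results.get(gate) == "success" for gate in REQUIRED_GATES)
    return all_green and head_repo == base_repo


def ci_success(job_results):
    """Merge gate: 'openshift-compat' is deliberately not consulted."""
    return all(job_results.get(gate) == "success" for gate in REQUIRED_GATES)
```

Because `ci_success` never reads the 'openshift-compat' result, a failed or skipped compat job (for example during the weekly rebuild) leaves the merge gate unaffected.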

GrigoryPervakov force-pushed the okd-test branch 14 times, most recently from 446f552 to dab3ace on May 29, 2026 17:57

GrigoryPervakov mentioned this pull request Jun 2, 2026

build: publish per-commit bundle + catalog on 'fast' channel #211

Merged

GrigoryPervakov added 11 commits June 2, 2026 18:56

test: add openshift e2e smoke test

ed8037d

test: delete OLM pods (kube-controller-manager won't process rollout)

a0cfea7

test: probe kube-controller-manager recovery before OLM test

1c7b8e9

test: rebuild runner image to refresh kubelet certs

db49690

GrigoryPervakov force-pushed the okd-test branch from 187fb89 to f7cabc4 on June 2, 2026 19:01

GrigoryPervakov added 3 commits June 2, 2026 19:06

Revert "test: revert compat cleanup to okd-revert.sh until new image …

0645275

…rolls" This reverts commit 6bb7a0e.

GrigoryPervakov wants to merge 15 commits into main from okd-test
