test: add openshift e2e smoke test #202
Draft
GrigoryPervakov wants to merge 15 commits into
Bounce catalog-operator + olm-operator deployments in the workflow's health-check step. The libvirt snapshot revert leaves OLM informers in a state where new CatalogSources are never reconciled (status remains empty for 5+ minutes). Forcing fresh pod startups before the test runs clears that stale state.
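A minimal sketch of how such a restart could look as a workflow step. It assumes the OLM deployments live in the `openshift-operator-lifecycle-manager` namespace (the OpenShift/OKD default) and that `kubectl` on the runner is already pointed at the cluster; the step name and the 5-minute rollout timeout are illustrative, not taken from the actual workflow.

```yaml
# Fragment of a job's `steps:` list (sketch, not the real workflow).
- name: Restart OLM operators   # hypothetical step name
  run: |
    ns=openshift-operator-lifecycle-manager
    # Force fresh pods so the informers are rebuilt from scratch.
    kubectl -n "$ns" rollout restart deployment/catalog-operator
    kubectl -n "$ns" rollout restart deployment/olm-operator
    # Block until both deployments are serving again (timeout is an assumption).
    kubectl -n "$ns" rollout status deployment/catalog-operator --timeout=5m
    kubectl -n "$ns" rollout status deployment/olm-operator --timeout=5m
```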
OpenShift's namespace controller auto-fills uid-range, supplemental-groups and sa.scc.mcs together on namespace creation. Setting only two of three skips the backfill and the catalog registry pod is rejected by SCC admission with 'unable to find annotation openshift.io/sa.scc.mcs'. Let the controller assign all three.
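As an illustration, the namespace manifest can simply omit all three annotations. The namespace name below is an assumption borrowed from a later commit in this PR; the annotation keys in the comments are the standard OpenShift ones.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: clickhouse-operator-olm   # assumed name, for illustration
  # Intentionally no annotations here. The namespace controller fills in
  #   openshift.io/sa.scc.uid-range
  #   openshift.io/sa.scc.supplemental-groups
  #   openshift.io/sa.scc.mcs
  # as a set; presetting only some of them skips that backfill.
```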
The FBC catalog at ghcr.io/clickhouse/clickhouse-operator-catalog:latest is built with GenerateMajorChannels=true, so the only channel is 'stable-v0', not 'stable'. Subscription with channel: stable was failing resolution with 'no operators found in channel stable of package clickhouse-operator'.
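A sketch of a Subscription that would resolve against this catalog. The package name and channel come from the message above; the Subscription name, its namespace, and the CatalogSource name and namespace are placeholders that may differ from what the test actually creates.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: clickhouse-operator              # hypothetical
  namespace: clickhouse-operator-olm     # assumed install namespace
spec:
  name: clickhouse-operator              # package name, from the resolution error
  channel: stable-v0                     # 'stable' does not exist in this catalog
  source: clickhouse-operator-catalog    # hypothetical CatalogSource name
  sourceNamespace: clickhouse-operator-olm   # assumed
```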
Operator now installs into clickhouse-operator-olm and watches a separate clickhouse-operator-test namespace. Workload CRs (KeeperCluster, ClickHouseCluster) land in the test namespace via testDeployment. Also fans diagnostic dumps over both namespaces.
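With OLM, one common way to get this install/watch split is an OperatorGroup in the install namespace whose `targetNamespaces` lists the namespace to watch. The sketch below assumes that mechanism and a hypothetical object name; it also presumes the operator's CSV supports a single-namespace or multi-namespace install mode.

```yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: clickhouse-operator              # hypothetical
  namespace: clickhouse-operator-olm     # where the operator is installed
spec:
  targetNamespaces:
    - clickhouse-operator-test           # where KeeperCluster / ClickHouseCluster CRs land
```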
The public catalog ships v0.0.5, which lacks the operator-side emptyDir fallback (introduced in dab3ace, awaiting release). Inject emptyDir at the data path via PodTemplate.Volumes and ContainerTemplate.VolumeMounts so keeper and clickhouse pods can write under restricted SCC. Drop the override once the catalog ships >= v0.0.6.
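The shape of such an override, as a rough sketch. The `volumes` and `volumeMounts` entries are standard Kubernetes; the lower-camel-case CR field names, the volume name, and the mount path are assumptions inferred from the Go field names above and should be checked against the CRD.

```yaml
# Fragment of a ClickHouseCluster / KeeperCluster spec (field names assumed).
spec:
  podTemplate:
    volumes:
      - name: data                       # hypothetical volume name
        emptyDir: {}
  containerTemplate:
    volumeMounts:
      - name: data
        mountPath: /var/lib/clickhouse   # assumed data path; keeper may use a different one
```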
The runner image no longer ships okd-revert.sh. The long-lived OKD
cluster is wiped per-run via okd-cleanup.sh (namespace + OLM-object
delete; cluster stays running and keeps kubelet certs fresh) and
destroyed and reinstalled weekly via okd-rebuild.sh.
- openshift-compatibility{,-pr}.yaml: drop the snapshot-revert + the
control-plane-probe step (it was a post-revert KCM-recovery probe;
no longer applicable). Call okd-cleanup.sh; it does its own CO
health check at the tail so the inline 'Verify cluster health' is
redundant.
- openshift-runner-rebuild.yaml: new weekly cron + workflow_dispatch
on the self-hosted runner; shares the 'openshift-compatibility'
concurrency group so it can't race a test job. Calls okd-rebuild.sh
with --preserve-env=GITHUB_ACTIONS so the script's
runner-service-stop guard kicks in.
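
A rough sketch of what openshift-runner-rebuild.yaml could look like under these constraints. The cron expression, runner label, job and step names, and the assumption that `okd-rebuild.sh` is on the runner's PATH and invoked through `sudo` are all illustrative; only the triggers, the concurrency group and the `--preserve-env` flag come from the description above.

```yaml
name: openshift-runner-rebuild
on:
  schedule:
    - cron: "0 3 * * 0"        # hypothetical weekly slot
  workflow_dispatch:
concurrency:
  group: openshift-compatibility   # shared with the test job, so the two serialize
  cancel-in-progress: false        # assumption: queue behind a running test, don't kill it
jobs:
  rebuild:
    runs-on: self-hosted           # assumed label
    steps:
      - name: Rebuild OKD cluster
        # GITHUB_ACTIONS=true is set by the Actions runner; preserving it through
        # sudo lets the script detect that it is running inside a workflow.
        run: sudo --preserve-env=GITHUB_ACTIONS okd-rebuild.sh
```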
The catalog at :latest now carries both stable-v0 (release bundles) and fast-v0 (per-commit main builds) since PR #211 landed. Pointing the openshift e2e at fast-v0 means the test exercises the actual current main, not the last release — the original intent of the openshift compatibility check.
okd-cleanup.sh + okd-rebuild.sh are committed in the runner repo but aren't on the live runner image yet — they need a Packer build + roll. Until then the snapshot-revert path is still the only thing on the runner; restore the workflow call so the fast-v0 channel switch can actually run. Switch back to okd-cleanup.sh after the runner image is refreshed.
…rolls" This reverts commit 6bb7a0e.
The OpenShift compatibility e2e runs against a fresh OKD cluster where docker.io/clickhouse/clickhouse-server:26.3 (~600MB) and docker.io/clickhouse/clickhouse-keeper:26.3 are cold-pulled at pod-creation time. A 5-minute kubectl wait for ClickHouseCluster Ready expires before the version-probe Job's image-pull completes. Kind-based e2e shards pre-load images via 'kind load docker-image' so the cold-pull case doesn't apply there; the bumped timeout is just a ceiling, not a slowdown — happy-path tests still finish in seconds.
Previously the OpenShift compat e2e was a separate workflow that ran after each main commit (workflow_run trigger). PRs got coverage via a temporary openshift-compatibility-pr.yaml that mirrored the main-branch shape, but a PR with a green main-branch compat run was no guarantee the PR's actual diff worked against the freshly-published fast catalog.
New shape:
- ci.yaml gains an 'openshift-compat' job that needs all the regular Operator CI gates (lint, bundle, build_and_test, fuzz_specs, helm-test, compat-e2e-test, e2e-test, check-crd-compat) and runs only when they're green. Same-repo-only: fork PRs cannot reach the self-hosted runner.
- continue-on-error: true; the job is intentionally NOT a dependency of ci-success-check. The OKD self-hosted runner can be down for cluster maintenance (weekly rebuild) without blocking PR merges.
- openshift-compatibility.yaml is now workflow_dispatch-only, for manual reruns against the runner without re-driving the full Operator CI matrix.
- openshift-compatibility-pr.yaml deleted (its own comment said to drop it once the harness verified).
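
For orientation, the job described here might be wired roughly as follows. The `needs` list and `continue-on-error` come from the commit message; the `if` expression is one common way to express "same-repo only", and the runner label, checkout step and `make` target are hypothetical placeholders.

```yaml
# Fragment of ci.yaml; the gating jobs themselves are omitted.
jobs:
  openshift-compat:
    needs: [lint, bundle, build_and_test, fuzz_specs, helm-test,
            compat-e2e-test, e2e-test, check-crd-compat]
    # Skip fork PRs: they must not reach the self-hosted runner.
    if: >-
      github.event_name != 'pull_request' ||
      github.event.pull_request.head.repo.full_name == github.repository
    continue-on-error: true      # runner downtime should not fail the workflow
    runs-on: self-hosted         # assumed label
    steps:
      - uses: actions/checkout@v4
      - name: Run OpenShift e2e
        run: make test-e2e-openshift   # hypothetical target
```

Leaving the job out of ci-success-check's `needs` is what keeps it advisory; `continue-on-error` alone only stops a failure from marking the whole run red.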
Why
Need to verify OpenShift compatibility
What
Add a dedicated OpenShift e2e smoke test that runs against a pre-provisioned OKD cluster