Skip to content

feat: warnings as a kubernetes events#209

Open
scanhex12 wants to merge 10 commits into
mainfrom
warnings_events
Open

feat: warnings as a kubernetes events#209
scanhex12 wants to merge 10 commits into
mainfrom
warnings_events

Conversation

@scanhex12
Copy link
Copy Markdown
Member

Why

We want to track warnings from pods. This PR allows to add records from the system.warning table into k8s events

@scanhex12 scanhex12 changed the title Warnings as a profile events feat: warnings as a profile events Jun 1, 2026
@scanhex12 scanhex12 changed the title feat: warnings as a profile events feat: warnings as a kubernetes events Jun 1, 2026
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ scanhex12
❌ Copilot
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR surfaces rows from ClickHouse's system.warnings table as Kubernetes Warning events on the ClickHouseCluster object. A new reconcile step (reconcileWarnings) polls every replica's system.warnings in parallel and emits one event per warning message via the existing event recorder, using a hash of the message as the event "action" to drive aggregation. An e2e test triggers a warning by setting max_table_num_to_warn: 1 and asserts both the row in system.warnings and the corresponding ClickHouseWarning event.

Changes:

  • New Warnings step in the ClickHouse reconcile chain that calls a new commander.Warnings() helper and emits ClickHouseWarning events with hashed actions for dedup.
  • New EventWarningRecord reason constant, WarningsPollInterval (30s) constant, and unconditional 30s requeue.
  • e2e coverage plus a cert-manager install-wait timeout bump from 5m to 10m.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
api/v1alpha1/events.go Adds EventWarningRecord event reason constant.
internal/controller/constants.go Adds WarningsPollInterval = 30s.
internal/controller/clickhouse/commands.go Adds commander.Warnings() querying system.warnings.
internal/controller/clickhouse/sync.go Adds reconcileWarnings step + warningAction hash helper; emits per-warning events and unconditionally requeues.
test/e2e/clickhouse_e2e_test.go New e2e test verifying warnings flow end-to-end.
test/testutil/utils.go Bumps cert-manager webhook wait timeout from 5m to 10m.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread api/v1alpha1/events.go
Comment thread internal/controller/clickhouse/sync.go
Comment thread internal/controller/clickhouse/sync.go Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

@scanhex12 scanhex12 requested a review from GrigoryPervakov June 1, 2026 17:43
return fmt.Sprintf("Warning-%016x", h.Sum64())
}

func (r *clickhouseReconciler) reconcileWarnings(ctx context.Context, log ctrlutil.Logger) (chctrl.StepResult, error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will send an event for every warning on every reconciliation. We need a way to deduplicate it

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubernetes does deduplication by itself - https://github.com/kubernetes/design-proposals-archive/blob/main/instrumentation/events-redesign.md

Deduplication logic groups together Event which have the same:

Source Component and Host
InvolvedObject Kind, Namespace, Name, API version and UID,
Type (works as Severity)
Reason

Copy link
Copy Markdown
Member Author

@scanhex12 scanhex12 Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw that's why we need to add a hash in warningAction. Otherwise it will merge all messages into one and my test will become flaky

@scanhex12 scanhex12 requested a review from GrigoryPervakov June 2, 2026 10:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants