charts/controlplane-operations/Chart.yaml (1 addition, 1 deletion)
@@ -1,6 +1,6 @@
 apiVersion: v2
 name: controlplane-operations
-version: 1.1.8
+version: 1.1.9
 description: A set of Plutono dashboards and Prometheus alerting rules combined with playbooks to ensure effective operations of Controlplane clusters.
description: etcd KCP backup Pod {{`{{ $labels.namespace }}`}}/{{`{{ $labels.pod }}`}} with instance IP {{`{{ $labels.instance }}`}} has a full snapshot that is too old. Check pod logs and events for more details.
+summary: etcd KCP backup Pod {{`{{ $labels.pod }}`}} has a full snapshot that is too old.
+{{- end }}
+
+{{- if not (.Values.prometheusRules.disabled.EtcdKCPBackupIncrSnapshotTooOld | default false) }}
description: etcd KCP backup Pod {{`{{ $labels.namespace }}`}}/{{`{{ $labels.pod }}`}} with instance IP {{`{{ $labels.instance }}`}} has an outdated full snapshot. Check pod logs and events for more details.
-summary: etcd KCP backup Pod {{`{{ $labels.pod }}`}} has an outdated full snapshot.
+description: etcd KCP backup Pod {{`{{ $labels.namespace }}`}}/{{`{{ $labels.pod }}`}} with instance IP {{`{{ $labels.instance }}`}} has an incremental snapshot that is too old. Check pod logs and events for more details.
+summary: etcd KCP backup Pod {{`{{ $labels.pod }}`}} has an incremental snapshot that is too old.
description: Bond `{{`{{ $labels.master }}`}}` on `{{`{{ $labels.node }}`}}` is degraded. Imminent network outage for this node.
-summary: Bond `{{`{{ $labels.master }}`}}` is degraded. Node network connectivity is not HA. Switch failover or upgrade will cause an outage!
+description: Bond `{{`{{ $labels.master }}`}}` on `{{`{{ $labels.node }}`}}` is degraded. Imminent network outage for this node. Node network connectivity is not HA. Switch failover or upgrade will cause an outage!
+summary: Bond `{{`{{ $labels.master }}`}}` is degraded.
 {{- end }}
 
 {{- if not (.Values.prometheusRules.disabled.NodeVirtualInterfaceDown | default false) }}
description: Interface `{{`{{ $labels.device }}`}}` on `{{`{{ $labels.node }}`}}` is down. Tenant network outage for this node.
-summary: Interface `{{`{{ $labels.device }}`}}` is down. Node network connectivity is degraded.
+description: Interface `{{`{{ $labels.device }}`}}` on `{{`{{ $labels.node }}`}}` is down. Tenant network outage for this node. Node network connectivity is degraded.
+summary: Interface `{{`{{ $labels.device }}`}}` is down.
description: Shoot {{`{{ $labels.name }}`}} from project {{`{{ $labels.project }}`}} on {{`{{ $labels.landscape }}`}} is not being reconciled successfully for {{ dig "ShootReconciliationFailed" "for" "30m" .Values.prometheusRules }}. Check the shoot's conditions and events for more details.
-summary: Shoot {{`{{ $labels.name }}`}} from project {{`{{ $labels.project }}`}} on {{`{{ $labels.landscape }}`}} is not being reconciled successfully.
+summary: Shoot {{`{{ $labels.name }}`}} on {{`{{ $labels.landscape }}`}} is not being reconciled successfully.
 {{- end }}
 
 {{- if not (.Values.prometheusRules.disabled.ShootConditionNotTrue | default false) }}
description: Shoot {{`{{ $labels.name }}`}} of project {{`{{ $labels.project }}`}} seeded from {{`{{ $labels.landscape }}`}}/{{`{{ $labels.seed }}`}} has a condition that is not True. Check the Shoot's conditions and events for more details.
-summary: Shoot {{`{{ $labels.name }}`}} of project {{`{{ $labels.project }}`}} seeded from {{`{{ $labels.landscape }}`}}/{{`{{ $labels.seed }}`}} has a condition that is not True.
+summary: Shoot {{`{{ $labels.name }}`}} seeded from {{`{{ $labels.seed }}`}} has a condition that is not True.
 {{- end }}
 
 {{- if not (.Values.prometheusRules.disabled.SeedConditionNotTrue | default false) }}
description: Calico Node Pod {{`{{ $labels.pod }}`}} on Shoot/Node {{trimPrefix "shoot--cp--" "`{{ $labels.cluster }}`"}}/{{`{{ $labels.node }}`}} has less than {{ .Values.prometheusRules.calico.bgpNeighborCount }} BGP neighbors. BGP peer is not established. Network datapath threatened! Switch upgrades or misconfiguration?
-summary: Calico Node Pod {{`{{ $labels.pod }}`}} on Shoot/Node {{trimPrefix "shoot--cp--" "`{{ $labels.cluster }}`"}}/{{`{{ $labels.node }}`}} has less than {{ .Values.prometheusRules.calico.bgpNeighborCount }} BGP neighbors.
+summary: Calico Node Pod {{`{{ $labels.pod }}`}} on Shoot {{trimPrefix "shoot--cp--" "`{{ $labels.cluster }}`"}} has less than {{ .Values.prometheusRules.calico.bgpNeighborCount }} BGP neighbors.
 {{- end }}
 
 {{- if not (.Values.prometheusRules.disabled.CalicoBgpNeighborSessionAllDown | default false) }}
description: Calico Node Pod {{`{{ $labels.pod }}`}} on Shoot/Node {{trimPrefix "shoot--cp--" "`{{ $labels.cluster }}`"}}/{{`{{ $labels.node }}`}} has no BGP neighbors. Network datapath is down! Switch upgrades or misconfiguration?
-summary: Calico Node Pod {{`{{ $labels.pod }}`}} on Shoot/Node {{trimPrefix "shoot--cp--" "`{{ $labels.cluster }}`"}}/{{`{{ $labels.node }}`}} has no BGP neighbors.
+summary: Calico Node Pod {{`{{ $labels.pod }}`}} on Shoot {{trimPrefix "shoot--cp--" "`{{ $labels.cluster }}`"}} has no BGP neighbors.
 {{- end }}
 
 {{- if not (.Values.prometheusRules.disabled.CalicoNodeMissing | default false) }}
description: Machine {{`{{ $labels.name }}`}} from shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is not Ready. Check the Machine's conditions and events for more details.
-summary: Machine {{`{{ $labels.name }}`}} from shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is not Ready.
+description: Machine {{`{{ $labels.name }}`}} from Shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is not Ready. Check the Machine's conditions and events for more details.
+summary: Machine {{`{{ $labels.name }}`}} from Shoot {{`{{ $labels.shoot_name }}`}} is not Ready.
 {{- end }}
 
 {{- if not (.Values.prometheusRules.disabled.MCMMachineStuckInTerminating | default false) }}
description: Machine {{`{{ $labels.name }}`}} from shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is stuck in Terminating state. Check the Machine's conditions and events for more details.
-summary: Machine {{`{{ $labels.name }}`}} from shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is stuck in Terminating state.
+description: Machine {{`{{ $labels.name }}`}} from Shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is stuck in Terminating state. Check the Machine's conditions and events for more details.
+summary: Machine {{`{{ $labels.name }}`}} from Shoot {{`{{ $labels.shoot_name }}`}} is stuck in Terminating state.
 {{- end }}
 
 {{- if not (.Values.prometheusRules.disabled.MCMMachineFailed | default false) }}
description: Machine {{`{{ $labels.name }}`}} from shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is in Failed state. Check the Machine's conditions and events for more details.
-summary: Machine {{`{{ $labels.name }}`}} from shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is in Failed state.
+description: Machine {{`{{ $labels.name }}`}} from Shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is in Failed state. Check the Machine's conditions and events for more details.
+summary: Machine {{`{{ $labels.name }}`}} from Shoot {{`{{ $labels.shoot_name }}`}} is in Failed state.
 {{- end }}
 
 {{- if not (.Values.prometheusRules.disabled.MCMMachineCrashLoopBackOff | default false) }}
description: Machine {{`{{ $labels.name }}`}} from shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is in CrashLoopBackOff state. Check the Machine's conditions and events for more details.
-summary: Machine {{`{{ $labels.name }}`}} from shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is in CrashLoopBackOff state.
+description: Machine {{`{{ $labels.name }}`}} from Shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is in CrashLoopBackOff state. Check the Machine's conditions and events for more details.
+summary: Machine {{`{{ $labels.name }}`}} from Shoot {{`{{ $labels.shoot_name }}`}} is in CrashLoopBackOff state.
 {{- end }}
 
 {{- if not (.Values.prometheusRules.disabled.MCMMachineStuckInPending | default false) }}
description: Machine {{`{{ $labels.name }}`}} from shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is stuck in Pending state. Check the Machine's conditions and events for more details.
-summary: Machine {{`{{ $labels.name }}`}} from shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is stuck in Pending state.
+description: Machine {{`{{ $labels.name }}`}} from Shoot {{`{{ $labels.shoot_name }}`}} of project {{`{{ $labels.project }}`}} is stuck in Pending state. Check the Machine's conditions and events for more details.
+summary: Machine {{`{{ $labels.name }}`}} from Shoot {{`{{ $labels.shoot_name }}`}} is stuck in Pending state.
description: "Argora ClusterImport CR status is in Error state for more than 10 minutes."
-summary: "ClusterImport CR in Error state."
+description: Argora ClusterImport CR status is in Error state for more than {{ dig "ArgoraClusterImportInError" "for" "10m" .Values.prometheusRules }}.
+summary: ClusterImport CR in Error state.
 {{- end }}
 
 {{- if not (.Values.prometheusRules.disabled.ArgoraPodNotReadyError | default false) }}
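
Reviewer note: the templates in this diff read their toggles and durations from chart values. The fragment below is a minimal sketch of an override file, assuming the key layout implied by the `.Values.prometheusRules.disabled.<AlertName>` guards, the `dig "<AlertName>" "for" "<default>" .Values.prometheusRules` lookups, and the `.Values.prometheusRules.calico.bgpNeighborCount` reference. The concrete values (`true`, `45m`, `2`) are illustrative assumptions, not settings taken from the chart.

```yaml
# Hypothetical values override; key names follow the template expressions in this diff.
prometheusRules:
  disabled:
    # The guard `if not (.Values.prometheusRules.disabled.NodeVirtualInterfaceDown | default false)`
    # skips rendering the rule when this is true.
    NodeVirtualInterfaceDown: true
  # Looked up by `dig "ShootReconciliationFailed" "for" "30m" .Values.prometheusRules`;
  # "30m" is the fallback when this key is absent.
  ShootReconciliationFailed:
    for: 45m
  calico:
    # Interpolated into the Calico BGP neighbor alert text; the number is an assumption.
    bgpNeighborCount: 2
```

Because the `dig` result is a duration string such as `45m`, it is inserted into the description text verbatim.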