23 changes: 19 additions & 4 deletions modules/manage/pages/cluster-maintenance/cluster-balancing.adoc
@@ -90,22 +90,37 @@ Redpanda's default partition balancing includes the following:

 Monitoring unavailable brokers lets Redpanda self-heal clusters by moving partitions from a failed broker to a healthy broker. Monitoring low disk space lets Redpanda distribute partitions across brokers with enough disk space. If free disk space reaches a critically low level, Redpanda blocks clients from producing. For information about the disk space threshold and alert, see xref:./disk-utilization.adoc#handle-full-disks[Handle full disks].

 [[partition_autobalancing_mode]]
 === Partition balancing settings

-Select your partition balancing setting with the xref:reference:cluster-properties.adoc#partition_autobalancing_mode[`partition_autobalancing_mode`] property.
+The xref:reference:cluster-properties.adoc#partition_autobalancing_mode[`partition_autobalancing_mode`] cluster property controls when and how Redpanda automatically rebalances partition replicas across brokers.
+
+To check the current value:
+
+[,bash]
+----
+rpk cluster config get partition_autobalancing_mode
+----
+
+To change the value:
+
+[,bash]
+----
+rpk cluster config set partition_autobalancing_mode <value>
+----
+
 |===
 | Setting | Description

 | `node_add`
-| Partition balancing happens when brokers (nodes) are added. To avoid hotspots, Redpanda allocates brokers to random healthy brokers. +
+| Partition balancing happens when brokers (nodes) are added. To avoid hotspots, Redpanda allocates partitions to random healthy brokers. +
  +
-This is the default setting.
+This is the default setting for clusters without an enterprise license.

 | `continuous`
 | Redpanda continuously monitors the cluster for broker failures and high disk usage and automatically redistributes partitions to maintain optimal performance and availability. It also monitors rack availability after failures, and for a given partition, it tries to move excess replicas from racks that have more than one replica to racks where there are none. See xref:./continuous-data-balancing.adoc[Configure Continuous Data Balancing]. +
  +
-This requires an enterprise license.
+This is the default setting for clusters with an enterprise license. It requires an enterprise license.

 | `off`
 | All partition balancing from Redpanda is turned off. +
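The settings table in this hunk lists three values for `partition_autobalancing_mode`: `node_add`, `continuous`, and `off`. As a minimal sketch, a wrapper can reject anything outside that list before building the `rpk cluster config set` command shown in the diff. The `balancing_mode_cmd` helper is hypothetical and not part of `rpk`; it only prints the command and does not run it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rpk): print the rpk command for a
# given balancing mode, rejecting values that are not in the settings table.
balancing_mode_cmd() {
  case "$1" in
    node_add|continuous|off)
      printf 'rpk cluster config set partition_autobalancing_mode %s\n' "$1"
      ;;
    *)
      printf 'unknown partition_autobalancing_mode: %s\n' "$1" >&2
      return 1
      ;;
  esac
}

# Dry run: prints
#   rpk cluster config set partition_autobalancing_mode continuous
balancing_mode_cmd continuous
```

Printing instead of executing keeps the sketch safe to try on a machine without a cluster; the printed line can be reviewed and then run by hand.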
18 changes: 16 additions & 2 deletions modules/reference/pages/public-metrics-reference.adoc
@@ -889,15 +889,29 @@ endif::[]

 === redpanda_memory_allocated_memory

-Total memory allocated (in bytes) per CPU shard.
+Total memory allocated (in bytes) per CPU shard. This includes all memory currently held by Redpanda on that shard, including memory in the batch cache that has been allocated but could be reclaimed.

 *Type*: gauge

 *Labels*:

 * `shard`

-*Usage*: This metric includes reclaimable memory from the batch cache. For monitoring memory pressure, consider using `redpanda_memory_available_memory` instead, which provides a more accurate picture of memory that can be immediately reallocated.
+*Usage*: This metric counts all allocated memory, including reclaimable batch cache memory, so it may appear high even when the system is not under memory pressure. To monitor for memory exhaustion, use xref:reference:public-metrics-reference.adoc#redpanda_memory_available_memory[`redpanda_memory_available_memory`] instead, which deducts reclaimable memory and gives a more accurate view of how much memory is actually free.
Comment on lines +892 to +900

⚠️ Potential issue | 🟡 Minor

Inconsistent memory semantics across docs after this update

Line 892 now says `redpanda_memory_allocated_memory` includes reclaimable batch cache memory, but `modules/manage/partials/monitor-health.adoc` (lines 63-86) still states that allocated memory does not include reclaimable cache memory. Please align that monitoring page with this new definition to avoid conflicting guidance for alerting formulas.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/reference/pages/public-metrics-reference.adoc` around lines 892 -
900, The docs are inconsistent: update the monitoring page text in
monitor-health.adoc to match the new definition that
redpanda_memory_allocated_memory includes reclaimable batch cache memory; change
the wording where it currently states allocated memory excludes reclaimable
cache to instead state it includes reclaimable batch cache memory and update any
alerting guidance/examples to recommend using redpanda_memory_available_memory
for measuring actual free memory (and adjust any formulas that presumed
allocated excluded reclaimable memory).


+To see `redpanda_memory_allocated_memory` broken down by shard, query Prometheus directly:
+
+[,promql]
+----
+redpanda_memory_allocated_memory
+----
+
+To see total allocated memory across all shards on a broker:
+
+[,promql]
+----
+sum by (instance) (redpanda_memory_allocated_memory)
+----
+
 ifdef::env-cloud[]
 *Available in Serverless*: No
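For a quick check without Prometheus, the per-broker total that `sum by (instance)` produces can be approximated by summing the per-shard samples in one broker's metrics output. This is a rough sketch that assumes Prometheus text-format lines with no trailing timestamps. The `sum_allocated` helper and the sample values are made up for illustration.

```shell
#!/usr/bin/env bash
# Sum redpanda_memory_allocated_memory across shards.
# Reads Prometheus text-format samples for a single broker on stdin.
# Assumption: each sample line ends with the value (no timestamp).
sum_allocated() {
  awk '
    # Match only this metric, whether or not it carries labels.
    /^redpanda_memory_allocated_memory[{ ]/ { total += $NF }
    END { printf "%.0f\n", total }
  '
}

# Hypothetical scrape output: two shards on one broker.
sum_allocated <<'EOF'
# TYPE redpanda_memory_allocated_memory gauge
redpanda_memory_allocated_memory{shard="0"} 1000
redpanda_memory_allocated_memory{shard="1"} 2500
EOF
# prints 3500
```

On a live broker the same function could read the broker's metrics output instead of the heredoc. Because the metric includes reclaimable batch cache memory, a high total here does not by itself indicate memory pressure.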