Skip to content

Require all kstat targets to declare their sample limit#10545

Open
bnaecker wants to merge 1 commit into
mainfrom
ben/right-size-kstat-sample-queue
Open

Require all kstat targets to declare their sample limit#10545
bnaecker wants to merge 1 commit into
mainfrom
ben/right-size-kstat-sample-queue

Conversation

@bnaecker
Copy link
Copy Markdown
Collaborator

@bnaecker bnaecker commented Jun 3, 2026

Remove the default per-target sample limit in the KstatSampler. Instead, require all targets to declare their own limit after which the sampler starts to drop the oldest samples. This forces every target to right-size its own queue, so that developers need to consider their sampling rates and data cardinality at the time they register.

Fixes #10542

@bnaecker bnaecker force-pushed the ben/right-size-kstat-sample-queue branch from d20d380 to e405633 Compare June 3, 2026 23:25
Remove the default per-target sample limit in the `KstatSampler`.
Instead, require all targets to declare their own limit after which the
sampler starts to drop the oldest samples. This forces every target to
right-size its own queue, so that developers need to consider their
sampling rates and data cardinality at the time they register.

Fixes #10542
@bnaecker bnaecker force-pushed the ben/right-size-kstat-sample-queue branch from e405633 to d72ed5f Compare June 3, 2026 23:33
@bnaecker bnaecker requested a review from jmcarp June 3, 2026 23:38
/// The maximum cardinality of the data we produce, per sampling interval.
pub const fn max_cardinality() -> usize {
// Assume max of 256 CPUs
CPU_MICROSTATES.len() * 256
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to exceed 256 CPUs with dense Turin, right? Should we bump this even high in anticipation of that?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yes possibly. I debated making this method detect the actual number of cores. It's just a little more annoying.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The 'sled_cpu:cpu_nsec' metric only reports values for the last 100 cpu_ids of the host

2 participants