Skip to content

wait for controller to populate resource utilization info during startup#17862

Merged
Jackie-Jiang merged 4 commits intoapache:masterfrom
mluvin-stripe:mluvin-resource-utilization-startup
Mar 25, 2026
Merged

wait for controller to populate resource utilization info during startup#17862
Jackie-Jiang merged 4 commits intoapache:masterfrom
mluvin-stripe:mluvin-resource-utilization-startup

Conversation

@mluvin-stripe
Copy link
Copy Markdown
Contributor

@mluvin-stripe mluvin-stripe commented Mar 11, 2026

Summary

Implements #17599, ensure the controller's disk utilization cache is populated before marking the controller as healthy. Otherwise, the controller may let through segment push requests even when disk usage is already 90% right after a restart, as it could take some time to populate the disk utilization cache.

When controller.resource.utilization.checker.collect.usage.at.startup is enabled, i set controller.resource.utilization.checker.initial.delay to zero so the resource utilization checker starts running immediately. This way, controller restarts aren't delayed by waiting on the utilization checkers running with delayed start.

We'll still fail open here -- if the disk utilization checker times out fetching all server's disk utilization, we'll still let the controller be marked as healthy.

Testing

Deployed this change to one of our testing Pinot clusters:

  1. with controller.resource.utilization.checker.collect.usage.at.startup: false -- confirmed the controllers start normally, no waiting for the resource utilization checks to run
  2. with controller.resource.utilization.checker.collect.usage.at.startup -- confirmed the periodic task scheduler runs the resource utilization checker immediately, and the controller doesn't get marked GOOD until it finishes running.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 11, 2026

Codecov Report

❌ Patch coverage is 28.57143% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.20%. Comparing base (55bb21e) to head (fea5749).
⚠️ Report is 36 commits behind head on master.

Files with missing lines Patch % Lines
...apache/pinot/controller/BaseControllerStarter.java 11.11% 8 Missing ⚠️
...va/org/apache/pinot/controller/ControllerConf.java 66.66% 0 Missing and 1 partial ⚠️
...ache/pinot/core/periodictask/BasePeriodicTask.java 50.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17862      +/-   ##
============================================
- Coverage     63.20%   63.20%   -0.01%     
- Complexity     1481     1525      +44     
============================================
  Files          3191     3194       +3     
  Lines        192587   193658    +1071     
  Branches      29537    29789     +252     
============================================
+ Hits         121725   122398     +673     
- Misses        61327    61647     +320     
- Partials       9535     9613      +78     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.17% <28.57%> (+7.71%) ⬆️
java-21 55.49% <50.00%> (-7.69%) ⬇️
temurin 63.20% <28.57%> (-0.01%) ⬇️
unittests 63.20% <28.57%> (-0.01%) ⬇️
unittests1 55.52% <50.00%> (+0.01%) ⬆️
unittests2 34.17% <28.57%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

protected TableSizeReader _tableSizeReader;
protected StorageQuotaChecker _storageQuotaChecker;
protected final List<UtilizationChecker> _utilizationCheckers = new ArrayList<>();
protected DiskUtilizationChecker _diskUtilizationChecker;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used/initialized?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh you're right, i'm not using it anywhere. let me remove it

// Wait for the resource utilization checker to run once during controller startup, before marking the controller
// as ready. When enabled, controller.resource.utilization.checker.initial.delay is set to 0.
public static final String RESOURCE_UTILIZATION_CHECKER_WAIT_DURING_STARTUP =
"controller.resource.utilization.checker.waitDuringStartup";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the config consistent (no camel case)

I feel it is more clear if we name it controller.resource.utilization.checker.run.at.startup

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think "run at startup" makes this seem equivalent to just setting controller.resource.utilization.checker.initial.delay = 0, but it's actually different since we're waiting for the checker to run before marking the controller as healthy.

but lmk what you think, otherwise what do you think of controller.resource.utilization.checker.wait.for.completion.at.startup

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completion seems a little bit misleading. How about controller.resource.utilization.checker.collect.usage.at.startup?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds good, i'll make the change

@Jackie-Jiang Jackie-Jiang added enhancement Improvement to existing functionality ingestion Related to data ingestion pipeline labels Mar 19, 2026
@mluvin-stripe mluvin-stripe force-pushed the mluvin-resource-utilization-startup branch from 8615240 to 60a95b0 Compare March 20, 2026 15:44
return getProperty(RESOURCE_UTILIZATION_CHECKER_FREQUENCY, DEFAULT_RESOURCE_UTILIZATION_CHECKER_FREQUENCY);
}

public boolean isResourceUtilizationCheckerCollectUsageAtStartupEnabled() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) isResourceUtilizationCheckerCollectUsageAtStartup()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i was following other boolean configs, and they have Enabled at the end -- e.g.

public boolean isResourceUtilizationCheckEnabled() {
return getProperty(ENABLE_RESOURCE_UTILIZATION_CHECK, DEFAULT_ENABLE_RESOURCE_UTILIZATION_CHECK);
}
public boolean isAllResourceUtilizationCheckersEnabled() {
return getProperty(ENABLE_ALL_RESOURCE_UTILIZATION_CHECKERS, DEFAULT_ENABLE_ALL_RESOURCE_UTILIZATION_CHECKERS);
}
public boolean isDiskUtilizationCheckerEnabled() {
return getProperty(ENABLE_DISK_UTILIZATION_CHECKER, DEFAULT_ENABLE_DISK_UTILIZATION_CHECKER);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is for properties starting with enable. For other boolean flags, we don't usually add enabled suffix

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, just fixed

@Jackie-Jiang Jackie-Jiang merged commit 786d056 into apache:master Mar 25, 2026
15 of 16 checks passed
@Jackie-Jiang Jackie-Jiang added documentation Improvements or additions to documentation configuration Config changes (addition/deletion/change in behavior) labels Mar 25, 2026
xiangfu0 pushed a commit to pinot-contrib/pinot-docs that referenced this pull request Mar 26, 2026
@xiangfu0
Copy link
Copy Markdown
Contributor

Documentation PR created: pinot-contrib/pinot-docs#577

Added documentation for the new controller.resource.utilization.checker.collect.usage.at.startup configuration property in the controller configuration reference guide.

xiangfu0 added a commit to pinot-contrib/pinot-docs that referenced this pull request Mar 27, 2026
…artup config (apache/pinot#17862) (#577)

Co-authored-by: Pinot Docs Bot <docs-bot@pinot.apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

configuration Config changes (addition/deletion/change in behavior) documentation Improvements or additions to documentation enhancement Improvement to existing functionality ingestion Related to data ingestion pipeline

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants