wait for controller to populate resource utilization info during startup#17862
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #17862 +/- ##
============================================
- Coverage 63.20% 63.20% -0.01%
- Complexity 1481 1525 +44
============================================
Files 3191 3194 +3
Lines 192587 193658 +1071
Branches 29537 29789 +252
============================================
+ Hits 121725 122398 +673
- Misses 61327 61647 +320
- Partials 9535 9613 +78
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| protected TableSizeReader _tableSizeReader; | ||
| protected StorageQuotaChecker _storageQuotaChecker; | ||
| protected final List<UtilizationChecker> _utilizationCheckers = new ArrayList<>(); | ||
| protected DiskUtilizationChecker _diskUtilizationChecker; |
There was a problem hiding this comment.
Is this used/initialized?
There was a problem hiding this comment.
oh you're right, i'm not using it anywhere. let me remove it
| // Wait for the resource utilization checker to run once during controller startup, before marking the controller | ||
| // as ready. When enabled, controller.resource.utilization.checker.initial.delay is set to 0. | ||
| public static final String RESOURCE_UTILIZATION_CHECKER_WAIT_DURING_STARTUP = | ||
| "controller.resource.utilization.checker.waitDuringStartup"; |
There was a problem hiding this comment.
Keep the config consistent (no camel case)
I feel it is more clear if we name it controller.resource.utilization.checker.run.at.startup
There was a problem hiding this comment.
i think "run at startup" makes this seem equivalent to just setting controller.resource.utilization.checker.initial.delay = 0, but it's actually different since we're waiting for the checker to run before marking the controller as healthy.
but lmk what you think, otherwise what do you think of controller.resource.utilization.checker.wait.for.completion.at.startup
There was a problem hiding this comment.
Completion seems a little bit misleading. How about controller.resource.utilization.checker.collect.usage.at.startup?
There was a problem hiding this comment.
that sounds good, i'll make the change
8615240 to
60a95b0
Compare
| return getProperty(RESOURCE_UTILIZATION_CHECKER_FREQUENCY, DEFAULT_RESOURCE_UTILIZATION_CHECKER_FREQUENCY); | ||
| } | ||
|
|
||
| public boolean isResourceUtilizationCheckerCollectUsageAtStartupEnabled() { |
There was a problem hiding this comment.
(nit) isResourceUtilizationCheckerCollectUsageAtStartup()
There was a problem hiding this comment.
ah i was following other boolean configs, and they have Enabled at the end -- e.g.
pinot/pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java
Lines 1153 to 1164 in c8dccfb
There was a problem hiding this comment.
That is for properties starting with enable. For other boolean flags, we don't usually add enabled suffix
There was a problem hiding this comment.
got it, just fixed
|
Documentation PR created: pinot-contrib/pinot-docs#577 Added documentation for the new |
…artup config (apache/pinot#17862) (#577) Co-authored-by: Pinot Docs Bot <docs-bot@pinot.apache.org>
Summary
Implements #17599, ensure the controller's disk utilization cache is populated before marking the controller as healthy. Otherwise, the controller may let through segment push requests even when disk usage is already 90% right after a restart, as it could take some time to populate the disk utilization cache.
When
controller.resource.utilization.checker.collect.usage.at.startupis enabled, i setcontroller.resource.utilization.checker.initial.delayto zero so the resource utilization checker starts running immediately. This way, controller restarts aren't delayed by waiting on the utilization checkers running with delayed start.We'll still fail open here -- if the disk utilization checker times out fetching all server's disk utilization, we'll still let the controller be marked as healthy.
Testing
Deployed this change to one of our testing Pinot clusters:
controller.resource.utilization.checker.collect.usage.at.startup: false-- confirmed the controllers start normally, no waiting for the resource utilization checks to runcontroller.resource.utilization.checker.collect.usage.at.startup-- confirmed the periodic task scheduler runs the resource utilization checker immediately, and the controller doesn't get markedGOODuntil it finishes running.