Optimize the DB updates to use bulk UPDATE instead of row-level locks. #13349

Open
sureshanaparti wants to merge 5 commits into apache:main from shapeblue:optimize-db-updates

@sureshanaparti sureshanaparti commented Jun 4, 2026

Description

This PR replaces per-row lock-and-update loops with bulk UPDATE statements in the following DAO methods:

  • Reset Hosts (HostDaoImpl.resetHosts)
    Eliminates N SELECT FOR UPDATE + N individual UPDATE round-trips during MS startup/reconnect.
    The lock is safe to remove because resetHosts only modifies rows whose management_server_id is this management server's ID (no other MS targets these rows), and setting the column to NULL is idempotent with no read-dependent computation.
  • Alert archival (AlertDaoImpl.archiveAlert)
    Reuses the existing SearchCriteria to issue one UPDATE alert SET archived=1 WHERE ... instead of N individual lock-update-commit cycles.
  • Event archival (EventDaoImpl.archiveEvents)
    Builds an ID-based SearchCriteria from the pre-fetched event list and issues a single bulk UPDATE.
  • SecurityGroupWorkDaoImpl.updateStep
    updateStep(workId, step): replaces lockRow + null check + update with a single UPDATE ... SET step=? WHERE id=?. A non-existent row results in 0 rows affected, which is the same no-op as the original null check.
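
To make the round-trip claim for resetHosts concrete, here is a minimal sketch that models the host table in memory and counts simulated statements. It is not CloudStack code: `BulkResetSketch`, `HostRow`, `Result`, and `sampleTable` are hypothetical names used only for illustration, and the loops merely stand in for what the database would do.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of the host table; every name here is hypothetical.
class BulkResetSketch {
    static final class HostRow {
        final long id;
        Long managementServerId; // null means "not owned by any MS"
        HostRow(long id, Long managementServerId) {
            this.id = id;
            this.managementServerId = managementServerId;
        }
    }

    // Rows changed and simulated statements issued by one reset call.
    static final class Result {
        final int rowsChanged;
        final int statements;
        Result(int rowsChanged, int statements) {
            this.rowsChanged = rowsChanged;
            this.statements = statements;
        }
    }

    static List<HostRow> sampleTable() {
        List<HostRow> table = new ArrayList<>();
        table.add(new HostRow(1, 1L));
        table.add(new HostRow(2, 1L));
        table.add(new HostRow(3, 1L));
        table.add(new HostRow(4, 2L)); // owned by another MS, must stay untouched
        return table;
    }

    // Old shape: one locking read plus one UPDATE for every matching row.
    static Result resetPerRow(List<HostRow> table, long msId) {
        int changed = 0;
        int statements = 0;
        for (HostRow row : table) {
            if (row.managementServerId != null && row.managementServerId == msId) {
                statements++; // SELECT ... FOR UPDATE
                row.managementServerId = null;
                statements++; // UPDATE ... WHERE id = ?
                changed++;
            }
        }
        return new Result(changed, statements);
    }

    // New shape: a single UPDATE ... SET management_server_id = NULL WHERE management_server_id = ?
    static Result resetBulk(List<HostRow> table, long msId) {
        int changed = 0;
        for (HostRow row : table) {
            if (row.managementServerId != null && row.managementServerId == msId) {
                row.managementServerId = null;
                changed++;
            }
        }
        return new Result(changed, 1);
    }

    public static void main(String[] args) {
        Result perRow = resetPerRow(sampleTable(), 1L);
        List<HostRow> table = sampleTable();
        Result bulk = resetBulk(table, 1L);
        Result again = resetBulk(table, 1L); // re-running changes nothing
        System.out.println("per-row statements: " + perRow.statements);
        System.out.println("bulk statements: " + bulk.statements);
        System.out.println("rows changed on re-run: " + again.rowsChanged);
    }
}
```

With three hosts owned by MS 1, the per-row variant issues six simulated statements and the bulk variant issues one. The second bulk call changes zero rows, which is the idempotence the description relies on.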

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

sliceofapplepie and others added 5 commits June 5, 2026 00:47
…el locks

Replace lockRows + per-row update loop with a single bulk UPDATE using
the existing createForUpdate/UpdateBuilder/update(ub, sc, null) pattern
(same as markHostsAsDisconnected in the same file). Eliminates N SELECT
FOR UPDATE + N individual UPDATE round-trips during MS startup/reconnect.

The lock was safe to remove because resetHosts only modifies rows where
management_server_id = ourMS (no other MS targets these rows), and
setting to NULL is idempotent with no read-dependent computation.

A non-locking SELECT is issued before the bulk UPDATE when TRACE logging
is enabled to preserve per-host-ID logging from the original code.
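
The TRACE-guarded read described above could look roughly like the following sketch. It uses `java.util.logging` with `Level.FINEST` standing in for TRACE, and an in-memory map standing in for the host table. The class name, the `selectsIssued` counter, and the map layout are assumptions made for illustration, not the PR's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; the real code uses CloudStack's own logger and DAO helpers.
class TraceGuardSketch {
    static final Logger LOG = Logger.getLogger(TraceGuardSketch.class.getName());
    static int selectsIssued = 0; // counts the simulated non-locking SELECTs

    // hosts maps host id -> owning management server id (null = unowned).
    static int resetHosts(Map<Long, Long> hosts, long msId) {
        if (LOG.isLoggable(Level.FINEST)) { // FINEST stands in for TRACE
            selectsIssued++; // plain SELECT of matching ids, no FOR UPDATE
            for (Map.Entry<Long, Long> e : hosts.entrySet()) {
                if (e.getValue() != null && e.getValue() == msId) {
                    LOG.finest("Resetting host " + e.getKey());
                }
            }
        }
        int affected = 0; // the loop below stands in for one bulk UPDATE
        for (Map.Entry<Long, Long> e : hosts.entrySet()) {
            if (e.getValue() != null && e.getValue() == msId) {
                e.setValue(null);
                affected++;
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        Map<Long, Long> hosts = new HashMap<>();
        hosts.put(1L, 7L);
        hosts.put(2L, 7L);
        hosts.put(3L, 8L);
        System.out.println("rows affected: " + resetHosts(hosts, 7L));
        System.out.println("extra SELECTs: " + selectsIssued);
    }
}
```

The point of the guard is that the extra read is paid only when someone has asked for per-host log lines. With tracing off, the method issues the bulk UPDATE alone.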
…evel locks

Replace per-row lockRow + update + commit loop with a single bulk UPDATE
using createForUpdate/UpdateBuilder/update(ub, sc, null) pattern.

AlertDaoImpl.archiveAlert: reuses the existing SearchCriteria to issue
one UPDATE alert SET archived=1 WHERE ... instead of N individual
lock-update-commit cycles.

EventDaoImpl.archiveEvents: builds an ID-based SearchCriteria from the
pre-fetched event list and issues a single bulk UPDATE.

Also fixes a latent NPE in both methods where lockRow returning null
(row deleted concurrently) would cause the next line to throw.
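
The latent NPE mentioned above can be illustrated with a small in-memory sketch. The `lockRow` helper here is a hypothetical stand-in that returns null for a missing row, as the commit message describes. `ArchiveNpeSketch`, `Alert`, and the map-based table are illustrative assumptions, not the DAO code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the archive loop; not the CloudStack DAO code.
class ArchiveNpeSketch {
    static final class Alert {
        boolean archived;
    }

    static Map<Long, Alert> sampleTable() {
        Map<Long, Alert> table = new HashMap<>();
        table.put(1L, new Alert());
        table.put(2L, new Alert());
        return table; // id 3 is absent, as if deleted concurrently
    }

    // Stand-in for lockRow: returns null when the row no longer exists.
    static Alert lockRow(Map<Long, Alert> table, long id) {
        return table.get(id);
    }

    // Old shape: the lockRow result is dereferenced without a null check.
    static void archivePerRow(Map<Long, Alert> table, List<Long> ids) {
        for (long id : ids) {
            Alert alert = lockRow(table, id);
            alert.archived = true; // throws NullPointerException for a missing row
        }
    }

    // New shape: a set-based UPDATE only touches rows that still exist.
    static int archiveBulk(Map<Long, Alert> table, List<Long> ids) {
        int matched = 0;
        for (long id : ids) {
            Alert alert = table.get(id);
            if (alert != null) {
                alert.archived = true;
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 3L);
        try {
            archivePerRow(sampleTable(), ids);
        } catch (NullPointerException e) {
            System.out.println("per-row loop threw on the missing row");
        }
        System.out.println("bulk matched rows: " + archiveBulk(sampleTable(), ids));
    }
}
```

A WHERE clause never dereferences anything on the Java side, so a row that disappeared between the pre-fetch and the update simply does not match.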
…tead of row-level locks

Replace lockRow/lockRows + update pattern in both updateStep overloads
with createForUpdate + update(entity, sc) — the same pattern already
used in findAndCleanupUnfinishedWork in the same file.

updateStep(vmId, seqNum, step): replaces lockRows(LIMIT 1) + update
with a single UPDATE ... SET step=? WHERE instance_id=? AND seq_no=?.

updateStep(workId, step): replaces lockRow + null check + update with
a single UPDATE ... SET step=? WHERE id=?. Non-existent row results
in 0 rows affected — same no-op as the original null check.

The locks were redundant because the work queue model guarantees
single-owner access: take() assigns a serverId to each work item,
and only the owning server updates its step.
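
The single-owner argument can be sketched as a toy work queue. This is an assumption-laden illustration only: `WorkQueueSketch`, its `Step` values, and the method bodies are hypothetical and do not mirror the real SecurityGroupWorkDaoImpl.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory work queue illustrating single-owner updates.
class WorkQueueSketch {
    enum Step { SCHEDULED, PROCESSING, DONE }

    static final class Work {
        final long id;
        Long serverId; // null until a server takes the item
        Step step = Step.SCHEDULED;
        Work(long id) {
            this.id = id;
        }
    }

    private final Map<Long, Work> table = new LinkedHashMap<>();

    void add(long id) {
        table.put(id, new Work(id));
    }

    // take(): claims the first unowned item for the given server.
    Work take(long serverId) {
        for (Work work : table.values()) {
            if (work.serverId == null) {
                work.serverId = serverId;
                work.step = Step.PROCESSING;
                return work;
            }
        }
        return null; // nothing left to claim
    }

    // Stand-in for UPDATE ... SET step=? WHERE id=?; returns rows affected.
    int updateStep(long workId, Step step) {
        Work work = table.get(workId);
        if (work == null) {
            return 0; // no matching row: a no-op, like the old null check
        }
        work.step = step;
        return 1;
    }

    public static void main(String[] args) {
        WorkQueueSketch queue = new WorkQueueSketch();
        queue.add(10L);
        Work mine = queue.take(1L);
        System.out.println("owner: " + mine.serverId);
        System.out.println("rows affected: " + queue.updateStep(mine.id, Step.DONE));
        System.out.println("rows affected for missing id: " + queue.updateStep(999L, Step.DONE));
    }
}
```

Because each item has exactly one owner after take(), no second writer competes for the row's step, which is why the commit treats the row lock as redundant.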
@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@codecov

codecov Bot commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 0% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 18.10%. Comparing base (be51948) to head (e654122).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rc/main/java/com/cloud/event/dao/EventDaoImpl.java | 0.00% | 12 Missing ⚠️ |
| .../src/main/java/com/cloud/host/dao/HostDaoImpl.java | 0.00% | 12 Missing ⚠️ |
| ...rc/main/java/com/cloud/alert/dao/AlertDaoImpl.java | 0.00% | 8 Missing ⚠️ |
| ...network/security/dao/SecurityGroupWorkDaoImpl.java | 0.00% | 4 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13349      +/-   ##
============================================
- Coverage     18.10%   18.10%   -0.01%     
  Complexity    16750    16750              
============================================
  Files          6037     6037              
  Lines        542798   542790       -8     
  Branches      66457    66455       -2     
============================================
- Hits          98298    98290       -8     
- Misses       433453   433456       +3     
+ Partials      11047    11044       -3     

| Flag | Coverage Δ |
|---|---|
| uitests | 3.51% <ø> (ø) |
| unittests | 19.27% <0.00%> (-0.01%) ⬇️ |


@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18155
