Skip to content

Bump scheduled exports count only in case it has been scheduled#1499

Open
arthurpassos wants to merge 3 commits intoantalya-26.1from
export_partition_increment_scheduled_after_scheduling
Open

Bump scheduled exports count only in case it has been scheduled#1499
arthurpassos wants to merge 3 commits intoantalya-26.1from
export_partition_increment_scheduled_after_scheduling

Conversation

@arthurpassos
Copy link
Collaborator

@arthurpassos arthurpassos commented Mar 10, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fixes the below by incrementing the scheduled counter only in case it has successfully scheduled the task

Medium: Scheduler slot counter increments after failed schedule attempt
Impact: Available background slots can be underutilized in the same scheduling cycle, delaying export throughput.

Partially fix #1498

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@arthurpassos
Copy link
Collaborator Author

cc @Selfeer

@arthurpassos arthurpassos added port-antalya PRs to be ported to all new Antalya releases antalya-26.1 labels Mar 10, 2026
@github-actions
Copy link

github-actions bot commented Mar 10, 2026

Workflow [PR], commit [deac124]

@arthurpassos
Copy link
Collaborator Author

#1498

@Selfeer
Copy link
Collaborator

Selfeer commented Mar 16, 2026

PR #1499 Audit Review

AI audit note: This review was generated by AI (gpt-5.3-codex).

Audit update for PR #1499 (scheduled export counter update placement in partition export scheduler)


Confirmed defects

Medium: Scheduler capacity counter no longer increments in lock_inside_the_task path

  • Impact: When export_merge_tree_partition_lock_inside_the_task=1, the per-run cap (available_move_executors) is no longer enforced by scheduled_exports_count; the scheduler keeps attempting to enqueue until executor rejection, then exits early via return, reducing scheduling fairness and causing avoidable failed scheduling attempts under load.
  • Anchor: src/Storages/MergeTree/ExportPartitionTaskScheduler.cpp / ExportPartitionTaskScheduler::run (if (manifest.lock_inside_the_task) branch).
  • Trigger: Enable export_merge_tree_partition_lock_inside_the_task, have at least one pending partition export with multiple parts, and run scheduler with finite move slots.
  • Why defect: PR moves scheduled_exports_count++ into the else (!lock_inside_the_task) path and removes the shared increment. Successful scheduling in the lock_inside_the_task branch no longer updates the counter used by both loop guards, so the intended scheduler-side capacity accounting is broken for that mode.

Code evidence — loop guards using the counter:

// Outer loop (by task entry)
if (scheduled_exports_count >= available_move_executors)
{
    LOG_INFO(storage.log, "ExportPartition scheduler task: Scheduled exports count is greater than available move executors, skipping");
    break;
}
// ...
// Inner loop (by part)
for (const auto & zk_part_name : parts_in_processing_or_pending)
{
    if (scheduled_exports_count >= available_move_executors)
    {
        LOG_INFO(storage.log, "ExportPartition scheduler task: Scheduled exports count is greater than available move executors, skipping");
        break;
    }

Code evidence — lock_inside_the_task branch: success path never increments counter:

if (manifest.lock_inside_the_task)
{
    // ...
    if (!storage.background_moves_assignee.scheduleMoveTask(part_export_manifest.task))
    {
        storage.export_manifests.erase(part_export_manifest);
        LOG_INFO(storage.log, "ExportPartition scheduler task: Failed to schedule export part task, skipping");
        return;
    }
    // ← no scheduled_exports_count++ here
}
else
{
    try
    {
        // ... zk lock + exportPartToTable(...)
    }
    catch (const Exception &) { /* ... */ }
}

scheduled_exports_count++;  // only reached after the else block; lock_inside_the_task success skips this

So when lock_inside_the_task is true and scheduleMoveTask succeeds, the loop continues with an unchanged scheduled_exports_count, and the guards above never stop further scheduling until the executor rejects.

  • Smallest logical reproduction steps: (1) Set export_merge_tree_partition_lock_inside_the_task=1; (2) create an export with multiple pending parts and limited move executor slots; (3) run scheduler and observe continued scheduling attempts past the configured per-run cap until scheduleMoveTask fails and returns from run.
  • Fix direction (short): Increment scheduled_exports_count immediately after successful scheduleMoveTask(...) in the lock_inside_the_task branch (or unify increment on both success paths).
  • Regression test direction (short): Add scheduler unit/integration coverage for both lock modes verifying counter-based cap behavior and absence of scheduleMoveTask rejection when exactly filling available_move_executors.
  • Affected subsystem and blast radius: Replicated MergeTree partition-export scheduling; impacts clusters using partition export with lock_inside_the_task enabled.

Coverage summary

  • Scope reviewed: PR delta in src/Storages/MergeTree/ExportPartitionTaskScheduler.cpp plus connected scheduler/executor interfaces (BackgroundJobsAssignee::getAvailableMoveExecutors, BackgroundJobsAssignee::scheduleMoveTask) and setting propagation for lock_inside_the_task.
  • Categories failed: Scheduler capacity/accounting invariant across branch variants (lock_inside_the_task vs direct export path).
  • Categories passed: Entry/dispatch chain correctness; ZooKeeper status/part selection gating; non-lock_inside_the_task branch success/failure accounting intent; concurrency-focused scan for obvious new races/deadlocks from this delta; C++ bug-class checks (lifetime, iterator invalidation, integer/signedness, exception-safety, RAII, UB) for changed path.
  • Assumptions/limits: Static audit only (no runtime fault execution); findings are limited to PR Bump scheduled exports count only in case it has been scheduled #1499 diff and directly connected call paths.

@arthurpassos
Copy link
Collaborator Author

That is true, and shame on me for forgetting that :)

mkmkme
mkmkme previously approved these changes Mar 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

antalya-26.1 port-antalya PRs to be ported to all new Antalya releases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants