[feat] Support multi-cluster operation in Slurm backends #3639

Merged
vkarak merged 2 commits into reframe-hpc:develop from vkarak:feat/slurm-multi-cluster
Mar 30, 2026

Conversation

Contributor

@vkarak vkarak commented Mar 9, 2026

This PR introduces a new configuration option for the Slurm backends, named slurm_multi_cluster_mode, that supports Slurm's multi-cluster operation. If not set, behavior is unchanged. If set, the listed clusters are passed to Slurm's -M option; if set to ["all"], this is equivalent to -M all and all clusters are queried.

Closes #3559.

@JimPaine Would you mind trying this PR with your setup?
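A minimal sketch of a partition enabling the new option (the cluster names tst1 and tst2 are hypothetical placeholders, not from this PR):

```python
# Sketch of a ReFrame partition enabling the new option (names hypothetical).
# The listed clusters are passed to Slurm's -M option; ['all'] queries all clusters.
partition = {
    'name': 'multi-cluster',
    'scheduler': 'slurm',
    'launcher': 'local',
    'sched_options': {
        'slurm_multi_cluster_mode': ['tst1', 'tst2'],
    },
}

# Slurm's -M/--clusters flag accepts a comma-separated list, so the
# resulting option would look like this:
clusters = ','.join(partition['sched_options']['slurm_multi_cluster_mode'])
print(f'-M {clusters}')  # -M tst1,tst2
```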


codecov Bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 92.30769% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 91.70%. Comparing base (1eee4f9) to head (dbfc43f).
⚠️ Report is 3 commits behind head on develop.

Files with missing lines          Patch %   Lines
reframe/core/schedulers/slurm.py  91.66%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3639      +/-   ##
===========================================
- Coverage    91.70%   91.70%   -0.01%     
===========================================
  Files           62       62              
  Lines        13713    13724      +11     
===========================================
+ Hits         12576    12586      +10     
- Misses        1137     1138       +1     

☔ View full report in Codecov by Sentry.

Contributor

JimPaine commented Mar 10, 2026

@vkarak I have pulled from your fork and can confirm it is polling the correct cluster.

Something that I think could improve the user experience would be to include it against the sbatch command as well. Currently I need to set the cluster twice, once for submission and once for job polling.

Here is a snippet of my partitions for the test I ran; you can see that I currently need to set the cluster both in access and in slurm_multi_cluster_mode to be able to run the test.

                {
                    'name': 'cluster1',
                    'scheduler': 'slurm',
                    'launcher': 'local',
                    'environs': ['slurm_multi_cluster_mode'],
                    'access': ['-M tst1'],
                    'sched_options': {
                        'slurm_multi_cluster_mode': ['cluster1']
                    }
                },
                {
                    'name': 'cluster2',
                    'scheduler': 'slurm',
                    'launcher': 'local',
                    'environs': ['slurm_multi_cluster_mode'],
                    'access': ['-M tst2'],
                    'sched_options': {
                        'slurm_multi_cluster_mode': ['cluster2']
                    }
                }

Contributor Author

vkarak commented Mar 10, 2026

Something that I think could improve the user experience would be to include it against the sbatch command as well. Currently I need to set the cluster twice, once for submission and once for job polling.

Yes, that makes sense! I'll update the PR so that the access options take multi-cluster mode into account.

@vkarak vkarak force-pushed the feat/slurm-multi-cluster branch from c30ab53 to dbfc43f Compare March 25, 2026 01:04
Contributor Author

vkarak commented Mar 25, 2026

Yes, that makes sense! I'll update the PR so that the access options take multi-cluster mode into account.

I just updated it; now there is no need to pass the -M option explicitly. Let me know if that works fine for you, so that we can merge this.
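Under the updated behavior, the two-partition snippet earlier in the thread could be simplified roughly as follows. This is a sketch only: it assumes tst1 and tst2 are the actual Slurm cluster names behind the two partitions, and that the explicit -M entries in access can now be dropped entirely.

```python
# Sketch only: with the updated PR, slurm_multi_cluster_mode alone should
# cover both submission and polling, so no '-M' entry in 'access' is needed.
partitions = [
    {
        'name': 'cluster1',
        'scheduler': 'slurm',
        'launcher': 'local',
        'sched_options': {
            'slurm_multi_cluster_mode': ['tst1'],
        },
    },
    {
        'name': 'cluster2',
        'scheduler': 'slurm',
        'launcher': 'local',
        'sched_options': {
            'slurm_multi_cluster_mode': ['tst2'],
        },
    },
]

# No partition carries an explicit '-M' access option anymore.
assert all('access' not in p for p in partitions)
```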

Contributor

@gppezzi gppezzi left a comment


Works on Alps Daint.

@github-project-automation github-project-automation Bot moved this from Todo to In Progress in ReFrame Backlog Mar 26, 2026
@vkarak vkarak merged commit d740817 into reframe-hpc:develop Mar 30, 2026
31 of 32 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in ReFrame Backlog Mar 30, 2026
@vkarak vkarak deleted the feat/slurm-multi-cluster branch March 30, 2026 11:41

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Slurm Scheduler doesn't support multi-cluster

3 participants