Unable to load scheduler dashboard in SLURMRunner, but can in cluster #682

@gilmorethomas

Describe the issue:
Thanks in advance for your time. I have created a simple "hello world" example of a SLURMRunner and a SLURMCluster in my environment. I prefer the SLURMRunner interface, since the SLURMCluster construct effectively requires me to create wrappers around jobs.

I dispatch my SLURMCluster job via sbatch (since my login node cannot run my scheduler) to a worker node (node-01), which then dispatches additional jobs to my worker nodes (node[01-06]). With this setup I am able to visit the scheduler dashboard, although I am seeing slightly odd behavior in job allocation (not the point of this post; I need to look into it more).

When I create my SLURMRunner (same as the example at https://jobqueue.dask.org/en/stable/runners-overview.html), my jobs are allocated and run, but I am unable to load the scheduler dashboard. I get a 404 Page Not Found when I visit the scheduler link reported by client.dashboard_link and also listed in the scheduler.json file. This differs from what I see after the runner spins down, when I get Connection Refused instead. Is this expected?
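To make the two failure modes concrete, here is a minimal sketch of how they could be told apart from a script, using only the Python standard library. The helper name `probe_dashboard` is hypothetical and not part of Dask or dask-jobqueue; it only classifies how a URL responds and makes no claim about why the dashboard returns 404.

```python
import urllib.error
import urllib.request


def probe_dashboard(url: str, timeout: float = 5.0) -> str:
    """Classify how a dashboard URL responds (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # An HTTP server answered with a success status.
            return f"reachable (HTTP {resp.status})"
    except urllib.error.HTTPError as exc:
        # An HTTP server answered, but with an error status such as 404.
        # This must be caught before OSError because HTTPError subclasses it.
        return f"server up, HTTP {exc.code}"
    except OSError as exc:
        # No HTTP response at all, e.g. connection refused or a timeout.
        return f"no response ({exc})"
```

Calling `probe_dashboard(client.dashboard_link)` while the runner is alive should then report the 404 case, while the same call after the runner has spun down should report no response.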

Minimal Complete Verifiable Example:
Using the SLURMRunner in my multi-node environment

# Put your MCVE code here

Anything else we need to know?:

Environment:

  • Dask version: 2023.6.0
  • Python version: 3.11.4
  • Operating System: CentOS 7
  • Install method: pip

Labels: needs info