How to submit a job to any [subset] of nodes from nodelist in SLURM?

Faber · Oct 6, 2014 · Viewed 20.4k times

I have a couple of thousand jobs to run on a SLURM cluster with 16 nodes. These jobs should run only on a 7-node subset of the available nodes. Some of the tasks are parallelized and therefore use all the CPU power of a single node, while others are single-threaded. Multiple jobs should therefore run at the same time on a single node. None of the tasks should span multiple nodes.

Currently I submit each of the jobs as follows:

sbatch --nodelist=myCluster[10-16] myScript.sh

However, this parameter makes Slurm wait until the submitted job terminates before starting the next one. This leaves 3 nodes completely unused and, depending on the task (multi- or single-threaded), the currently active node might also be under low CPU load.

What are the best sbatch parameters to make Slurm run multiple jobs at the same time on the specified nodes?

Answer

damienfrancois · Oct 8, 2014

You can work the other way around: rather than specifying which nodes to use, which has the effect that each job is allocated all 7 nodes, specify which nodes not to use:

sbatch --exclude=myCluster[01-09] myScript.sh

and Slurm will never allocate more than 7 nodes to your jobs. Make sure, though, that the cluster configuration allows node sharing, and that your myScript.sh contains #SBATCH --ntasks=1 --cpus-per-task=n, with n the number of threads of each job.
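For illustration, a myScript.sh along these lines might look like the sketch below. The thread count of 4, the use of OMP_NUM_THREADS (which only matters if the program is OpenMP-based) and the commented-out program name are assumptions, not part of the original setup. Putting --exclude in the script is optional; it can equally stay on the sbatch command line.

```shell
#!/bin/bash
#SBATCH --nodes=1                    # keep the job on a single node
#SBATCH --ntasks=1                   # one task per job
#SBATCH --cpus-per-task=4            # assumed thread count; use 1 for single-threaded jobs
#SBATCH --exclude=myCluster[01-09]   # same exclusion as on the command line

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to 1 so the script also runs outside of Slurm.
threads="${SLURM_CPUS_PER_TASK:-1}"

# Assumption: the program reads its thread count from OMP_NUM_THREADS.
export OMP_NUM_THREADS="$threads"

echo "node=${SLURMD_NODENAME:-unknown} threads=${threads}"

# Hypothetical program name; replace with the real command.
# ./myProgram
```

Whether several such jobs actually land on the same node depends on the cluster's scheduler settings, for example the SelectType value reported by scontrol show config.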