SLURM: How to run 30 jobs on particular nodes only?

Ayrat · May 27, 2016

You need to run, say, 30 srun jobs, but ensure that each job runs on a node from a particular list of nodes (nodes with the same performance, so that timings can be compared fairly). How would you do it?

What I tried:

  • srun --nodelist=machineN[0-3] <some_cmd> : runs <some_cmd> on all the listed nodes simultaneously (what I need is to run <some_cmd> on any one available node from the list)

  • srun -p partition seems to work, but needs a partition that contains exactly machineN[0-3], which is not always the case.

Ideas?

Answer

damienfrancois · May 31, 2016

You can go the opposite direction and use the --exclude option (accepted by both srun and sbatch):

srun --exclude=machineN[4-XX] <some_cmd>

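Slurm will then only consider nodes that are not in the excluded list. If the list is long and complicated, it can be saved in a file and the file name passed to --exclude instead (Slurm treats the argument as a file name if it contains a "/").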
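Writing the exclusion list by hand gets tedious on a large cluster. The sketch below computes it as "every node sinfo reports, minus the wanted ones". It assumes bash 4 or later (for associative arrays), that scontrol show hostnames can expand a hostlist expression such as machineN[0-3], and that sinfo lists every node the job could otherwise land on. The names complement_nodes and exclude_all_but are hypothetical helpers, not Slurm commands.

```bash
#!/usr/bin/env bash
# Sketch only: assumes bash >= 4 and Slurm's scontrol/sinfo on PATH.
# complement_nodes and exclude_all_but are hypothetical helper names.

# complement_nodes WANTED ALL
# WANTED and ALL are whitespace/newline-separated lists of expanded node names.
# Prints the nodes in ALL that are not in WANTED, joined with commas.
complement_nodes() (
  set -f                         # disable globbing while splitting the lists
  declare -A keep=()
  out=()
  for node in $1; do keep[$node]=1; done
  for node in $2; do
    [[ -n ${keep[$node]:-} ]] || out+=("$node")
  done
  IFS=,                          # "${out[*]}" joins with the first character of IFS
  printf '%s\n' "${out[*]}"
)

# exclude_all_but HOSTLIST
# Expands HOSTLIST (e.g. 'machineN[0-3]') and prints every other node that
# sinfo reports, in a form that can be passed to --exclude.
exclude_all_but() {
  local wanted all
  wanted=$(scontrol show hostnames "$1") || return
  # --Node prints one line per node and partition, so drop duplicates
  all=$(sinfo --noheader --Node --format='%N' | sort -u) || return
  complement_nodes "$wanted" "$all"
}
```

With the functions loaded, the question's 30 runs might look like:

exclude=$(exclude_all_but 'machineN[0-3]')
for i in $(seq 30); do srun --exclude="$exclude" <some_cmd>; done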

Another option is to check whether the Slurm configuration defines node features, with

sinfo --format "%20N %20f"
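Before picking a constraint, it helps to know which features all of the wanted nodes actually share. A minimal sketch, assuming bash 4 or later and that sinfo's node-oriented output (sinfo --noheader --Node --format='%N %f', without a field width so long feature lists are not truncated) has the node name in the first column and its comma-separated features in the second. common_features is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: assumes bash >= 4; common_features is a hypothetical helper.

# common_features WANTED
# Reads "NODE FEATURES" lines on stdin, e.g. from
#   sinfo --noheader --Node --format='%N %f'
# and prints, one per line, the features present on every node in WANTED.
common_features() (
  set -f                                   # no globbing while splitting $1
  declare -A want=() seen=() count=()
  nwant=0
  for node in $1; do
    [[ -n ${want[$node]:-} ]] && continue  # ignore duplicate names
    want[$node]=1
    nwant=$((nwant + 1))
  done
  while read -r node feats; do
    # skip unwanted nodes and repeats (--Node lists a node once per partition)
    [[ -n ${want[$node]:-} && -z ${seen[$node]:-} ]] || continue
    seen[$node]=1
    IFS=, read -ra list <<< "$feats"
    for f in "${list[@]}"; do
      [[ $f == "(null)" ]] && continue     # sinfo prints (null) for no features
      count[$f]=$(( ${count[$f]:-0} + 1 ))
    done
  done
  # a feature is common if every wanted node contributed one count to it
  for f in "${!count[@]}"; do
    [[ ${count[$f]} -eq $nwant ]] && printf '%s\n' "$f"
  done | sort
)
```

For example, sinfo --noheader --Node --format='%N %f' | common_features "$(scontrol show hostnames 'machineN[0-3]')" prints the candidate features; if it prints nothing, the nodes share no feature and the --exclude approach is the fallback. Note that a shared feature may also be carried by nodes outside the list.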

If the FEATURES column shows a comma-separated list of features for each node (for example CPU family, network connection type, etc.), you can restrict the job to the nodes that have a specific feature using

srun --constraint=<some_feature> <some_cmd>
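Applied to the original goal, the 30 runs could be launched one after another, each as a single-node, single-task step restricted to the feature. This is a sketch rather than part of the answer: run_on_feature is a hypothetical helper, and running the steps sequentially is an assumption made so that the benchmark runs do not compete with one another.

```bash
#!/usr/bin/env bash
# Sketch only: run_on_feature is a hypothetical helper, not a Slurm command.

# run_on_feature N FEATURE CMD [ARGS...]
# Runs CMD N times, one srun step at a time. Each step asks for exactly one
# node and one task, and may only be placed on a node advertising FEATURE.
run_on_feature() {
  local n=$1 feature=$2 i
  shift 2
  for ((i = 1; i <= n; i++)); do
    # stop at the first failing run instead of silently continuing
    srun --nodes=1 --ntasks=1 --constraint="$feature" "$@" || return
  done
}
```

Usage: run_on_feature 30 <some_feature> <some_cmd>. The same loop works with --exclude in place of --constraint if the nodes share no feature.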