You need to run, say, 30 srun jobs, but each job must run on a node from a particular list of nodes (nodes with the same performance, so that timings can be compared fairly). How would you do it?
What I tried:
srun --nodelist=machineN[0-3] <some_cmd>
runs <some_cmd> on all the nodes simultaneously (what I need is to run <some_cmd> on just one of the available nodes from the list).
srun -p <partition> <some_cmd>
seems to work, but it needs a partition that contains exactly machineN[0-3], which is not always available.
Ideas?
You can go the opposite direction and use the --exclude option of srun (sbatch accepts it as well):
srun --exclude=machineN[4-XX] <some_cmd>
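Slurm will then only consider nodes that are not in the excluded list. If the list is long and complicated, it can be saved in a file and the file's path passed instead; Slurm treats the argument as a filename when it contains a "/" character.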
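A minimal bash sketch of the whole workflow, assuming a login node where sinfo, scontrol and srun are available. It builds the exclusion file from every node Slurm reports minus the allowed set (expanded with scontrol show hostnames), then launches the 30 runs one after another, each confined to a single node and task (-N1 -n1). The helper names, the file name excluded_nodes.txt and ./benchmark (standing in for <some_cmd>) are hypothetical; setting SRUN=echo prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: run N single-node jobs restricted to an allowed node set by
# excluding every other node. All names here are hypothetical placeholders.

# Print the lines of ALL_NODES_FILE that do not appear in ALLOWED_FILE
# (both files contain one hostname per line).
make_exclude_list() {
  local all_nodes_file=$1 allowed_file=$2
  grep -vxF -f "$allowed_file" "$all_nodes_file" || true
}

# Write OUT_FILE with every node Slurm knows about except those in the
# ALLOWED_HOSTLIST expression (e.g. 'machineN[0-3]'). Needs sinfo/scontrol.
write_exclude_file() {
  local allowed_hostlist=$1 out_file=$2
  local tmp_all tmp_allowed
  tmp_all=$(mktemp) tmp_allowed=$(mktemp)
  # -N: one line per node; sort -u drops nodes listed in several partitions.
  sinfo -h -N -o '%N' | sort -u > "$tmp_all"
  scontrol show hostnames "$allowed_hostlist" > "$tmp_allowed"
  make_exclude_list "$tmp_all" "$tmp_allowed" > "$out_file"
  rm -f "$tmp_all" "$tmp_allowed"
}

# Launch RUNS single-node, single-task steps that avoid the excluded nodes.
# EXCLUDE_FILE must contain a "/" so Slurm reads it as a file, not a hostlist.
# SRUN can be overridden (e.g. SRUN=echo) for a dry run.
run_benchmarks() {
  local exclude_file=$1 runs=$2
  shift 2
  local i
  for ((i = 1; i <= runs; i++)); do
    "${SRUN:-srun}" -N1 -n1 --exclude="$exclude_file" "$@"
  done
}

# Example usage (commented out so sourcing this file has no side effects):
# write_exclude_file 'machineN[0-3]' ./excluded_nodes.txt
# run_benchmarks ./excluded_nodes.txt 30 ./benchmark
```

Because srun waits for its step to finish, the runs execute one at a time; appending & to the srun line (and calling wait after the loop) would run them concurrently instead.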
Another option is to check whether the Slurm configuration defines node "features", using
sinfo --format "%20N %20f"
If the features column shows a comma-separated list of features for each node (for example CPU family or network connection type), you can restrict the job to nodes that have a specific feature with
srun --constraint=<some_feature> <some_cmd>
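A similar bash sketch for the feature-based route, again assuming a working Slurm client. It first lists the nodes that advertise a given feature (parsing the output of sinfo -h -N -o '%N %f', where -N prints one node per line), so you can confirm the feature really maps to the comparable nodes, then runs the jobs with --constraint. The feature name skylake, the helper names and ./benchmark are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: check which nodes carry a feature, then run N single-node jobs
# constrained to it. The feature name and helper names are hypothetical.

# Read "NODE FEATURES" lines (as printed by: sinfo -h -N -o '%N %f') on stdin
# and print, once per node, each node whose comma-separated feature list
# contains FEATURE as an exact entry.
nodes_with_feature() {
  local feature=$1
  awk -v f="$feature" '{
    n = split($2, feats, ",")
    for (i = 1; i <= n; i++)
      if (feats[i] == f && !seen[$1]++) { print $1; break }
  }'
}

# Launch RUNS single-node, single-task steps on nodes that have FEATURE.
# SRUN can be overridden (e.g. SRUN=echo) for a dry run.
run_with_constraint() {
  local feature=$1 runs=$2
  shift 2
  local i
  for ((i = 1; i <= runs; i++)); do
    "${SRUN:-srun}" -N1 -n1 --constraint="$feature" "$@"
  done
}

# Example usage (commented out; requires a Slurm client):
# sinfo -h -N -o '%N %f' | nodes_with_feature skylake
# run_with_constraint skylake 30 ./benchmark
```

This only guarantees equal performance if the chosen feature is carried exclusively by the comparable nodes, which is what the listing step lets you check.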