I am trying to run some parallel code on SLURM, where the different processes do not need to communicate. Naively I used python's multiprocessing package. However, it seems that I am only using the CPUs on one node.
For example, if I have 4 nodes with 5 CPUs each, I will only run 5 processes at the same time. How can I tell multiprocessing to run on different nodes?
The python code looks like the following:
import multiprocessing

def hello():
    print("Hello World")

jobs = []
for j in range(10):
    p = multiprocessing.Process(target=hello)
    jobs.append(p)
    p.start()
The problem is similar to this one, but there it has not been solved in detail.
Your current code will run the 10 processes on the 5 processors of the SINGLE node where you start it. It has nothing to do with SLURM at this point: multiprocessing can only spawn processes on the machine the parent script runs on.
You will have to submit the script to SLURM with sbatch.
If you want to run this script on 5 cores with SLURM, modify the script like this:
#!/usr/bin/python3
#SBATCH --output=wherever_you_want_to_store_the_output.log
#SBATCH --partition=whatever_the_name_of_your_SLURM_partition_is
#SBATCH -n 5 # 5 cores

import sys
import os
import multiprocessing

# Necessary to add cwd to path when the script is run
# by SLURM (since it executes a copy)
sys.path.append(os.getcwd())

def hello():
    print("Hello World")

jobs = []
for j in range(10):
    p = multiprocessing.Process(target=hello)
    jobs.append(p)
    p.start()
And then execute the script with
sbatch my_python_script.py
on one of the nodes where SLURM is installed.
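As a side note, since the jobs are independent, the same work is often written with multiprocessing.Pool instead of starting Process objects by hand. A minimal sketch; sizing the pool from SLURM_NTASKS is my assumption and only applies if you submitted with -n as above:

import multiprocessing
import os

def hello(j):
    print("Hello World", j)

if __name__ == "__main__":
    # multiprocessing.cpu_count() sees every core on the node, not just
    # the ones SLURM allocated, so prefer SLURM_NTASKS when it is set
    n_workers = int(os.environ.get("SLURM_NTASKS", multiprocessing.cpu_count()))
    with multiprocessing.Pool(n_workers) as pool:
        pool.map(hello, range(10))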
However, this will still allocate your job to a SINGLE node, so the speed will be the same as if you had just run it on one node without SLURM.
I don't know why you would want to run it on different nodes when you have just 5 processes; it will be faster to run them on one node. If you request more than 5 cores at the beginning of the python script, then SLURM will allocate more nodes for you.
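If you really do need the tasks spread over several nodes, one common SLURM pattern for fully independent processes is a job array: the scheduler starts one copy of the script per task, on whatever node has a free core. A minimal sketch (the array range and log file name are my assumptions):

#!/usr/bin/python3
#SBATCH --output=hello_%A_%a.log
#SBATCH --partition=whatever_the_name_of_your_SLURM_partition_is
#SBATCH --array=0-9 # 10 independent tasks
#SBATCH -n 1 # each task needs a single core

import os

# SLURM starts one copy of this script per array task, possibly on
# different nodes; SLURM_ARRAY_TASK_ID tells each copy which task it is
print("Hello World from task", os.environ["SLURM_ARRAY_TASK_ID"])

Submit it once with
sbatch my_array_script.py
and you can watch the copies with squeue -u $USER.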