I have code that takes a long time to run and so I've been investigating Python's multiprocessing library in order to speed things up. My code also has a few steps that utilize the GPU via PyOpenCL. The problem is, if I set multiple processes to run at the same time, they all end up trying to use the GPU at the same time, and that often results in one or more of the processes throwing an exception and quitting.
In order to work around this, I staggered the start of each process so that they'd be less likely to bump into each other:
process_list = []
num_procs = 4
# break data into chunks so each process gets it's own chunk of the data
data_chunks = chunks(data,num_procs)
for chunk in data_chunks:
if len(chunk) == 0:
continue
# Instantiates the process
p = multiprocessing.Process(target=test, args=(arg1,arg2))
# Sticks the thread in a list so that it remains accessible
process_list.append(p)
# Start threads
j = 1
for process in process_list:
print('\nStarting process %i' % j)
process.start()
time.sleep(5)
j += 1
for process in process_list:
process.join()
I also wrapped a try except loop around the function that calls the GPU so that if two processes DO try to access it at the same time, the one who doesn't get access will wait a couple of seconds and try again:
wait = 2
n = 0
while True:
try:
gpu_out = GPU_Obj.GPU_fn(params)
except:
time.sleep(wait)
print('\n Waiting for GPU memory...')
n += 1
if n == 5:
raise Exception('Tried and failed %i times to allocate memory for opencl kernel.' % n)
continue
break
This workaround is very clunky and even though it works most of the time, processes occasionally throw exceptions and I feel like there should be a more effecient/elegant solution using multiprocessing.queue
or something similar. However, I'm not sure how to integrate it with PyOpenCL for GPU access.
Sounds like you could use a multiprocessing.Lock
to synchronize access to the GPU:
data_chunks = chunks(data,num_procs)
lock = multiprocessing.Lock()
for chunk in data_chunks:
if len(chunk) == 0:
continue
# Instantiates the process
p = multiprocessing.Process(target=test, args=(arg1,arg2, lock))
...
Then, inside test
where you access the GPU:
with lock: # Only one process will be allowed in this block at a time.
gpu_out = GPU_Obj.GPU_fn(params)
Edit:
To do this with a pool, you'd do this:
# At global scope
lock = None
def init(_lock):
global lock
lock = _lock
data_chunks = chunks(data,num_procs)
lock = multiprocessing.Lock()
for chunk in data_chunks:
if len(chunk) == 0:
continue
# Instantiates the process
p = multiprocessing.Pool(initializer=init, initargs=(lock,))
p.apply(test, args=(arg1, arg2))
...
Or:
data_chunks = chunks(data,num_procs)
m = multiprocessing.Manager()
lock = m.Lock()
for chunk in data_chunks:
if len(chunk) == 0:
continue
# Instantiates the process
p = multiprocessing.Pool()
p.apply(test, args=(arg1, arg2, lock))