How to use Python multiprocessing queue to access GPU (through PyOpenCL)?

johnny_be picture johnny_be · Apr 13, 2015 · Viewed 13.7k times · Source

I have code that takes a long time to run and so I've been investigating Python's multiprocessing library in order to speed things up. My code also has a few steps that utilize the GPU via PyOpenCL. The problem is, if I set multiple processes to run at the same time, they all end up trying to use the GPU at the same time, and that often results in one or more of the processes throwing an exception and quitting.

In order to work around this, I staggered the start of each process so that they'd be less likely to bump into each other:

process_list = []
num_procs = 4

# break data into chunks so each process gets it's own chunk of the data
data_chunks = chunks(data,num_procs)
for chunk in data_chunks:
    if len(chunk) == 0:
        continue
    # Instantiates the process
    p = multiprocessing.Process(target=test, args=(arg1,arg2))
    # Sticks the thread in a list so that it remains accessible
    process_list.append(p)

# Start threads
j = 1
for process in process_list:
    print('\nStarting process %i' % j)
    process.start()
    time.sleep(5)
    j += 1

for process in process_list:
    process.join()

I also wrapped a try except loop around the function that calls the GPU so that if two processes DO try to access it at the same time, the one who doesn't get access will wait a couple of seconds and try again:

wait = 2
n = 0
while True:
    try:
        gpu_out = GPU_Obj.GPU_fn(params)
    except:
        time.sleep(wait)
        print('\n Waiting for GPU memory...')
        n += 1
        if n == 5:
            raise Exception('Tried and failed %i times to allocate memory for opencl kernel.' % n)
        continue
    break

This workaround is very clunky and even though it works most of the time, processes occasionally throw exceptions and I feel like there should be a more effecient/elegant solution using multiprocessing.queue or something similar. However, I'm not sure how to integrate it with PyOpenCL for GPU access.

Answer

dano picture dano · Apr 13, 2015

Sounds like you could use a multiprocessing.Lock to synchronize access to the GPU:

data_chunks = chunks(data,num_procs)
lock = multiprocessing.Lock()
for chunk in data_chunks:
    if len(chunk) == 0:
        continue
    # Instantiates the process
    p = multiprocessing.Process(target=test, args=(arg1,arg2, lock))
    ...

Then, inside test where you access the GPU:

with lock:  # Only one process will be allowed in this block at a time.
    gpu_out = GPU_Obj.GPU_fn(params)

Edit:

To do this with a pool, you'd do this:

# At global scope
lock = None

def init(_lock):
    global lock
    lock = _lock

data_chunks = chunks(data,num_procs)
lock = multiprocessing.Lock()
for chunk in data_chunks:
    if len(chunk) == 0:
        continue
    # Instantiates the process
    p = multiprocessing.Pool(initializer=init, initargs=(lock,))
    p.apply(test, args=(arg1, arg2))
    ...

Or:

data_chunks = chunks(data,num_procs)
m = multiprocessing.Manager()
lock = m.Lock()
for chunk in data_chunks:
    if len(chunk) == 0:
        continue
    # Instantiates the process
    p = multiprocessing.Pool()
    p.apply(test, args=(arg1, arg2, lock))