Using a dictionary in Cython , especially inside nogil

Seja Nair picture Seja Nair · Aug 28, 2015 · Viewed 18.1k times · Source

I am having a dictionary,

my_dict = {'a':[1,2,3], 'b':[4,5] , 'c':[7,1,2])

I want to use this dictionary inside a Cython nogil function . So , I tried to declare it as

cdef dict cy_dict = my_dict 

Up to this stage is fine.

Now I need to iterate over the keys of my_dict and if the values are in list, iterate over it. In Python , it is quite easy as follows:

 for key in my_dict:
      if isinstance(my_dict[key], (list, tuple)):
          ###### Iterate over the value of the list or tuple
          for value in list:
               ## Do some over operation.

But, inside Cython, I want to implement the same that too inside nogil . As, python objects are not allowed inside nogil, I am all stuck up here.

with nogil:
    #### same implementation of the same in Cython

Can anyone please help me out ?

Answer

DavidW picture DavidW · Aug 28, 2015

You can't use Python dict without the GIL because everything you could do with it involves manipulating Python objects. You most sensible option is to accept that you need the GIL. There's a less sensible option too involving C++ maps, but it may be hard to apply for your specific case.

You can use with gil: to reacquire the GIL. There is obvious an overhead here (parts using the GIL can't be executed in parallel, and there may be a delay which it waits for the GIL). However, if the dictionary manipulation is a small chunk of a larger piece of Cython code this may not be too bad:

with nogil:
  # some large chunk of computationally intensive code goes here
  with gil:
    # your dictionary code
  # more computationally intensive stuff here

The other less sensible option is to use C++ maps (along side other C++ standard library data types). Cython can wrap these and automatically convert them. To give a trivial example based on your example data:

from libcpp.map cimport map
from libcpp.string cimport string
from libcpp.vector cimport vector
from cython.operator cimport dereference, preincrement

def f():
    my_dict = {'a':[1,2,3], 'b':[4,5] , 'c':[7,1,2]}
    # the following conversion has an computational cost to it 
    # and must be done with the GIL. Depending on your design
    # you might be able to ensure it's only done once so that the
    # cost doesn't matter much
    cdef map[string,vector[int]] m = my_dict
    
    # cdef statements can't go inside no gil, but much of the work can
    cdef map[string,vector[int]].iterator end = m.end()
    cdef map[string,vector[int]].iterator it = m.begin()
        
    cdef int total_length = 0
    
    with nogil: # all  this stuff can now go inside nogil   
        while it != end:
            total_length += dereference(it).second.size()
            preincrement(it)
        
    print total_length

(you need to compile this with language='c++').

The obvious disadvantage to this is that the data-types inside the dict must be known in advance (it can't be an arbitrary Python object). However, since you can't manipulate arbitrary Python objects inside a nogil block you're pretty restricted anyway.

6-year later addendum: I don't recommend the "use C++ objects everywhere" approach as a general approach. The Cython-C++ interface is a bit clunky and you can spend a lot of time working around it. The Python containers are actually better than you think. Everyone tends to forget about the cost of converting their C++ objects to/from Python objects. People rarely consider if they really need to release the GIL or if they just read an article on the internet somewhere saying that the GIL is bad..

It's good for some tasks, but think carefully before blindly replacing all your list with vector, dict with map etc.. As a rule, if your C++ types live entirely within your function it may be a good move (but think twice...). If they're being converted as input or output arguments then think a third time.