How to pass an array from C to an embedded python script

user1750948 picture user1750948 · Dec 18, 2012 · Viewed 14k times · Source

I am running to some problems and would like some help. I have a piece code, which is used to embed a python script. This python script contains a function which will expect to receive an array as an argument (in this case I am using numpy array within the python script). I would like to know how can I pass an array from C to the embedded python script as an argument for the function within the script. More specifically can someone show me a simple example of this.

Answer

abarnert picture abarnert · Dec 18, 2012

Really, the best answer here is probably to use numpy arrays exclusively, even from your C code. But if that's not possible, then you have the same problem as any code that shares data between C types and Python types.

In general, there are at least five options for sharing data between C and Python:

  1. Create a Python list or other object to pass.
  2. Define a new Python type (in your C code) to wrap and represent the array, with the same methods you'd define for a sequence object in Python (__getitem__, etc.).
  3. Cast the pointer to the array to intptr_t, or to explicit ctypes type, or just leave it un-cast; then use ctypes on the Python side to access it.
  4. Cast the pointer to the array to const char * and pass it as a str (or, in Py3, bytes), and use struct or ctypes on the Python side to access it.
  5. Create an object matching the buffer protocol, and again use struct or ctypes on the Python side.

In your case, you want to use numpy.arrays in Python. So, the general cases become:

  1. Create a numpy.array to pass.
  2. (probably not appropriate)
  3. Pass the pointer to the array as-is, and from Python, use ctypes to get it into a type that numpy can convert into an array.
  4. Cast the pointer to the array to const char * and pass it as a str (or, in Py3, bytes), which is already a type that numpy can convert into an array.
  5. Create an object matching the buffer protocol, and which again I believe numpy can convert directly.

For 1, here's how to do it with a list, just because it's a very simple example (and I already wrote it…):

PyObject *makelist(int array[], size_t size) {
    PyObject *l = PyList_New(size);
    for (size_t i = 0; i != size; ++i) {
        PyList_SET_ITEM(l, i, PyInt_FromLong(array[i]));
    }
    return l;
}

And here's the numpy.array equivalent (assuming you can rely on the C array not to be deleted—see Creating arrays in the docs for more details on your options here):

PyObject *makearray(int array[], size_t size) {
    npy_int dim = size;
    return PyArray_SimpleNewFromData(1, &dim, (void *)array);
}

At any rate, however you do this, you will end up with something that looks like a PyObject * from C (and has a single refcount), so you can pass it as a function argument, while on the Python side it will look like a numpy.array, list, bytes, or whatever else is appropriate.

Now, how do you actually pass function arguments? Well, the sample code in Pure Embedding that you referenced in your comment shows how to do this, but doesn't really explain what's going on. There's actually more explanation in the extending docs than the embedding docs, specifically, Calling Python Functions from C. Also, keep in mind that the standard library source code is chock full of examples of this (although some of them aren't as readable as they could be, either because of optimization, or just because they haven't been updated to take advantage of new simplified C API features).

Skip the first example about getting a Python function from Python, because presumably you already have that. The second example (and the paragraph right about it) shows the easy way to do it: Creating an argument tuple with Py_BuildValue. So, let's say we want to call a function you've got stored in myfunc with the list mylist returned by that makelist function above. Here's what you do:

if (!PyCallable_Check(myfunc)) {
    PyErr_SetString(PyExc_TypeError, "function is not callable?!");
    return NULL;
}
PyObject *arglist = Py_BuildValue("(o)", mylist);
PyObject *result = PyObject_CallObject(myfunc, arglist);
Py_DECREF(arglist);
return result;

You can skip the callable check if you're sure you've got a valid callable object, of course. (And it's usually better to check when you first get myfunc, if appropriate, because you can give both earlier and better error feedback that way.)

If you want to actually understand what's going on, try it without Py_BuildValue. As the docs say, the second argument to [PyObject_CallObject][6] is a tuple, and PyObject_CallObject(callable_object, args) is equivalent to apply(callable_object, args), which is equivalent to callable_object(*args). So, if you wanted to call myfunc(mylist) in Python, you have to turn that into, effectively, myfunc(*(mylist,)) so you can translate it to C. You can construct a tuple like this:

PyObject *arglist = PyTuple_Pack(1, mylist);

But usually, Py_BuildValue is easier (especially if you haven't already packed everything up as Python objects), and the intention in your code is clearer (just as using PyArg_ParseTuple is simpler and clearer than using explicit tuple functions in the other direction).

So, how do you get that myfunc? Well, if you've created the function from the embedding code, just keep the pointer around. If you want it passed in from the Python code, that's exactly what the first example does. If you want to, e.g., look it up by name from a module or other context, the APIs for concrete types like PyModule and abstract types like PyMapping are pretty simple, and it's generally obvious how to convert Python code into the equivalent C code, even if the result is mostly ugly boilerplate.

Putting it all together, let's say I've got a C array of integers, and I want to import mymodule and call a function mymodule.myfunc(mylist) that returns an int. Here's a stripped-down example (not actually tested, and no error handling, but it should show all the parts):

int callModuleFunc(int array[], size_t size) {
    PyObject *mymodule = PyImport_ImportModule("mymodule");
    PyObject *myfunc = PyObject_GetAttrString(mymodule, "myfunc");
    PyObject *mylist = PyList_New(size);
    for (size_t i = 0; i != size; ++i) {
        PyList_SET_ITEM(l, i, PyInt_FromLong(array[i]));
    }
    PyObject *arglist = Py_BuildValue("(o)", mylist);
    PyObject *result = PyObject_CallObject(myfunc, arglist);
    int retval = (int)PyInt_AsLong(result);
    Py_DECREF(result);
    Py_DECREF(arglist);
    Py_DECREF(mylist);
    Py_DECREF(myfunc);
    Py_DECREF(mymodule);
    return retval;
}

If you're using C++, you probably want to look into some kind of scope-guard/janitor/etc. to handle all those Py_DECREF calls, especially once you start doing proper error handling (which usually means early return NULL calls peppered through the function). If you're using C++11 or Boost, unique_ptr<PyObject, Py_DecRef> may be all you need.

But really, a better way to reduce all that ugly boilerplate, if you plan to do a lot of C<->Python communication, is to look at all of the familiar frameworks designed for improving extending Python—Cython, boost::python, etc. Even though you're embedding, you're effectively doing the same work as extending, so they can help in the same ways.

For that matter, some of them also have tools to help the embedding part, if you search around the docs. For example, you can write your main program in Cython, using both C code and Python code, and cython --embed. You may want to cross your fingers and/or sacrifice some chickens, but if it works, it's amazingly simple and productive. Boost isn't nearly as trivial to get started, but once you've got things together, almost everything is done in exactly the way you'd expect, and just works, and that's just as true for embedding as extending. And so on.