C array to PyArray

Daniele Bigoni picture Daniele Bigoni · Sep 25, 2012 · Viewed 10.8k times · Source

I'm writing a Python C-Extension without using Cython.

I want to allocate a double array in C, use it in an internal function (that happens to be in Fortran) and return it. I point out that the C-Fortran interface works perfectly in C.

static PyObject *
Py_drecur(PyObject *self, PyObject *args)
{
  // INPUT
  int n;
  int ipoly;
  double al;
  double be;

  if (!PyArg_ParseTuple(args, "iidd", &n, &ipoly, &al, &be))
    return NULL;

  // OUTPUT
  int nd = 1;
  npy_intp dims[] = {n};
  double a[n];
  double b[n];
  int ierr;

  drecur_(n, ipoly, al, be, a, b, ierr);

  // Create PyArray
  PyObject* alpha = PyArray_SimpleNewFromData(nd, dims, NPY_DOUBLE, a);
  PyObject* beta = PyArray_SimpleNewFromData(nd, dims, NPY_DOUBLE, b);

  Py_INCREF(alpha);
  Py_INCREF(beta);

  return Py_BuildValue("OO", alpha, beta);
}

I debugged this code and I get a Segmentation fault when I try to create alpha out of a. Up to there everything works fine. The function drecur_ works and I get the same problem if it is removed.

Now, what's the standard way of defining a PyArray around C data? I found documentation but no good example. Also, what about memory leakage? Is it correct to INCREF before return so that the instance of alpha and beta are preserved? What about the deallocation when they are not needed anymore?

EDIT I finally got it right with the approach found in NumPy cookbook.

static PyObject *
Py_drecur(PyObject *self, PyObject *args)
{
  // INPUT
  int n;
  int ipoly;
  double al;
  double be;
  double *a, *b;
  PyArrayObject *alpha, *beta;

  if (!PyArg_ParseTuple(args, "iidd", &n, &ipoly, &al, &be))
    return NULL;

  // OUTPUT
  int nd = 1;
  int dims[2];
  dims[0] = n;
  alpha = (PyArrayObject*) PyArray_FromDims(nd, dims, NPY_DOUBLE);
  beta = (PyArrayObject*) PyArray_FromDims(nd, dims, NPY_DOUBLE);
  a = pyvector_to_Carrayptrs(alpha);
  b = pyvector_to_Carrayptrs(beta);
  int ierr;

  drecur_(n, ipoly, al, be, a, b, ierr);

  return Py_BuildValue("OO", alpha, beta);
}

double *pyvector_to_Carrayptrs(PyArrayObject *arrayin)  {
  int n=arrayin->dimensions[0];
  return (double *) arrayin->data;  /* pointer to arrayin data as double */
}

Feel free to comment on this and thanks for the answers.

Answer

Raphael Ahrens picture Raphael Ahrens · Sep 25, 2012

So the first things that looks suspicious, is that your array a and b are in the local scope of the function. That means after the return you will get an illegal memory access.

So you should declare the arrays with

double *a = malloc(n*sizeof(double));

Then you need to make sure that the memory is later freed by the object you have created. See this quote of the documentation:

PyObject PyArray_SimpleNewFromData(int nd, npy_intp dims, int typenum, void* data)

Sometimes, you want to wrap memory allocated elsewhere into an ndarray object for downstream use. This routine makes it straightforward to do that. The first three arguments are the same as in PyArray_SimpleNew, the final argument is a pointer to a block of contiguous memory that the ndarray should use as it’s data-buffer which will be interpreted in C-style contiguous fashion. A new reference to an ndarray is returned, but the ndarray will not own its data. When this ndarray is deallocated, the pointer will not be freed.

You should ensure that the provided memory is not freed while the returned array is in existence. The easiest way to handle this is if data comes from another reference-counted Python object. The reference count on this object should be increased after the pointer is passed in, and the base member of the returned ndarray should point to the Python object that owns the data. Then, when the ndarray is deallocated, the base-member will be DECREF’d appropriately. If you want the memory to be freed as soon as the ndarray is deallocated then simply set the OWNDATA flag on the returned ndarray.

For your second question the Py_INCREF(alpha); is generally only necessary if you intend to keep the reference in a global variable or a class member. But since you are only wrapping a function you don't have to do it. Sadly it could be that the function PyArray_SimpleNewFromData does not set the reference counter to 1, if that would be the case you would have to increase it to 1. I hope that was understandable ;).