I need to do a lot of work on 2D NumPy arrays of various sizes, and I would like to offload these calculations to Cython. The idea is that my 2D NumPy arrays would be passed from Python to Cython, where they would be converted into a C array or memory view and used in a cascade of other C-level functions to do the calculations.
After some profiling I ruled out using NumPy arrays in Cython due to some serious Python overhead. Using memory views was MUCH faster and quite easy, but I suspect I can squeeze even more speedup out of C arrays.
Here is my question though - how can I declare a 2D c-array in cython without predefining its dimensions with set values? For example, I can create a c-array from numpy this way:
import numpy as np

narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]], dtype=np.dtype("i"))
cdef int c_arr[3][4]
# copy element by element from the NumPy array into the C array
for i in range(3):
    for j in range(4):
        c_arr[i][j] = narr[i][j]
and then pass it to a function:
cdef void somefunction(int c_Arr[3][4]):
    ...
But this implies a fixed array size, which in my case is useless. So I tried something like this:
narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]], dtype=np.dtype("i"))
cdef int a = np.shape(narr)[0]
cdef int b = np.shape(narr)[1]
cdef int c_arr[a][b]  # INCORRECT - EXAMPLE ONLY
for i in range(a):
    for j in range(b):
        c_arr[i][j] = narr[i][j]
with the intention to pass it to a function like this:
cdef void somefunction(int a, int b, int c_Arr[a][b]):
    ...
But it doesn't work, and compilation fails with the error "Not allowed in a constant expression". I suspect I need to do it with malloc/free somehow? I had a look at this question (How to declare 2D list in Cython), but it does not answer my problem.
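The malloc/free route suspected above can be sketched in plain C. This is an illustration under assumptions, not code from the original post: `sum_matrix` is a hypothetical stand-in for `somefunction`, and `demo_sum` is a hypothetical driver. The matrix lives in one flat row-major buffer, and the dimensions travel with the pointer as ordinary run-time arguments. In Cython the same allocation calls are available through `from libc.stdlib cimport malloc, free`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for somefunction: sums a rows x cols matrix
   stored row-major in a single flat buffer. */
long sum_matrix(const int *data, size_t rows, size_t cols)
{
    assert(data != NULL);
    long total = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            total += data[i * cols + j];   /* element (i, j) */
    return total;
}

/* Hypothetical driver: allocates a rows x cols buffer at run time,
   fills it with 1, 2, 3, ... in row-major order, sums it, and frees it.
   Returns -1 if the allocation fails. */
long demo_sum(size_t rows, size_t cols)
{
    int *c_arr = malloc(rows * cols * sizeof *c_arr);
    if (c_arr == NULL)
        return -1;

    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            c_arr[i * cols + j] = (int)(i * cols + j + 1);

    long total = sum_matrix(c_arr, rows, cols);
    free(c_arr);   /* every malloc needs a matching free */
    return total;
}
```

Calling `demo_sum(3, 4)` reproduces the 3 x 4 example with values 1 to 12. The price of the flat layout is writing `data[i * cols + j]` instead of `data[i][j]`.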
It turns out that memory views can be as fast as C arrays, provided that bounds checking (the IndexError check on every index) is switched off for the memory views. This can be done with a Cython compiler directive:
# cython: boundscheck=False
Thanks @Veedrac for the tip!
You just need to stop doing bounds checking:
cimport cython  # needed for cython.boundscheck

with cython.boundscheck(False):
    thesum += x_view[i,j]
that brings the speed basically up to par.
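If you really want a C array from it, try:
import numpy
from numpy import int32
from numpy cimport int32_t

# a 3x4 example array; an empty array would have no element [0, 0] to point at
numpy_array = numpy.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]], dtype=int32)
cdef:
    # ::1 requires C-contiguous data, which the pointer cast below relies on
    int32_t[:, ::1] cython_view = numpy_array
    int32_t *c_integers_array = &cython_view[0, 0]
    int32_t[4] *c_2d_array = <int32_t[4] *>c_integers_array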
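First you get a NumPy array. You use that to get a memory view. Then you take a pointer to its first element and cast it to a pointer to rows of the desired width (4 here). Note that the row width in that cast must still be a compile-time constant, and the data must be contiguous in memory.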
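To make the cast concrete, here is a minimal C sketch of what it amounts to. The function names are hypothetical and not part of the answer. The first accessor views the flat buffer through a pointer to rows of fixed width, which allows `rows[i][j]` indexing. The second does the same lookup with explicit index arithmetic, which is the form that still works when the width is only known at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NCOLS = 4 };   /* row width: must be a compile-time constant for the cast */

/* Read element (i, j) through a pointer-to-row view of a flat, row-major buffer. */
int32_t get_via_row_pointer(int32_t *flat, size_t i, size_t j)
{
    int32_t (*rows)[NCOLS] = (int32_t (*)[NCOLS])flat;   /* C spelling of the cast */
    assert(j < NCOLS);
    return rows[i][j];
}

/* Read the same element with explicit arithmetic; ncols may be a run-time value. */
int32_t get_via_flat_index(const int32_t *flat, size_t ncols, size_t i, size_t j)
{
    assert(j < ncols);
    return flat[i * ncols + j];
}

/* Hypothetical check: both views agree on every element of a 3 x 4 buffer.
   Returns 1 if they agree everywhere, 0 otherwise. */
int views_agree(void)
{
    int32_t flat[3 * NCOLS] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    for (size_t i = 0; i < 3; i++)
        for (size_t j = 0; j < NCOLS; j++)
            if (get_via_row_pointer(flat, i, j) !=
                get_via_flat_index(flat, NCOLS, i, j))
                return 0;
    return 1;
}
```

Both accessors read the same memory location, so the choice is about notation, not speed. If the column count varies between arrays, only the flat-index form applies.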