Function application over numpy's matrix row/column

petr · Nov 10, 2011 · Viewed 78.6k times

I am using NumPy to store data in matrices. I come from an R background, where there is an extremely simple way to apply a function over the rows, the columns, or both of a matrix.

Is there something similar for the Python/NumPy combination? It's not a problem to write my own little implementation, but it seems to me that most of the versions I come up with will be significantly less efficient and more memory intensive than any existing implementation.

I would like to avoid copying from the numpy matrix to a local variable etc., is that possible?

The functions I am trying to implement are mainly simple comparisons (e.g. how many elements of a certain column are smaller than a number x, or how many of them have an absolute value larger than y).

Answer

unutbu · Nov 10, 2011

Almost all numpy functions operate on whole arrays, and/or can be told to operate on a particular axis (row or column).

As long as you can define your function in terms of numpy functions acting on numpy arrays or array slices, your function will automatically operate on whole arrays, rows or columns.
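As a minimal sketch of what "operating on a particular axis" looks like (the array `a` and the helper `value_range` are made up for illustration, not part of the question): reductions such as `sum` and `max` take an `axis` argument, and `np.apply_along_axis` is probably the closest analogue to R's `apply` when no built-in reduction fits.

```python
import numpy as np

# Hypothetical example data: 2 rows, 3 columns.
a = np.array([[1, -5, 3],
              [4, 2, -6]])

# axis=0 collapses the rows, giving one value per column.
col_sums = a.sum(axis=0)
# axis=1 collapses the columns, giving one value per row.
row_max = a.max(axis=1)

def value_range(v):
    # v is a 1-D slice (here: one column) of the array.
    return v.max() - v.min()

# Apply a function of a 1-D array to every column.
col_ranges = np.apply_along_axis(value_range, 0, a)

print(col_sums)    # [ 5 -3 -3]
print(row_max)     # [3 4]
print(col_ranges)  # [3 7 9]
```

Note that `np.apply_along_axis` still calls the Python function once per row or column, so a built-in reduction with `axis=` is usually the faster choice when one exists.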

You may get more concrete advice by asking how to implement a particular function.


NumPy provides np.vectorize and np.frompyfunc to turn Python functions that operate on numbers into functions that operate on NumPy arrays.

For example,

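import numpy as np

def myfunc(a, b):
    # Return the larger of two scalars.
    return a if a > b else b

vecfunc = np.vectorize(myfunc)
# The second argument is broadcast against each row of the first.
result = vecfunc([[1, 2, 3], [5, 6, 9]], [7, 4, 5])
print(result)
# [[7 4 5]
#  [7 6 9]]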

(Each element of the first array is replaced by the corresponding element of the second array when the second is bigger. The second array is broadcast against every row of the first.)

But don't get too excited; np.vectorize and np.frompyfunc are just syntactic sugar. They don't actually make your code any faster. If your underlying Python function is operating on one value at a time, then np.vectorize will feed it one item at a time, and the whole operation is going to be pretty slow (compared to using a numpy function which calls some underlying C or Fortran implementation).
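For this particular function there happens to be a built-in equivalent: an elementwise "larger of two values" is what `np.maximum` computes. The sketch below (the variable names `slow` and `fast` are only illustrative) checks that both approaches give the same result. The built-in version runs in compiled code instead of calling a Python function per element.

```python
import numpy as np

a = np.array([[1, 2, 3], [5, 6, 9]])
b = np.array([7, 4, 5])

def myfunc(x, y):
    # Scalar version: return the larger of two numbers.
    return x if x > y else y

# np.vectorize calls myfunc once per element pair.
slow = np.vectorize(myfunc)(a, b)
# np.maximum does the same comparison in compiled code, with broadcasting.
fast = np.maximum(a, b)

print(fast)
# [[7 4 5]
#  [7 6 9]]
print(np.array_equal(slow, fast))
# True
```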


To count how many elements of column x are smaller than a number y, you could use an expression such as the following (here array is a structured array with a field named 'x'):

(array['x']<y).sum()

For example:

import numpy as np
# np.int was removed in NumPy 1.24; the builtin int gives the same default integer dtype.
array = np.arange(6).view([('x', int), ('y', int)])
print(array)
# [(0, 1) (2, 3) (4, 5)]

print(array['x'])
# [0 2 4]

print(array['x']<3)
# [ True  True False]

print((array['x']<3).sum())
# 2
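The same idea carries over to an ordinary 2-D array, where a column is selected by index rather than by field name. This is a sketch under assumed data: the array `m` and the thresholds `x` and `y` are invented for illustration. Basic slicing such as `m[:, 0]` returns a view rather than a copy, although the comparison itself does create a temporary boolean array.

```python
import numpy as np

# Hypothetical data: 3 rows, 2 columns.
m = np.array([[0, -1],
              [2, 3],
              [4, -5]])
x, y = 3, 2

# How many elements of column 0 are smaller than x?
below_x = (m[:, 0] < x).sum()

# How many elements of column 1 have absolute value larger than y?
above_y = (np.abs(m[:, 1]) > y).sum()

# The same absolute-value count for every column at once.
per_column = (np.abs(m) > y).sum(axis=0)

print(below_x)     # 2
print(above_y)     # 2
print(per_column)  # [1 2]
```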