I have a simple numpy array.
array([[10,  0, 10,  0],
       [ 1,  1,  0,  0],
       [ 9,  9,  9,  0],
       [ 0, 10,  1,  0]])
I would like to take the median of each column of this array individually. However, there are 0 values in various places, and I would like to ignore them when calculating the medians.
To complicate things further, I would like columns that contain only 0 entries to get a median of 0. That way those columns serve as placeholders, so the result still has one entry per column of the matrix.
The documentation for numpy.median doesn't list any argument that would do what I want (maybe I am spoiled by all the switches we get in R!):
numpy.median(a, axis=None, out=None, overwrite_input=False)
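To see why plain numpy.median falls short here, a minimal sketch (calling the question's array x is an assumption) runs it column-wise: the zeros are sorted in with everything else, so they pull each median down.

```python
import numpy as np

# The example array from the question
x = np.array([[10,  0, 10, 0],
              [ 1,  1,  0, 0],
              [ 9,  9,  9, 0],
              [ 0, 10,  1, 0]])

# Plain column-wise median: zeros count like any other value
plain = np.median(x, axis=0)
print(plain)  # [5. 5. 5. 0.]
```

Each of the first three columns sorts to 0, 1, 9, 10, so the median is (1 + 9) / 2 = 5 instead of the 9 you get once the zeros are dropped.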
Can someone please shed some light on an effective way to do this that is in line with the spirit of numpy? I could hack something together, but then I'd feel I had defeated the purpose of using numpy in the first place.
Thanks in advance.
A masked array is always handy, but slooooooow:
In [14]:
%timeit np.ma.median(y, axis=0).filled(0)
1000 loops, best of 3: 1.73 ms per loop
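The timing above doesn't show how y was built. A self-contained sketch, assuming y is simply x with every 0 entry masked (here via np.ma.masked_equal), might look like this:

```python
import numpy as np

x = np.array([[10,  0, 10, 0],
              [ 1,  1,  0, 0],
              [ 9,  9,  9, 0],
              [ 0, 10,  1, 0]])

# Assumption: y is x with all 0 entries masked out
y = np.ma.masked_equal(x, 0)

# np.ma.median skips masked entries; a fully masked column gives a
# masked result, which .filled(0) replaces with the 0 placeholder
med = np.ma.median(y, axis=0).filled(0)
print(med)
```

The .filled(0) call is what turns the all-zero last column into the placeholder 0 that the question asks for.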
In [15]:
%%timeit
ans=np.apply_along_axis(lambda v: np.median(v[v!=0]), 0, x)
ans[np.isnan(ans)]=0.
1000 loops, best of 3: 402 µs per loop
In [16]:
ans=np.apply_along_axis(lambda v: np.median(v[v!=0]), 0, x)
ans[np.isnan(ans)]=0.; ans
Out[16]:
array([ 9., 9., 9., 0.])
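A variation on the apply_along_axis approach, sketched with a hypothetical helper nonzero_median, handles the all-zero case inside the function instead of patching NaNs afterwards. This also avoids the RuntimeWarning that np.median emits for an empty slice. It has not been timed against the versions here.

```python
import numpy as np

def nonzero_median(v):
    """Hypothetical helper: median of the nonzero entries of 1-D v, or 0.0 if none."""
    nz = v[v != 0]
    # Returning 0.0 for an empty selection skips np.median's empty-slice
    # warning and the NaN that would otherwise need replacing
    return np.median(nz) if nz.size else 0.0

x = np.array([[10,  0, 10, 0],
              [ 1,  1,  0, 0],
              [ 9,  9,  9, 0],
              [ 0, 10,  1, 0]])

ans = np.apply_along_axis(nonzero_median, 0, x)
print(ans)  # [9. 9. 9. 0.]
```

The result matches the output above; the only difference is where the all-zero column gets its 0.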
Using np.nonzero is even faster:
In [25]:
%%timeit
ans=np.apply_along_axis(lambda v: np.median(v[np.nonzero(v)]), 0, x)
ans[np.isnan(ans)]=0.
1000 loops, best of 3: 384 µs per loop
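A different technique, not part of the original answer: replace the zeros with NaN and let np.nanmedian do the column-wise work in a single vectorized call, then map the all-NaN columns back to 0. This is a sketch; its speed relative to the timings above hasn't been measured.

```python
import warnings
import numpy as np

x = np.array([[10,  0, 10, 0],
              [ 1,  1,  0, 0],
              [ 9,  9,  9, 0],
              [ 0, 10,  1, 0]])

# Zeros become NaN (np.where promotes the result to float), so nanmedian skips them
xf = np.where(x == 0, np.nan, x)

with warnings.catch_warnings():
    # Columns that were all zeros are now all NaN and trigger a RuntimeWarning
    warnings.simplefilter("ignore", RuntimeWarning)
    med = np.nanmedian(xf, axis=0)

# All-NaN columns come back as NaN; turn them into the 0 placeholder
med[np.isnan(med)] = 0.0
print(med)
```

One caveat of this route: any NaN already present in the data would also be ignored, which the apply_along_axis versions above would not do.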