I need to multiply two big matrices and sort their columns.
import numpy
a = numpy.random.rand(1000000, 100)
b = numpy.random.rand(300000, 100)
c = numpy.dot(b, a.T)
# for every column of c, keep the indices of the 10 smallest values
top10 = [numpy.argsort(j)[:10] for j in c.T]
This process takes a lot of time and memory. Is there a way to speed it up? If not, how can I calculate the RAM needed for this operation? I currently have an EC2 box with 4GB RAM and no swap.
I was wondering whether this operation can be serialized so that I don't have to keep everything in memory.
One thing that you can do to speed things up is to compile numpy against an optimized BLAS library such as ATLAS, GotoBLAS or Intel's proprietary MKL.
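Before recompiling anything, it may be worth checking which BLAS/LAPACK libraries your numpy build is already linked against. A minimal check, assuming a standard numpy installation (the exact output format varies between numpy versions):

```python
import numpy

# Print the build configuration, including the BLAS/LAPACK libraries
# that this numpy installation was built against.
numpy.show_config()
```

If the output mentions an optimized library such as MKL or ATLAS, the dot product is probably already using it.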
To calculate the memory needed, you need to monitor Python's Resident Set Size ("RSS"). The following commands were run on a UNIX system (FreeBSD to be precise, on a 64-bit machine).
> ipython
In [1]: import numpy as np
In [2]: a = np.random.rand(1000, 1000)
In [3]: a.dtype
Out[3]: dtype('float64')
In [4]: del(a)
To get the RSS I ran:
ps -xao comm,rss | grep python
[Edit: See the ps manual page for a complete explanation of the options, but basically these ps options make it show only the command and resident set size of all processes. The equivalent format for Linux's ps would be ps -xao c,r, I believe.]
The results are:
With a allocated: 42200 KiB
After del(a): 34368 KiB
Calculating the size:
In [4]: (42200 - 34364) * 1024
Out[4]: 8024064
In [5]: 8024064/(1000*1000)
Out[5]: 8.024064
As you can see, the calculated size matches the 8 bytes per element of the default datatype float64 quite well. The difference is internal overhead.
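As a cross-check that does not depend on ps, numpy can report the size of an array's data buffer directly. A small sketch (this counts only the raw data, not the interpreter overhead that shows up in the RSS):

```python
import numpy as np

a = np.random.rand(1000, 1000)

# itemsize is the number of bytes per element, nbytes the size of the
# whole data buffer (number of elements * itemsize).
print(a.itemsize)  # 8 for float64
print(a.nbytes)    # 8000000
```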
The size of your original arrays in MiB will be approximately:
In [11]: 8*1000000*100/1024**2
Out[11]: 762.939453125
In [12]: 8*300000*100/1024**2
Out[12]: 228.8818359375
That's not too bad. However, the dot product will be way too large:
In [19]: 8*1000000*300000/1024**3
Out[19]: 2235.1741790771484
That's 2235 GiB!
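The same arithmetic can be wrapped in a small helper so you can try other shapes or datatypes. This is only a sketch; the function name array_size_gib is made up for this example and it estimates the raw data size only, ignoring any overhead:

```python
import numpy as np

def array_size_gib(shape, dtype=np.float64):
    """Estimate the size of an array's data in GiB from its shape and dtype."""
    n_items = 1
    for dim in shape:
        n_items *= dim
    return n_items * np.dtype(dtype).itemsize / 1024.0**3

print(array_size_gib((1000000, 100)))     # a
print(array_size_gib((300000, 100)))      # b
print(array_size_gib((300000, 1000000)))  # the full dot product
```

Passing dtype=np.float32 halves each estimate, which may be an option if single precision is good enough for your data.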
What you can do is split up the problem and perform the dot operation in pieces:
1. Load b as an ndarray.
2. Load each row of a as an ndarray in turn.
3. Multiply the row by b and write the result to a file.
4. del() the row and load the next row.
This will not make it faster, but it would make it use less memory!
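The piecewise approach could look roughly like the sketch below. It is a variation on the steps above, not the exact procedure: it processes a block of rows of a at a time instead of a single row, and it keeps only the 10 indices per row that the question asks for instead of writing every full result to a file. The function name top_k_per_row and the chunk_rows parameter are made up for this example.

```python
import numpy as np

def top_k_per_row(a, b, k=10, chunk_rows=100):
    """For each row of a, return the indices of the k smallest dot
    products of that row with the rows of b.

    Gives the same result as [np.argsort(j)[:k] for j in np.dot(b, a.T).T],
    but only holds chunk_rows rows of the product in memory at a time.
    """
    result = np.empty((a.shape[0], k), dtype=np.intp)
    for start in range(0, a.shape[0], chunk_rows):
        stop = min(start + chunk_rows, a.shape[0])
        # block has shape (stop - start, b.shape[0])
        block = np.dot(a[start:stop], b.T)
        # sort each row of the block and keep the first k indices
        result[start:stop] = np.argsort(block, axis=1)[:, :k]
        del block  # release the block before computing the next one
    return result

# Small demonstration with sizes that fit comfortably in memory.
a = np.random.rand(500, 100)
b = np.random.rand(300, 100)
top10 = top_k_per_row(a, b, k=10, chunk_rows=64)
print(top10.shape)
```

With 300000 rows in b, a block of 100 rows takes roughly 229 MiB as float64, and argsort needs an index array of about the same size on a 64-bit machine, so chunk_rows has to be chosen with the 4GB limit in mind. The inputs a and b themselves still have to fit in memory or be loaded in pieces as well.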
Edit: In this case I would suggest writing the output file in binary format (e.g. using struct or ndarray.tofile). That would make it much easier to read a column from the file with e.g. a numpy.memmap.
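As an illustration of the binary-file idea, here is a minimal sketch that writes rows with ndarray.tofile and then reads a single column back through numpy.memmap. The file name result.bin, the tiny sizes and the random data are placeholders for this example; the file holds raw float64 values in row-major order, so the reader has to know the shape and dtype.

```python
import os
import tempfile

import numpy as np

n_rows, n_cols = 6, 4
path = os.path.join(tempfile.mkdtemp(), "result.bin")

# Stand-in for the rows of the real result, which would be computed one at a time.
rows = np.random.rand(n_rows, n_cols)

# Append each row to the file as raw float64 values.
with open(path, "wb") as f:
    for row in rows:
        row.astype(np.float64).tofile(f)

# Map the file as a 2-D array without loading it all into memory.
result = np.memmap(path, dtype=np.float64, mode="r", shape=(n_rows, n_cols))

# Copy one column out of the mapped file into an ordinary array.
column = np.array(result[:, 2])
print(column)
```

Reading a column from a row-major file touches one value per row, so it is strided access on disk. If you mostly need columns, writing the transposed layout may be worth considering.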