MemoryError in Python while converting to array

Asqan · May 27, 2014

My code is shown below:

from sklearn.datasets import load_svmlight_files
import numpy as np

perm1 = np.random.permutation(25000)
perm2 = np.random.permutation(25000)

X_tr, y_tr, X_te, y_te = load_svmlight_files(("dir/file.feat", "dir/file.feat"))

# randomly shuffle data
X_train = X_tr[perm1, :].toarray()[:, 0:2000]
y_train = y_tr[perm1] > 5  # turn into binary problem

The code works fine up to this point, but when I try to convert one more matrix to an array, my program raises a MemoryError.


X_test = X_te[perm2,:].toarray()[:,0:2000]


MemoryError                               Traceback (most recent call last)
<ipython-input-7-31f5e4f6b00c> in <module>()
----> 1 X_test = X_test.toarray()

C:\Users\Asq\AppData\Local\Enthought\Canopy\User\lib\site-packages\scipy\sparse\compressed.pyc in toarray(self, order, out)
    788     def toarray(self, order=None, out=None):
    789         """See the docstring for `spmatrix.toarray`."""
--> 790         return self.tocoo(copy=False).toarray(order=order, out=out)
    792     ##############################################################

C:\Users\Asq\AppData\Local\Enthought\Canopy\User\lib\site-packages\scipy\sparse\coo.pyc in toarray(self, order, out)
    237     def toarray(self, order=None, out=None):
    238         """See the docstring for `spmatrix.toarray`."""
--> 239         B = self._process_toarray_args(order, out)
    240         fortran = int(B.flags.f_contiguous)
    241         if not fortran and not B.flags.c_contiguous:

C:\Users\Asq\AppData\Local\Enthought\Canopy\User\lib\site-packages\scipy\sparse\base.pyc in _process_toarray_args(self, order, out)
    697             return out
    698         else:
--> 699             return np.zeros(self.shape, dtype=self.dtype, order=order)


I'm new to Python, and I don't know whether the memory error needs to be fixed manually.

Other parts of my code raise the same error (for example, training with k-NN or an ANN).

How can I fix this?


perimosocordiae · May 27, 2014

In cases like these, it's often possible to avoid converting your sparse matrices to dense format.
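
To see why the dense conversion fails, it can help to estimate how much memory `toarray()` would need before calling it. The sketch below is only an illustration: `dense_nbytes` is a hypothetical helper, and the 80000-column width is an assumed figure, since the question doesn't say how many features the file has.

```python
import numpy as np

def dense_nbytes(shape, dtype=np.float64):
    """Bytes needed by a dense array of this shape and dtype (hypothetical helper)."""
    rows, cols = shape
    return rows * cols * np.dtype(dtype).itemsize

# 25000 rows as in the question; 80000 columns is a made-up width for illustration.
full = dense_nbytes((25000, 80000))
# Same rows, but only the 2000 columns that are actually kept.
sliced = dense_nbytes((25000, 2000))

print("all columns:  %.1f GB" % (full / 1e9))
print("2000 columns: %.1f GB" % (sliced / 1e9))
```

Note that `X_tr[perm1,:].toarray()[:,0:2000]` densifies first and slices afterwards, so the full-width array is allocated even though only 2000 columns are kept.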

For example, both the row permutation and the column slice work directly on a matrix in CSR or CSC format.
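
A minimal sketch of permuting and slicing without leaving sparse format is below. The random matrix stands in for `X_tr`; its size and density are made up for the example, and `X_small` is just an illustrative name.

```python
import numpy as np
from scipy import sparse

# Small random CSR matrix standing in for X_tr (size and density are assumptions).
rng = np.random.RandomState(0)
X = sparse.random(1000, 5000, density=0.01, format="csr", random_state=rng)
perm = rng.permutation(X.shape[0])

# Row permutation and column slice both return sparse matrices;
# no dense array is allocated at any point.
X_small = X[perm, :][:, 0:2000]

print(sparse.issparse(X_small), X_small.shape)
```

If a dense array really is needed afterwards, calling `X_small.toarray()` only allocates the 2000 kept columns rather than the full width.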

You haven't posted the code that follows, but I suspect it can be made to handle sparse inputs as well. If so, the memory errors should go away.
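
As a rough illustration of a later step taking sparse input, here is a sketch that fits scikit-learn's `KNeighborsClassifier` on a CSR matrix directly. The data is random and the sizes are made up. Whether your own estimators accept sparse matrices depends on the estimator, so check its documentation.

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import KNeighborsClassifier

# Random sparse data standing in for the real features (sizes are assumptions).
rng = np.random.RandomState(0)
X_train = sparse.random(200, 2000, density=0.05, format="csr", random_state=rng)
y_train = rng.randint(0, 2, size=200)  # binary labels, as in the question
X_test = sparse.random(50, 2000, density=0.05, format="csr", random_state=rng)

# fit and predict both receive the sparse matrices as-is; toarray() is never called.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

print(pred.shape)
```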