Memory errors and list limits?

Taupi · Apr 4, 2011 · Viewed 230.7k times

I need to produce very large matrices (Markov chains) for scientific purposes. I perform calculations whose results I put in a list of 20301 elements (= one row of my matrix). I need all of that data in memory to proceed to the next Markov step, but I can store it elsewhere (e.g. a file) if needed, even if that slows down my Markov chain walk-through. My computer (scientific lab): bi-Xeon, 6 cores/12 threads each, 12 GB memory, OS: Win64.

Traceback (most recent call last):
  File "my_file.py", line 247, in <module>
    ListTemp.append(calculus)
MemoryError

Example of a calculation result: 9.233747520008198e-102 (yes, far smaller than 1/9000)

The error is raised while the list is being filled; the last element successfully stored is at index 19766:

ListTemp[19766]
1.4509421012263216e-103

If I go one index further:

Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    ListTemp[19767]
IndexError: list index out of range

So the list hit a memory error after storing 19767 elements (indices 0 through 19766).

Questions:

  1. Is there a memory limit for a list? Is it a "per-list" limit or a "global, per-script" limit?

  2. How can I bypass those limits? Any possibilities in mind?

  3. Would it help to use NumPy or 64-bit Python? What are the memory limits with them? What about other languages?

Answer

G Gordon Worley III · Apr 4, 2011

First off, see How Big can a Python Array Get? and Numpy, problem with long arrays.

Second, the only real limit comes from the amount of memory you have and from how your system stores memory references. There is no per-list limit beyond sys.maxsize elements (which you will never reach in practice), so Python will keep appending until it runs out of memory. Two possibilities:

  1. If you are running on an older OS, or one that forces processes to use a limited amount of memory, you may need to increase the amount of memory the Python process has access to (a 32-bit process, for instance, can only address roughly 2-4 GB no matter how much RAM the machine has).
  2. Break the list apart by chunking. For example, compute the first 1000 elements of the list, pickle them and save them to disk, then do the next 1000. To work with the data, unpickle one chunk at a time so that you never run out of memory. This is essentially the same technique databases use to work with more data than will fit in RAM; a small sketch follows below.
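
Here is a minimal sketch of that chunk-and-pickle approach. The compute_element() function and the row_chunk_N.pkl file names are placeholders I made up for illustration, not something from the original post:

    import pickle

    CHUNK_SIZE = 1000      # how many elements to keep in memory at once
    N_ELEMENTS = 20301     # one row of the matrix, as in the question

    def compute_element(i):
        # placeholder for the real calculation (not from the original post)
        return 1.0 / (i + 1)

    # Writing: fill a small buffer and dump it to disk whenever it is full.
    chunk, chunk_id = [], 0
    for i in range(N_ELEMENTS):
        chunk.append(compute_element(i))
        if len(chunk) == CHUNK_SIZE:
            with open("row_chunk_%d.pkl" % chunk_id, "wb") as f:
                pickle.dump(chunk, f)
            chunk = []         # release the memory before the next chunk
            chunk_id += 1
    if chunk:                  # last, possibly shorter, chunk
        with open("row_chunk_%d.pkl" % chunk_id, "wb") as f:
            pickle.dump(chunk, f)
        chunk_id += 1

    # Reading: unpickle one chunk at a time, so only CHUNK_SIZE elements
    # are ever in RAM while walking through the row.
    for cid in range(chunk_id):
        with open("row_chunk_%d.pkl" % cid, "rb") as f:
            for value in pickle.load(f):
                pass           # use value here

Peak memory use is bounded by CHUNK_SIZE rather than by the full row length, which is exactly the trade-off described above: slower access in exchange for a bounded memory footprint.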