How can I allot more memory to a Python program? It's not consuming more than 64 MB on 4 GB RAM

user1403483 · Dec 24, 2012 · Viewed 12.8k times

I have a Python program that processes some input data on a machine with 4 GB of RAM running 32-bit Ubuntu 12.04. Both the time and space complexity of the program are O(n). When the input is around 100 kB, it completes in about 4 seconds with a peak RAM consumption of 0.5% (as reported by the 'top' command on Linux). However, when I tried inputs of 500 kB, 2.5 MB and 16 MB, the process did not finish within 1 hour (in each case I had to cancel it with Ctrl+C) and the memory consumption was stuck at 1.6% (i.e. around 64 MB in each case). Can I somehow allocate more RAM to this Python process?

Note: I am implementing the MapReduce job in Python using the 'mrjob' library.

The following is the log of a successful execution when the input CSV file is 100 kB.

ankit@ubuntu:~/mrj/mrjo/mrjob/examples$ python mt1.py as.txt > asop.txt
using configs in /home/ankit/.mrjob.conf
creating tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00000
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00001
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00001
Counters from step 1:
  (no counters found)
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper-sorted
> sort /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00000 /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00001
> /usr/bin/python mt1.py --step-num=0 --reducer /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-reducer_part-00000
Counters from step 1:
  (no counters found)
> /usr/bin/python mt1.py --step-num=1 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-1-mapper_part-00000
Counters from step 2:
  (no counters found)
Moving /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-1-mapper_part-00000 -> /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/output/part-00000
Streaming final output from /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/output
removing tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269

This is the execution log and traceback when the input CSV file is 2.5 MB.

ankit@ubuntu:~/mrj/mrjo/mrjob/examples$ python mt1.py matlabsample.csv > matsamop.txt
using configs in /home/ankit/.mrjob.conf
creating tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00000
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00001
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00001
Counters from step 1:
  (no counters found)
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper-sorted
> sort /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00000 /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00001
> /usr/bin/python mt1.py --step-num=0 --reducer /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-reducer_part-00000
Counters from step 1:
  (no counters found)
> /usr/bin/python mt1.py --step-num=1 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-1-mapper_part-00000
^CTraceback (most recent call last):
  File "mt1.py", line 311, in <module>
    Motion_Tagging.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 545, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 561, in execute
    self.run_job()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 631, in run_job
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/runner.py", line 490, in run
    self._run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 193, in _run
    combiner_args=combiner_args)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 488, in _invoke_step
    self._wait_for_process(proc_dict, step_num)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 657, in _wait_for_process
    tb_lines = find_python_traceback(stderr_lines)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/parse.py", line 171, in find_python_traceback
    for line in lines:
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 680, in _process_stderr_from_script
    for line in stderr:
KeyboardInterrupt

Answer

Ignacio Vazquez-Abrams · Dec 24, 2012

You don't "allocate memory to a Python process"; you use bigger structures in the Python program, and the interpreter requests more memory from the operating system as it needs it. At a fundamental level, your algorithm is probably flawed in such a way that it doesn't take advantage of the memory that is available.
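
As a rough illustration of what "using bigger structures" can mean, here is a minimal sketch. It is not taken from the asker's mt1.py, which is not shown; the function names and the sample data are hypothetical. It contrasts a version that rescans a list on every step (little extra memory, but quadratic time) with a version that keeps a set as an index (more memory, but roughly linear time). Code that looks O(n) on paper can hide a scan like the first one, which would match a job that finishes on 100 kB but stalls on 500 kB while its memory use stays flat.

```python
import sys


def find_duplicates_scan(values):
    """Quadratic sketch: every membership test walks the whole list."""
    seen = []
    duplicates = []
    for v in values:
        if v in seen:  # linear scan of a list on each iteration
            duplicates.append(v)
        else:
            seen.append(v)
    return duplicates


def find_duplicates_indexed(values):
    """Linear sketch: a set trades extra memory for O(1) average lookups."""
    seen = set()
    duplicates = []
    for v in values:
        if v in seen:  # hash lookup, independent of how many items are stored
            duplicates.append(v)
        else:
            seen.add(v)
    return duplicates


# Hypothetical input: 2000 distinct values followed by 500 repeats.
data = list(range(2000)) + list(range(0, 2000, 4))

slow_result = find_duplicates_scan(data)
fast_result = find_duplicates_indexed(data)

# Container overhead only (sys.getsizeof does not count the elements themselves).
list_bytes = sys.getsizeof(list(range(2000)))
set_bytes = sys.getsizeof(set(range(2000)))

print("same answer:", slow_result == fast_result)
print("list container bytes:", list_bytes)
print("set container bytes:", set_bytes)
```

Both functions return the same duplicates; the indexed version simply lets the process grow to hold the larger structure, and nothing has to be "allotted" in advance. Whether this particular pattern is what slows the asker's job is an assumption, so profiling the mapper and reducer on a mid-sized input would be the way to confirm where the time actually goes.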