Python memory consumption on Linux: physical and virtual memory are growing while the heap size remains the same

Vitaly Isaev · Apr 29, 2014

I'm working on a kind of system service (actually it's just a log parser) written in Python. This program should run continuously for a long time (hopefully days or weeks, without failures or restarts). That's why I am concerned about its memory consumption.

I put together information about process memory usage from several sites into one simple function:

#!/usr/bin/env python
from pprint import pprint
from guppy import hpy
from datetime import datetime
import sys
import os
import resource
import re

def debug_memory_leak():
    #Getting virtual memory size 
    pid = os.getpid()
    with open(os.path.join("/proc", str(pid), "status")) as f:
        lines = f.readlines()
    _vmsize = [l for l in lines if l.startswith("VmSize")][0]
    vmsize = int(_vmsize.split()[1])

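    # Getting physical memory size. Note: on Linux, ru_maxrss is the *peak*
    # resident set size in kilobytes, so this value can never decrease.
    pmsize = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss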

    # Analyzing the dynamic memory segment: total number of objects and heap size
    h = hpy().heap()
    if __debug__:
        print(str(h))
    m = re.match(
        r"Partition of a set of ([0-9]+) objects\. Total size = ([0-9]+) bytes(.*)", str(h))
    objects = m.group(1)
    heap = int(m.group(2)) // 1024  # bytes to KB

    current_time = datetime.now().strftime("%H:%M:%S")
    data = (current_time, objects, heap, pmsize, vmsize)
    print("\t".join([str(d) for d in data]))
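
One caveat about the function above: `ru_maxrss` is a peak value, so it cannot show memory being returned. A minimal sketch of reading the current resident set size alongside the peak, assuming the usual Linux `/proc/self/status` layout (`VmRSS`, `VmHWM` and `VmSize` lines reported in kB); the helper names `parse_status_kb` and `current_memory_kb` are hypothetical:

```python
import os


def parse_status_kb(text, field):
    """Return the kB value of one field (e.g. 'VmRSS') from /proc/<pid>/status text."""
    for line in text.splitlines():
        if line.startswith(field + ":"):
            # Lines look like "VmRSS:     10276 kB"; the number is the second token.
            return int(line.split()[1])
    raise KeyError(field)


def current_memory_kb():
    """Read current RSS, peak RSS and virtual size of this process (Linux only)."""
    with open("/proc/self/status") as f:
        text = f.read()
    return {
        "rss": parse_status_kb(text, "VmRSS"),       # current resident set size
        "peak_rss": parse_status_kb(text, "VmHWM"),  # high-water mark of RSS
        "vsize": parse_status_kb(text, "VmSize"),    # virtual memory size
    }


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    print(current_memory_kb())
```

Logging `rss` next to `peak_rss` makes it possible to tell steady growth apart from a single early spike that the peak value simply keeps reporting.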

I have used this function to study how the memory consumption of my long-running process changes over time, and I still cannot explain its behavior. As the data below show, the heap size and the total number of objects did not change, while physical and virtual memory increased by 11% and 1% during these twenty minutes.

UPD: The process has now been running for almost 15 hours. The heap is still the same, but the physical memory has increased sixfold and the virtual memory has increased by 50%. The curve seems to be linear, except for the strange outliers at 3:00 AM:

Time      Obj    Heap (KB)  PhM (KB)  VM (KB)
19:04:19  31424  3928       5460      143732
19:04:29  30582  3704       10276     158240
19:04:39  30582  3704       10372     157772
19:04:50  30582  3709       10372     157772
19:05:00  30582  3704       10372     157772
(...)
19:25:00  30583  3704       11524     159900
09:53:23  30581  3704       62380     210756
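
As a rough check of the linear-growth impression, the PhM samples above can be turned into a growth rate. This is only a sketch over three of the listed rows; the function name `growth_rate_kb_per_s` is hypothetical, and it assumes that a timestamp earlier than the previous one belongs to the following day:

```python
from datetime import datetime, timedelta


def growth_rate_kb_per_s(t0, kb0, t1, kb1):
    """Average growth in KB per second between two (HH:MM:SS, KB) samples."""
    start = datetime.strptime(t0, "%H:%M:%S")
    end = datetime.strptime(t1, "%H:%M:%S")
    if end <= start:
        # Assumption: the second sample was taken on the next day.
        end += timedelta(days=1)
    return (kb1 - kb0) / (end - start).total_seconds()


# PhM values taken from the rows above.
early = growth_rate_kb_per_s("19:05:00", 10372, "19:25:00", 11524)
late = growth_rate_kb_per_s("19:25:00", 11524, "09:53:23", 62380)
print("first 20 minutes: %.2f KB/s, overnight: %.2f KB/s" % (early, late))
```

Both intervals come out just under 1 KB per second, which is consistent with a roughly constant growth rate.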

I wonder what is going on with the address space of my process. The constant heap size suggests that all dynamically allocated objects are deallocated correctly. But I have no doubt that the growing memory consumption will affect the sustainability of this life-critical process in the long run.

Could anyone clarify this issue, please? Thank you.

(I use RHEL 6.4, kernel 2.6.32-358 with Python 2.6.6)

Answer

user3588162 · May 1, 2014

I don't know what your program is doing, but this might help.

I came across this article while working on a project a while back: http://chase-seibert.github.io/blog/2013/08/03/diagnosing-memory-leaks-python.html. It says, "Long running Python jobs that consume a lot of memory while running may not return that memory to the operating system until the process actually terminates, even if everything is garbage collected properly."

I ended up using the multiprocessing module so that my project forks a separate process whenever it needs to do work and returns when that work is done, and I haven't noticed any memory issues since.
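
A minimal sketch of that approach, not the actual code from my project: `parse_chunk` is a hypothetical stand-in for the real work and `run_in_child` is a hypothetical helper. It assumes Python 3.4+ (for `multiprocessing.get_context`) and the POSIX-only "fork" start method, since the question is about Linux:

```python
import multiprocessing


def parse_chunk(lines):
    # Hypothetical stand-in for the real, memory-hungry parsing step.
    return sum(1 for line in lines if "ERROR" in line)


def _child_main(queue, func, args):
    # Runs in the child: report either the result or the error text.
    try:
        queue.put(("ok", func(*args)))
    except Exception as exc:
        queue.put(("error", repr(exc)))


def run_in_child(func, *args):
    """Run func(*args) in a short-lived child process and return its result."""
    ctx = multiprocessing.get_context("fork")  # assumption: Linux/POSIX
    queue = ctx.Queue()
    proc = ctx.Process(target=_child_main, args=(queue, func, args))
    proc.start()
    status, payload = queue.get()  # read before join() so a large result cannot block the child
    proc.join()                    # once the child exits, its memory goes back to the OS
    if status == "error":
        raise RuntimeError(payload)
    return payload


if __name__ == "__main__":
    print(run_in_child(parse_chunk, ["ERROR disk full", "ok", "ERROR timeout"]))
```

The long-running parent only holds the small result; whatever the work allocated is released when the child terminates, regardless of how the allocator behaves inside the child.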

Alternatively, try it in Python 3.3: http://bugs.python.org/issue11849