Here is my code:

    from memory_profiler import profile

    @profile
    def mess_with_memory():
        huge_list = range(20000000)
        del huge_list
        print "why this kolaveri di?"
This is the output when I ran it from the interpreter:
    Line #    Mem usage    Increment   Line Contents
    ================================================
         3      7.0 MiB      0.0 MiB   @profile
         4                             def mess_with_memory():
         5
         6    628.5 MiB    621.5 MiB       huge_list = range(20000000)
         7    476.0 MiB   -152.6 MiB       del huge_list
         8    476.0 MiB      0.0 MiB       print "why this kolaveri di"
If you notice the output, creating the huge list consumed 621.5 MiB, while deleting it freed up only 152.6 MiB. When I checked the docs, I found the following statement:
the statement del x removes the binding of x from the namespace referenced by the local scope
So I guess it didn't delete the object itself, but just unbound it. But what did unbinding do that freed up so much space (152.6 MiB)? Can somebody please explain what is going on here?
Python is a garbage-collected language. If a value isn't "reachable" from your code anymore, it will eventually get deleted.
The del statement, as you saw, removes the binding of your variable. Variables aren't values, they're just names for values.

If that variable was the only reference to the value anywhere, the value will eventually get deleted. In CPython in particular, the garbage collector is built on top of reference counting. So, that "eventually" means "immediately".* In other implementations, it's usually "pretty soon".
If there were other references to the same value, however, just removing one of those references (whether by del x, x = None, exiting the scope where x existed, etc.) doesn't clean anything up.**
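You can watch this happen in CPython (a small sketch in Python 3 syntax, using weakref so we can observe the value without holding our own reference to it):

```python
import weakref

class Big:
    """Stand-in for a large value."""

obj = Big()
wr = weakref.ref(obj)   # observes the value without keeping it alive
alias = obj             # a second name bound to the same value

del obj                 # removes one binding; the value is still reachable
assert wr() is not None

del alias               # last reference gone -> CPython frees it immediately
assert wr() is None
```

The first del changes nothing about the object's lifetime; only dropping the last reference does.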
There's another issue here. I don't know what the memory_profiler module (presumably this one) actually measures, but the description (talking about use of psutil) sounds like it's measuring your memory usage from "outside".
When Python frees up storage, it doesn't always—or even usually—return it to the operating system. It keeps "free lists" around at multiple levels so it can re-use the memory more quickly than if it had to go all the way back to the OS to ask for more. On modern systems, this is rarely a problem—if you need the storage again, it's good that you had it; if you don't, it'll get paged out as soon as someone else needs it and never get paged back in, so there's little harm.
(On top of that, what I referred to as "the OS" above is really an abstraction made up of multiple levels, from the malloc library through the core C library to the kernel/pager, and at least one of those levels usually has its own free lists.)
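As a CPython-specific illustration of this inside-vs-outside gap, sys.getallocatedblocks() (a real function, added in Python 3.4) counts blocks held by the interpreter's own allocator, regardless of whether the underlying memory was ever returned to the OS:

```python
import sys

# The "inside" view: blocks tracked by CPython's allocator.
before = sys.getallocatedblocks()
stuff = [object() for _ in range(100000)]   # ~100k separate allocations
during = sys.getallocatedblocks()
del stuff
after = sys.getallocatedblocks()

assert during - before > 50000   # allocations are visible immediately
assert after < during            # freed blocks drop off the count right away
```

From the inside, del frees everything at once; an outside tool like psutil may see the process's footprint shrink much less, or not at all.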
If you want to trace memory use from the inside perspective… well, that's pretty hard. It gets a lot easier in Python 3.4 thanks to the new tracemalloc module. There are various third-party modules (e.g., heapy/guppy, Pympler, meliae) that try to get the same kind of information with earlier versions, but it's difficult, because getting information from the various allocators, and tying that information to the garbage collector, was very hard before PEP 445.
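On 3.4+, a minimal tracemalloc sketch looks like this (exact byte counts will vary by interpreter version, so treat the thresholds as illustrative):

```python
import tracemalloc

tracemalloc.start()
base, _ = tracemalloc.get_traced_memory()   # returns (current, peak)

data = [bytes(1000) for _ in range(1000)]   # roughly 1 MB of allocations
grown, _ = tracemalloc.get_traced_memory()

del data
shrunk, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

assert grown - base > 900_000   # the allocations were traced
assert shrunk < grown           # del returned the memory to Python's allocator
```

Unlike a psutil-based profiler, this measures Python's own allocations directly, so the del shows up as an immediate drop.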
* In some cases, there are references to the value… but only from other references that are themselves unreachable, possibly in a cycle. That still counts as "unreachable" as far as the garbage collector is concerned, but not as far as reference counts are concerned. So, CPython also has a "cycle detector" that runs every so often and finds cycles of mutually-reachable but not-reachable-from-anyone-else values and cleans them up.
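A small sketch of the cycle case (Python 3 syntax; gc.collect() forces the cycle detector to run now instead of "every so often"):

```python
import gc
import weakref

class Node:
    pass

a, b = Node(), Node()
a.partner, b.partner = b, a   # a reference cycle
wr = weakref.ref(a)

del a, b                      # refcounts never hit zero: the cycle keeps both alive
assert wr() is not None       # still uncollected

gc.collect()                  # the cycle detector finds the unreachable pair
assert wr() is None
```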
** If you're testing in the interactive console, there may be hidden references to your values that are hard to track, so you might think you've gotten rid of the last reference when you haven't. In a script, it should always be possible, if not easy, to figure things out. The gc module can help, as can the debugger. But of course both of them also give you new ways to add additional hidden references.
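If you suspect a lingering reference, gc.get_referrers is one way to hunt for it (a sketch; in real debugging the returned list also includes stack frames and other noise you'll need to filter out):

```python
import gc

target = {"big": "value"}
holder = [target]            # an easy-to-forget extra reference

# Every container that keeps target alive shows up here, including holder.
refs = gc.get_referrers(target)
assert any(r is holder for r in refs)
```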