I'm writing an XML parser with libxml2. It is actually finished, but it has a pretty annoying memory problem. The program first gets some links from my database, and each link points to an XML file. I use curl to download them. The process is simple: I download a file, parse it, then move on to the next one.
The problem seems to appear once a parse is finished. Curl downloads the next file, but the previous XML document does not seem to be freed; I guess libxml2 loads it into RAM. By the time the last XML file is parsed, I find myself with a leak of about 2.6 GB (yeah, some of these files are really big...), and my machine only has 4 GB of RAM. It works for the moment, but more links will be added to the database in the future, so I must fix it now.
My code is very basic:
xmlDocPtr doc;
doc = xmlParseFile("data.xml");
/* code to parse the file... */
xmlFreeDoc(doc);
I tried using:
xmlCleanupParser();
but the documentation says: "It doesn't deallocate any document related memory." (http://xmlsoft.org/html/libxml-parser.html#xmlCleanupParser)
So, my question is: does somebody know how to deallocate all this document-related memory?
The problem is that you are looking at the statistics in the wrong way...
When a program starts, it allocates some memory from the OS for the heap. When it calls malloc (or a similar function), the C runtime takes slices from that heap until it runs out. After that, it automatically asks the OS for more memory, possibly in larger blocks each time. When the program calls free, the runtime marks the freed memory as available for further mallocs, but it will not return the memory to the OS.
You may think that this behavior is wrong, that the program is leaking, but it is not: the freed memory is accounted for, just not by the OS but by the C library layer of your application. Proof of that is that the memory for the second XML file does not add to that of the first one: growth is only noticeable when a file is the largest one yet.
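To make the "largest file yet" point concrete, here is a toy model of the accounting described above. It is only a sketch: toy_heap, toy_alloc, toy_free and reserved_after are hypothetical names invented for this illustration, and real allocators are far more complicated (fragmentation, arenas, separate handling of large blocks).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a C runtime heap; NOT how any real allocator is implemented. */
typedef struct {
    size_t reserved; /* bytes obtained from the OS so far (what top/ps would show) */
    size_t in_use;   /* bytes currently handed out to the program */
} toy_heap;

/* Model malloc: grow the reservation only when the heap has too little free space. */
static void toy_alloc(toy_heap *h, size_t n)
{
    h->in_use += n;
    if (h->in_use > h->reserved)
        h->reserved = h->in_use; /* ask the "OS" for the shortfall */
}

/* Model free: the memory goes back to the heap, never to the OS. */
static void toy_free(toy_heap *h, size_t n)
{
    assert(n <= h->in_use);
    h->in_use -= n;
}

/* Parse and free a series of documents one by one; return the final reservation. */
size_t reserved_after(const size_t *doc_sizes, size_t count)
{
    toy_heap h = {0, 0};
    for (size_t i = 0; i < count; i++) {
        toy_alloc(&h, doc_sizes[i]); /* stands in for xmlParseFile() */
        toy_free(&h, doc_sizes[i]);  /* stands in for xmlFreeDoc() */
    }
    return h.reserved;
}
```

With documents of sizes 100, 50, 300 and 200, the model ends with 300 reserved rather than 650: the figure the OS reports follows the biggest document seen so far, not the sum of all of them.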
You may also think that if this memory is no longer used by this program, it is just wasted there and cannot be used by other processes. But that's not true: if the memory is not touched for a while and it is needed elsewhere, the OS Virtual Memory Manager will swap it out and reuse it.
So, my guess is that actually you don't have a problem.
PS: What I've just described is not always true. In particular, many C libraries make a distinction between small and large memory chunks and allocate them differently.
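If the number reported by the OS still bothers you, and assuming you are on Linux with glibc, the allocator offers a non-standard call, malloc_trim(), that asks it to hand free heap memory back to the OS. The wrapper below is a minimal sketch; release_free_heap is a hypothetical helper name, and on other C libraries it simply does nothing.

```c
#include <stdlib.h>
#ifdef __GLIBC__
#include <malloc.h> /* malloc_trim() is a glibc extension, not standard C */
#endif

/*
 * Ask the allocator to return free heap memory to the OS.
 * Returns 1 if some memory was released, 0 otherwise
 * (always 0 when not built against glibc).
 */
int release_free_heap(void)
{
#ifdef __GLIBC__
    return malloc_trim(0); /* 0 = keep no extra padding at the top of the heap */
#else
    return 0;
#endif
}
```

You would call it right after xmlFreeDoc(doc) in the download loop. How much memory actually goes back depends on how fragmented the heap is, so treat it as something to try rather than a guaranteed fix.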