I am using pandas.DataFrame in multi-threaded code (actually a custom subclass of DataFrame called Sound). I have noticed a memory leak: the memory usage of my program grows gradually over about 10 minutes until it reaches ~100% of my machine's memory, and the program crashes.
I used objgraph to try to track this leak, and found that the count of MyDataFrame instances keeps going up when it shouldn't: every thread, in its run method, creates an instance, does some calculations, saves the result to a file and exits, so no references should be kept.
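A minimal sketch of the per-thread pattern described above (the class names, the computation and the data are assumptions, not the actual code; an in-memory buffer stands in for the output file):

```python
import io
import threading

import pandas as pd


class Sound(pd.DataFrame):
    """Hypothetical stand-in for the custom DataFrame subclass."""


class Worker(threading.Thread):
    """One instance of the subclass is created per thread, used, and dropped."""

    def run(self):
        frame = Sound({"sample": range(1000)})  # instance created in run()
        result = frame.sum()                    # some calculations
        buf = io.StringIO()
        result.to_csv(buf)                      # stand-in for saving to a file
        self.output = buf.getvalue()
        # run() returns here; no reference to `frame` should survive the thread
```

With this pattern, once a thread's run() returns, its local frame becomes unreachable and the instance count reported by objgraph should stay flat as threads come and go.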
Using objgraph I found that all the data frames in memory share a similar reference graph:

I have no idea whether that's normal or not... it looks like this is what is keeping my objects in memory. Any ideas, advice, or insights?
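For reference, the instance-count check can also be reproduced with only the standard library; this is a sketch (count_live is a hypothetical helper, not part of objgraph, which additionally offers show_backrefs() to draw reference graphs like the one above):

```python
import gc

import pandas as pd


def count_live(cls):
    """Count live instances of `cls` after a full garbage collection.

    Collecting first matters: objects kept alive only by reference cycles
    would otherwise be counted even though they are not really leaked.
    """
    gc.collect()
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)
```

If this count keeps rising between checks even though the worker threads have exited, something is retaining references; gc.get_referrers(obj) then lists the holders, which is roughly the information objgraph visualizes.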
Confirmed that there's some kind of memory leak going on in the indexing infrastructure. It's not caused by the above reference graph. Let's move the discussion to GitHub (SO is for Q&A):
https://github.com/pydata/pandas/issues/2659
EDIT: this actually appears not to be a memory leak at all, but perhaps relates to how the OS allocates and reclaims memory. Please see the GitHub issue for more information.