How to delete multiple pandas (python) dataframes from memory to save RAM?

GeorgeOfTheRF · Aug 27, 2015 · Viewed 153.7k times

I have a lot of dataframes created as part of preprocessing. Since I only have 6 GB of RAM, I want to delete all the unnecessary dataframes to avoid running out of memory when running GridSearchCV in scikit-learn.

1) Is there a function to list only the dataframes currently loaded in memory?

I tried dir(), but it lists a lot of objects other than dataframes.
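Something like this rough sketch seems to work for me, but I am not sure it is the right approach (it assumes all my dataframes are top-level variables):

import pandas as pd

# rough sketch: list the names of all top-level DataFrame variables
df_names = [name for name, obj in globals().items()
            if isinstance(obj, pd.DataFrame)]
print(df_names)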

2) I created a list of dataframes to delete

del_df = [Gender_dummies,
          capsule_trans,
          col,
          concat_df_list,
          coup_CAPSULE_dummies]

and ran:

for i in del_df:
    del (i)

But it's not deleting the dataframes. However, deleting dataframes individually like below does remove them from memory:

del Gender_dummies
del col

Answer

pacholik · Aug 27, 2015

The del statement does not delete an instance; it merely deletes a name.

When you do del i, you are deleting just the name i. The dataframe instance is still bound to another name (and held in your del_df list), so it won't be garbage-collected.
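To see the difference (a minimal illustration):

>>> import pandas as pd
>>> a = pd.DataFrame()   # one instance ...
>>> b = a                # ... bound to two names
>>> del a                # removes only the name 'a'
>>> b.empty              # the instance is still reachable through 'b'
True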

If you want to release the memory, your dataframes have to be garbage-collected, i.e. you must delete all references to them.

If you created your dataframes dynamically and stored them in a list, then deleting that list will let them be garbage-collected.

>>> lst = [pd.DataFrame(), pd.DataFrame(), pd.DataFrame()]
>>> del lst     # memory is released

If you also bound them to individual variables, you have to delete all of those names as well.

>>> a, b, c = pd.DataFrame(), pd.DataFrame(), pd.DataFrame()
>>> lst = [a, b, c]
>>> del a, b, c # dfs still in list
>>> del lst     # memory is released now
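If you really want to drop a batch of named dataframes in a loop, one workaround (a sketch, assuming the dataframes are top-level variables in a script or notebook, and using the names from your question) is to delete the name bindings themselves via globals() and then force a collection:

import gc

# delete the actual name bindings, not a loop variable that merely
# points at the same objects
for name in ['Gender_dummies', 'capsule_trans', 'col',
             'concat_df_list', 'coup_CAPSULE_dummies']:
    if name in globals():
        del globals()[name]

gc.collect()    # usually optional; forces collection right away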