Pickle dump huge file without memory error

user2543682 · Jul 7, 2013 · Viewed 46.6k times

I have a program where I basically adjust the probability of certain things happening based on what is already known. My data file is already saved as a pickled dictionary object in Dictionary.txt.

The problem is that every time I run the program it pulls in Dictionary.txt, turns it into a dictionary object, makes its edits, and overwrites Dictionary.txt. This is pretty memory intensive, as Dictionary.txt is 123 MB. I get a MemoryError when I dump; everything seems fine when I pull it in.
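For reference, the load/edit/dump cycle looks roughly like this (a simplified sketch; the real edit logic is omitted and 'some_key' is just a placeholder):

import pickle

# Pull in the whole pickled dictionary
with open('Dictionary.txt', 'rb') as f:
    dictionary = pickle.load(f)

# ... adjust the probabilities based on what is already known ...
dictionary['some_key'] = 0.42  # placeholder for the real edits

# Overwrite the whole file with the updated dictionary (this is where the MemoryError occurs)
with open('Dictionary.txt', 'wb') as f:
    pickle.dump(dictionary, f)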

  • Is there a better (more efficient) way of doing the edits? (Perhaps without having to overwrite the entire file every time?)

  • Is there a way that I can invoke garbage collection (through the gc module)? (I already have it auto-enabled via gc.enable().)

  • I know that, besides readlines(), you can read a file line by line. Is there a way to edit the dictionary incrementally, entry by entry, when I already have a fully built dictionary object in the program? (See the sketch after this list for the kind of access I mean.)

  • Any other solutions?
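For illustration, the kind of incremental, per-key access I have in mind would look roughly like the standard-library shelve module's dict-like interface, which is backed by a file so individual entries can be read and written without loading everything at once (sketch only, not my current code; key names are placeholders):

import shelve

# A shelf behaves like a dictionary but keeps its entries on disk,
# so only the keys that are touched get pickled/unpickled.
with shelve.open('Dictionary.shelf') as db:
    db['some_key'] = 0.42          # write a single entry
    value = db.get('other_key')    # read a single entry (None if missing)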

Thank you for your time.

Answer

Ch HaXam · Apr 26, 2017

I was having the same issue. I used joblib and it got the job done. Posting this in case someone wants to know about other possibilities.

Save the model to disk:

from sklearn.externals import joblib  # deprecated in newer scikit-learn; use "import joblib" instead
filename = 'finalized_model.sav'
joblib.dump(model, filename)  # assumes `model` is an already-fitted estimator

Some time later, load the model from disk:

loaded_model = joblib.load(filename)
result = loaded_model.score(X_test, Y_test)  # X_test, Y_test: your held-out data

print(result)
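The same pattern works for a plain dictionary like the one in the question, since joblib.dump accepts any picklable Python object (the file name below is just an example):

import joblib  # newer scikit-learn versions no longer bundle joblib; install and import it directly

dictionary = {'some_key': 0.42}  # stand-in for the real probability table

# Persist the dictionary to disk and load it back later
joblib.dump(dictionary, 'Dictionary.joblib')
dictionary = joblib.load('Dictionary.joblib')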