More specific dupe of 875228: Simple data storing in Python.
I have a rather large dict (6 GB) and I need to do some processing on it. I'm trying out several document clustering methods, so I need to have the whole thing in memory at once. I have other functions to run on this data, but the contents will not change.
Currently, every time I think of a new function I have to write it and then re-generate the dict. I'm looking for a way to write this dict to a file, so that I can load it into memory instead of recalculating all its values.
To oversimplify things, it looks something like: {((('word','list'),(1,2),(1,3)),(...)):0.0, ....}
I feel that Python must have a better way than me looping through some string looking for ':' and '(' and trying to parse it back into a dictionary.
Why not use Python's pickle? Python has a great serialization module called pickle, and it is very easy to use:
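import cPickle as pickle  # Python 2; on Python 3 use plain "import pickle"
with open('save.p', 'wb') as f:  # "with" closes the file when done
    pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)  # Python 2's default protocol 0 is a larger, slower ASCII format
with open('save.p', 'rb') as f:
    obj = pickle.load(f)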
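A slightly fuller, self-contained sketch of the same round trip, written for Python 3 (where `pickle` uses its C implementation automatically, so `cPickle` is no longer needed). The helper names `save_dict`/`load_dict` and the two-entry stand-in dict are hypothetical, only loosely modelled on the structure in the question; the check at the end confirms that the nested tuple keys come back unchanged.

```python
import os
import pickle  # Python 3; on Python 2 use "import cPickle as pickle"
import tempfile


def save_dict(obj, path):
    """Hypothetical helper: pickle obj to path with the highest protocol available."""
    with open(path, 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)


def load_dict(path):
    """Hypothetical helper: load a previously pickled object from path."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Tiny stand-in for the real 6 GB dict: nested-tuple keys, float values.
data = {
    (('word', 'list'), (1, 2), (1, 3)): 0.0,
    (('other', 'words'), (4, 5), (6, 7)): 0.5,
}

path = os.path.join(tempfile.mkdtemp(), 'save.p')
save_dict(data, path)
restored = load_dict(path)
print(restored == data)  # True: tuple keys survive the round trip unchanged
```

Passing `pickle.HIGHEST_PROTOCOL` is a choice rather than a requirement; a file written that way can only be read by Python versions that support that protocol.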
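There are two disadvantages with pickle:
- It is not secure against erroneous or maliciously constructed data, so never unpickle data received from an untrusted or unauthenticated source.
- The format is not human readable.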
If you are using Python 2.6 or later, there is a built-in module called json. It is as easy to use as pickle:
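import json
with open('save.json', 'w') as f:  # write the dict to disk as JSON text
    json.dump(obj, f)
with open('save.json') as f:  # read it back
    obj = json.load(f)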
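The JSON format is human readable and very similar to Python's string representation of a dictionary, and it doesn't have pickle's security issues. It might be slower than cPickle, though. One caveat: JSON object keys must be strings, so a dict with tuple keys like the one in the question cannot be dumped directly (json.dumps raises a TypeError), and any tuples inside the data come back as lists.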
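Because JSON object keys must be strings (Python's json also accepts int, float, bool and None keys, converting them to strings) and JSON has no tuple type, a dict keyed by tuples needs some extra handling. One possible workaround, sketched below as an assumption of mine rather than anything the json module provides, is to store the dict as a list of [key, value] pairs and turn the nested lists in each key back into tuples on load. The function names are hypothetical, and the sketch assumes the values are plain numbers as in the question; pickle does not need any of this, since it preserves tuple keys as-is.

```python
import json


def to_tuple(x):
    """Recursively turn JSON lists back into tuples (JSON has no tuple type)."""
    if isinstance(x, list):
        return tuple(to_tuple(item) for item in x)
    return x


def dumps_tuple_keys(d):
    """Hypothetical helper: encode a tuple-keyed dict as a JSON list of [key, value] pairs."""
    return json.dumps([[k, v] for k, v in d.items()])


def loads_tuple_keys(s):
    """Hypothetical helper: decode dumps_tuple_keys output back into a tuple-keyed dict."""
    return {to_tuple(k): v for k, v in json.loads(s)}


data = {(('word', 'list'), (1, 2), (1, 3)): 0.0}

try:
    json.dumps(data)  # tuple keys are rejected by the json module
except TypeError as e:
    print('plain json.dumps fails:', e)

encoded = dumps_tuple_keys(data)
print(encoded)  # [[[["word", "list"], [1, 2], [1, 3]], 0.0]]
print(loads_tuple_keys(encoded) == data)  # True
```

The same helpers work with files by swapping json.dumps/json.loads for json.dump/json.load on an open file object.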