I am storing a table using python and I need persistence.
Essentially I am storing the table as a dictionary string to numbers. And the whole is stored with shelve
self.DB=shelve.open("%s%sMoleculeLibrary.shelve"%(directory,os.sep),writeback=True)
I use writeback
to True
as I found the system tends to be unstable if I don't.
After the computations the system needs to close the database, and store it back. Now the database (the table) is about 540MB, and it is taking ages. The time exploded after the table grew to about 500MB. But I need a much bigger table. In fact I need two of them.
I am probably using the wrong form of persistence. What can I do to improve performance?
For storing a large dictionary of string : number
key-value pairs, I'd suggest a JSON-native storage solution such as MongoDB. It has a wonderful API for Python, Pymongo. MongoDB itself is lightweight and incredibly fast, and json objects will natively be dictionaries in Python. This means that you can use your string
key as the object ID, allowing for compressed storage and quick lookup.
As an example of how easy the code would be, see the following:
d = {'string1' : 1, 'string2' : 2, 'string3' : 3}
from pymongo import Connection
conn = Connection()
db = conn['example-database']
collection = db['example-collection']
for string, num in d.items():
collection.save({'_id' : string, 'value' : num})
# testing
newD = {}
for obj in collection.find():
newD[obj['_id']] = obj['value']
print newD
# output is: {u'string2': 2, u'string3': 3, u'string1': 1}
You'd just have to convert back from unicode, which is trivial.