I assumed that Python's native DBM support should be considerably faster than NoSQL databases such as Tokyo Cabinet or MongoDB, since DBM has fewer features and options; i.e. it is a simpler system. I tested this with a very simple write/read benchmark:
#!/usr/bin/python
import time
import anydbm

t = time.time()
count = 0
while count < 1000:
    # open, write one record, close
    db = anydbm.open("dbm2", "c")
    db["1"] = "something"
    db.close()
    # re-open read-only and look the record up
    db = anydbm.open("dbm2", "r")
    print "db['1']:", db["1"]
    print "%.3f" % (time.time() - t)
    db.close()
    count = count + 1
Results: read+write: 1.3 s, read only: 0.3 s, write only: 1.0 s
MongoDB is at least 5 times faster than these figures on the same workload. Is this really representative of Python DBM performance?
Python doesn't have a single built-in DBM implementation. Its anydbm module is a thin wrapper that delegates to whichever DBM-style library is available on your system, such as Berkeley DB (bsddb) or GNU dbm (gdbm), falling back to the slow pure-Python dumbdbm if neither is installed.
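Which backend you actually get can therefore vary between machines, and it affects performance. The standard whichdb module reports what a given file was created with; a minimal check (the file name here is just an example):

import anydbm
from whichdb import whichdb

db = anydbm.open("dbm2", "c")   # anydbm picks the first available backend
db.close()
print whichdb("dbm2")           # e.g. 'dbhash', 'gdbm', 'dbm' or 'dumbdbm'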
Python's dictionary implementation is really fast for key-value storage, but it is not persistent. If you need high-performance runtime key-value lookups, a dictionary may serve you better; you can manage persistence with something like cPickle or shelve. If startup times (and, if you're modifying the data, termination times) matter more to you than runtime access speed, then something like DBM is the better fit.
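As a rough sketch of that dict-plus-pickle approach (the file name and helper functions here are just for illustration, not a fixed API):

import cPickle as pickle
import os

CACHE_FILE = "cache.pkl"   # hypothetical file name

def load_data():
    # load the whole store into an in-memory dict at startup
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, "rb") as f:
            return pickle.load(f)
    return {}

def save_data(data):
    # write the whole dict back out at shutdown
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

data = load_data()
data["1"] = "something"    # plain dict speed at runtime
print data["1"]
save_data(data)

The trade-off is the opposite of DBM's: lookups run at in-memory dict speed, but the whole store must be loaded at startup and rewritten on save.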
In your benchmark, the main loop includes the dbm open/close calls as well as the key lookup itself. Opening a DBM to store one value, closing it, and re-opening it before looking that value up is a pretty unrealistic use case, and the slow figures you're seeing are typical of managing a persistent data store in that manner (it's quite inefficient).
Depending on your requirements, if you need fast lookups and don't care too much about startup times, DBM might be a solution - but to benchmark it, only include the writes and reads themselves in the timed loop. Something like the below might be more suitable:
import anydbm
from random import random
import time

# open the DBM outside of the timed loops
db = anydbm.open("dbm2", "c")

max_records = 100000

# only time the read and write operations themselves
t = time.time()

# create some records
for i in range(max_records):
    db[str(i)] = 'x'

# do some random reads
for i in range(max_records):
    x = db[str(int(random() * max_records))]

time_taken = time.time() - t
# note: the timed section performs max_records writes plus max_records reads
print "Took %0.3f seconds, %0.5f microseconds / record" % (time_taken, (time_taken * 1000000) / max_records)

db.close()
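As a quick follow-up check (assuming the benchmark above has already populated dbm2), the store can be re-opened read-only in a fresh process and the records are still there; "42" is just an arbitrary key from the benchmark's range:

import anydbm

db = anydbm.open("dbm2", "r")   # read-only: the records persist on disk
print db["42"]                  # prints 'x'
db.close()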