I am writing a simple Python program.
My program seems to suffer from linear access to dictionaries,
its run-time grows exponentially even though the algorithm is quadratic.
I use a dictionary to memoize values. That seems to be a bottleneck.
The values I'm hashing are tuples of points.
Each point is: (x,y), 0 <= x,y <= 50
Each key in the dictionary is: A tuple of 2-5 points: ((x1,y1),(x2,y2),(x3,y3),(x4,y4))
The keys are read many times more often than they are written.
Am I correct that python dicts suffer from linear access times with such inputs?
As far as I know, sets have guaranteed logarithmic access times.
How can I simulate dicts using sets(or something similar) in Python?
edit As per request, here's a (simplified) version of the memoization function:
def memoize(fun):
memoized = {}
def memo(*args):
key = args
if not key in memoized:
memoized[key] = fun(*args)
return memoized[key]
return memo
See Time Complexity. The python dict is a hashmap, its worst case is therefore O(n) if the hash function is bad and results in a lot of collisions. However that is a very rare case where every item added has the same hash and so is added to the same chain which for a major Python implementation would be extremely unlikely. The average time complexity is of course O(1).
The best method would be to check and take a look at the hashs of the objects you are using. The CPython Dict uses int PyObject_Hash (PyObject *o) which is the equivalent of hash(o)
.
After a quick check, I have not yet managed to find two tuples that hash to the same value, which would indicate that the lookup is O(1)
l = []
for x in range(0, 50):
for y in range(0, 50):
if hash((x,y)) in l:
print "Fail: ", (x,y)
l.append(hash((x,y)))
print "Test Finished"
CodePad (Available for 24 hours)