Time complexity of accessing a Python dict

x10 · Dec 26, 2009 · Viewed 108k times

I am writing a simple Python program.

My program seems to suffer from linear access to dictionaries; its run-time grows exponentially even though the algorithm is quadratic.
I use a dictionary to memoize values. That seems to be a bottleneck.

The values I'm hashing are tuples of points. Each point is a pair (x, y) with 0 <= x, y <= 50.
Each key in the dictionary is a tuple of 2-5 points, e.g. ((x1,y1),(x2,y2),(x3,y3),(x4,y4)).

The keys are read many times more often than they are written.

Am I correct that Python dicts suffer from linear access times with such inputs?

As far as I know, sets have guaranteed logarithmic access times.
How can I simulate dicts using sets (or something similar) in Python?

edit: As requested, here's a (simplified) version of the memoization function:

def memoize(fun):
    """Cache fun's results, keyed by its positional arguments."""
    memoized = {}
    def memo(*args):
        key = args  # the args tuple itself serves as the dict key
        if key not in memoized:
            memoized[key] = fun(*args)
        return memoized[key]
    return memo
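
For illustration (not part of the original question), such a decorator can wrap any function whose positional arguments are hashable; path_cost below is a hypothetical example using the question's tuple-of-points keys:

@memoize
def path_cost(points):
    # Sum of Manhattan distances between consecutive points.
    return sum(abs(x1 - x2) + abs(y1 - y2)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

key = ((0, 0), (3, 4), (10, 10))
print(path_cost(key))  # 20, computed and cached
print(path_cost(key))  # 20, served from the memoized dict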

Answer

Yacoby · Dec 26, 2009

See Time Complexity. The Python dict is a hash map, so its worst case is O(n) if the hash function is bad and causes many collisions. However, that is a very rare case in which every item added has the same hash and so is appended to the same chain, which for any major Python implementation would be extremely unlikely. The average time complexity is of course O(1).
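
To see why collisions matter, here is a minimal sketch (mine, not part of the original answer) in which a deliberately degenerate __hash__ forces every key into the same bucket, degrading lookups toward O(n):

import timeit

class BadHash(object):
    # Every instance hashes to the same value, so all keys collide.
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 1
    def __eq__(self, other):
        return self.n == other.n

good = {n: None for n in range(1000)}           # ints: well-distributed hashes
bad = {BadHash(n): None for n in range(1000)}   # every key lands in one chain

print(timeit.timeit(lambda: 999 in good, number=1000))          # fast: ~O(1) lookups
print(timeit.timeit(lambda: BadHash(999) in bad, number=1000))  # slow: ~O(n) lookups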

The best method would be to take a look at the hashes of the objects you are using. The CPython dict uses int PyObject_Hash(PyObject *o), which is the equivalent of hash(o).
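
As a quick sanity check (exact hash values vary across Python versions and platforms), hash() can be inspected directly on keys of the question's shape:

key = ((1, 2), (3, 4), (5, 6))
print(hash(key))                                    # some platform-dependent integer
print(hash(((1, 2), (3, 4), (5, 6))) == hash(key))  # True: equal tuples hash equally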

After a quick check, I have not yet managed to find two point tuples that hash to the same value, which would indicate that the lookup is O(1):

# Hash every point in the question's domain (0 <= x, y <= 50)
# and report any two points whose hashes collide.
seen = set()
for x in range(51):
    for y in range(51):
        h = hash((x, y))
        if h in seen:
            print("Fail:", (x, y))
        seen.add(h)
print("Test Finished")
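
As a further check (my extension, not in the original answer), one can sample random keys of the question's actual shape, tuples of 2-5 points, and count hash collisions among distinct keys:

import random

random.seed(0)
seen = {}
collisions = 0
for _ in range(100000):
    # Build a random key: a tuple of 2-5 points with 0 <= x, y <= 50.
    points = tuple((random.randint(0, 50), random.randint(0, 50))
                   for _ in range(random.randint(2, 5)))
    h = hash(points)
    if h in seen and seen[h] != points:
        collisions += 1
    seen[h] = points
print("Collisions among distinct keys:", collisions)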
