Is there a limit to entries in a Dictionary<>?

theoneawaited · Aug 11, 2010

I have about 3000 different files I need to organize, and retrieve at different times during the game.

I created my own struct of variables. I was thinking about creating a "Dictionary<>" at the beginning of my application and simply loading all my files before the game starts.

I'm wondering about performance: will a dictionary with this many entries cause my application to be slow? Would a large dictionary make "TryGetValue" and "ContainsKey" run slower?

Thanks for the advice!

Answer

Jon Hanna · Aug 11, 2010

TryGetValue and ContainsKey should be pretty fast at that size, as long as the key has well-distributed hashes.

A Dictionary has an indexed set of "buckets". When it adds or looks up a value by key, it takes the value returned by GetHashCode(), reduces it again to be less than the number of buckets (generally something simple like modulo, but the implementation isn't defined), and looks in the relevant bucket.

The bucket will currently have zero or more items. The dictionary will compare each item with the key using .Equals().

The first bit, finding the right bucket, is going to be in constant time O(1). The second bit, comparing the key with the keys in the bucket, is going to be in linear time O(n), where n relates only to the number of items in that bucket, not in the whole collection.
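The two steps above can be sketched with a toy hash table. This is only an illustration, not how Dictionary is actually implemented: the class name ToyTable, the fixed bucket count, and the use of plain modulo are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Toy hash table for illustration only. The real Dictionary<TKey, TValue>
// is implemented differently, and its internals are not specified.
public class ToyTable<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] buckets;

    public ToyTable(int bucketCount)
    {
        buckets = new List<KeyValuePair<TKey, TValue>>[bucketCount];
        for (int i = 0; i < bucketCount; i++)
            buckets[i] = new List<KeyValuePair<TKey, TValue>>();
    }

    // Step 1: reduce the hash code to a bucket index (constant time).
    public int BucketIndex(TKey key)
    {
        // Mask off the sign bit so the modulo result is never negative.
        int hash = key.GetHashCode() & 0x7FFFFFFF;
        return hash % buckets.Length;
    }

    // Appends without checking for duplicates, to keep the sketch short.
    public void Add(TKey key, TValue value)
    {
        buckets[BucketIndex(key)].Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    // Step 2: compare against only the keys stored in that one bucket.
    public bool TryGetValue(TKey key, out TValue value)
    {
        foreach (var pair in buckets[BucketIndex(key)])
        {
            if (pair.Key.Equals(key))
            {
                value = pair.Value;
                return true;
            }
        }
        value = default(TValue);
        return false;
    }
}
```

The loop in TryGetValue only ever walks one bucket, which is why the cost depends on how many keys share that bucket rather than on the size of the whole table.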

Generally there should be very few items in each bucket (the number of buckets will grow to try to keep this the case) so the operation is essentially constant time.

If, however, your hash codes are poorly implemented, there will be lots of keys in the same bucket. The time complexity will get closer and closer to O(n), as can be seen by experimenting with an object with a deliberately bad GetHashCode that just returns 0 every time. In its worst case it is worse than a List, since a List is also O(n), but Dictionary has more overhead.
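The experiment mentioned above might look something like the following. BadKey and BadHashDemo are hypothetical names invented for this sketch; it counts calls to Equals rather than measuring time, so the result does not depend on the machine.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose hash code is deliberately useless.
public sealed class BadKey
{
    public static int EqualsCalls;   // counts comparisons made so far
    public readonly int Id;

    public BadKey(int id) { Id = id; }

    // Every key reports the same hash, so every key collides.
    public override int GetHashCode() { return 0; }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        var other = obj as BadKey;
        return other != null && other.Id == Id;
    }
}

public static class BadHashDemo
{
    // Returns how many Equals calls one failed lookup needs in a
    // dictionary holding `count` colliding keys.
    public static int ComparisonsForMiss(int count)
    {
        var map = new Dictionary<BadKey, int>();
        for (int i = 0; i < count; i++)
            map[new BadKey(i)] = i;

        BadKey.EqualsCalls = 0;
        map.ContainsKey(new BadKey(-1));   // this key is not present
        return BadKey.EqualsCalls;
    }
}
```

Calling ComparisonsForMiss with larger and larger counts should show the number of comparisons growing with the size of the dictionary, which is the O(n) behaviour described. Filling the dictionary is itself slow with this key, so keep the counts modest.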

Does any of this mean you should worry? No, even relatively naïve hashing methods should give relatively good results. If you're using a string key, then it's probably already going to be more than good enough. If you're using a simple built-in type, then even more so.

If you do find that accessing the dictionary is slow though, then you want to pay attention to this and either fix the GetHashCode() method or create an IEqualityComparer (which lets you define outside rules for GetHashCode() and Equals() for use with dictionaries, hashsets, etc).
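As a rough sketch of the second option, here is a comparer supplied from outside the key type. FileKey, FileKeyComparer and ComparerDemo are made-up names for illustration, and the 17/31 multiply-and-add scheme is just one common way to combine hash codes, not a recommendation specific to this case.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: a folder plus a file name.
public struct FileKey
{
    public string Folder;
    public string Name;

    public FileKey(string folder, string name)
    {
        Folder = folder;
        Name = name;
    }
}

// External rules for hashing and equality, handed to the dictionary.
public sealed class FileKeyComparer : IEqualityComparer<FileKey>
{
    public bool Equals(FileKey x, FileKey y)
    {
        return string.Equals(x.Folder, y.Folder, StringComparison.Ordinal)
            && string.Equals(x.Name, y.Name, StringComparison.Ordinal);
    }

    public int GetHashCode(FileKey key)
    {
        // Combine both fields so keys that differ in either one tend
        // to land in different buckets. Overflow is allowed to wrap.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (key.Folder == null ? 0 : key.Folder.GetHashCode());
            hash = hash * 31 + (key.Name == null ? 0 : key.Name.GetHashCode());
            return hash;
        }
    }
}

public static class ComparerDemo
{
    public static Dictionary<FileKey, string> Build()
    {
        // The comparer is passed to the constructor; the key type
        // itself does not need to override anything.
        var map = new Dictionary<FileKey, string>(new FileKeyComparer());
        map[new FileKey("textures", "grass")] = "grass.png";
        map[new FileKey("sounds", "grass")] = "grass.wav";
        return map;
    }
}
```

The same comparer instance can be passed to a HashSet<FileKey> as well, since both collections accept an IEqualityComparer<T> in their constructors.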

Most likely, though, 3000 is nothing; it'll be fine.
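For a sense of what the setup in the question could look like, here is a minimal sketch with string keys. GameFile, FileTable, and the generated names and paths are placeholders, since the actual struct and file layout aren't shown in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the asker's struct of variables.
public struct GameFile
{
    public string Path;
    public int SizeInBytes;
}

public static class FileTable
{
    // Builds a dictionary keyed by file name. The names, paths and
    // sizes are made up purely to fill the table.
    public static Dictionary<string, GameFile> Load(int count)
    {
        // Passing the expected size up front avoids resizing while loading.
        var files = new Dictionary<string, GameFile>(count);
        for (int i = 0; i < count; i++)
        {
            string name = "file" + i;
            files[name] = new GameFile
            {
                Path = "data/" + name + ".bin",
                SizeInBytes = i
            };
        }
        return files;
    }
}
```

After calling FileTable.Load(3000) once at startup, each later retrieval is a single TryGetValue call with the file's name, which takes the bucket route described above instead of scanning all 3000 entries.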