What is the probability of collision with a 6 digit random alphanumeric code?

Question 1

What is the probability of collision with a 6 digit random alphanumeric code?

mysql perl hash probability alphanumeric

Nick · Sep 29, 2011 · Viewed 13.1k times · Source

Answer

Answer

Because of the Birthday Paradox it's more likely than you might think.

There are 2,176,782,336 possible codes, but even inserting just 50,000 rows there is already a quite high chance of a collision. For 1,000,000 rows it is almost inevitable that there will be many collisions (I think about 250 on average).

I ran a few tests and this is the number of codes I could generate before the first collision occurred:

73366
59307
79297
36909

Collisions will become more frequent as the number of codes increases.

Here was my test code (written in Python):

>>> import random
>>> codes = set()
>>> while 1:
        code=''.join(random.choice('1234567890qwertyuiopasdfghjklzxcvbnm')for x in range(6))
        if code in codes: break
        codes.add(code)

>>> len(codes)
36909

Question 2

I'm using the following perl code to generate random alphanumeric strings (uppercase letters and numbers, only) to use as unique identifiers for records in my MySQL database. The database is likely to stay under 1,000,000 rows, but the absolute realistic maximum would be around 3,000,000. Do I have a dangerous chance of 2 records having the same random code, or is it likely to happen an insignificantly small number of times? I know very little about probability (if that isn't already abundantly clear from the nature of this question) and would love someone's input.

perl -le 'print map { ("A".."Z", 0..9)[rand 36] } 1..6'

What is the probability of collision with a 6 digit random alphanumeric code?

Answer

Related questions