I've seen 8-bit, 16-bit, and 32-bit CRCs.
At what point do I need to jump to a wider CRC?
My gut reaction is that it is based on the data length:
EDIT: Looking at the Wikipedia page about CRC and Lott's answer, here' what we have:
<64 bytes: 8-bit CRC
<16K bytes: 16-bit CRC
<512M bytes: 32-bit CRC
It's not a research topic. It's really well understood: http://en.wikipedia.org/wiki/Cyclic_redundancy_check
The math is pretty simple. An 8-bit CRC boils all messages down to one of 256 values. If your message is more than a few bytes long, the possibility of multiple messages having the same hash value goes up higher and higher.
A 16-bit CRC, similarly, gives you one of the 65,536 available hash values. What are the odds of any two messages having one of these values?
A 32-bit CRC gives you about 4 billion available hash values.
From the wikipedia article: "maximal total blocklength is equal to 2**r − 1
". That's in bits. You don't need to do much research to see that 2**9 - 1
is 511 bits. Using CRC-8, multiple messages longer than 64 bytes will have the same CRC checksum value.