So I want to work on a summer project that corrects errors in message transmission using Hamming codes, but I can't figure out how they really work. I've read many articles online, but I don't really understand the algorithm. Can anybody explain it in simple terms?
Thanks.
It's all about Hamming distance.
The Hamming distance between two base-2 values is the number of bits at which they differ. So if you transmit A, but I receive B, then the number of bits which must have been switched in transmission is the Hamming distance between A and B.
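To make that definition concrete, here's a minimal Python sketch that treats the two values as integers. The function name `hamming_distance` is just my own choice. XOR leaves a 1 exactly where the values differ, so counting the 1s gives the distance.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions at which the integers a and b differ."""
    # XOR leaves a 1 exactly where the two values disagree; count those 1s.
    return bin(a ^ b).count("1")

sent = 0b1011001
received = 0b1001011
print(hamming_distance(sent, received))  # 2
```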
Hamming codes are useful when the bits in each code word are each transmitted separately in some way. We don't care whether they're sent serially or in parallel, but they aren't, for instance, combined into an analogue value representing several bits, or compressed/encrypted after encoding.
Thus each bit is independently either received correctly or flipped, at random with some fixed probability. Assuming the transmission is fairly reliable, most bits are received correctly, so errors in a small number of bits are far more likely than simultaneous errors in a large number of bits.
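To see why small errors dominate, here's a quick sketch of that model. If n bits are each flipped independently with probability p, the number of flipped bits follows a binomial distribution. The word length of 7 and p = 0.001 are only example assumptions, not values from any particular channel.

```python
from math import comb

def prob_k_errors(n, k, p):
    """Probability that exactly k of n bits are flipped, if each bit is flipped
    independently with probability p (the binomial distribution)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Example assumption: 7-bit code words, each bit flipped with probability 0.001.
n, p = 7, 0.001
for k in range(4):
    print(f"{k} flipped bits: {prob_k_errors(n, k, p):.3g}")
```

With these numbers, each additional flipped bit is more than 100 times less likely than the previous count.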
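So a Hamming code usually aims to correct 1-bit errors and/or to detect 2-bit errors. See the Wikipedia article for details of the two main types: the plain Hamming code, and the extended Hamming code, which adds one extra parity bit so that it can do both at once (SECDED: single error correction, double error detection). Codes which correct or detect bigger errors can be constructed, but AFAIK they aren't used as much.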
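For a concrete example, the classic Hamming(7,4) code turns 4 data bits into a 7-bit code word. This is a minimal Python sketch assuming the usual textbook layout, where parity bits sit at the power-of-two positions 1, 2 and 4. Each parity bit covers the positions whose binary number has that bit set. If exactly one bit flips, XORing together the positions of all the 1 bits (the "syndrome") gives the position of the flipped bit. The function names are my own, not from any library.

```python
def hamming74_encode(data):
    """Encode 4 data bits (a list of 0/1) into a 7-bit Hamming(7,4) code word.

    Positions are numbered 1..7: positions 1, 2 and 4 (the powers of two) hold
    parity bits, and positions 3, 5, 6 and 7 hold the data bits, in that order.
    """
    if len(data) != 4 or any(b not in (0, 1) for b in data):
        raise ValueError("expected exactly 4 bits")
    word = [0] * 8                      # index 0 unused, so index == bit position
    for pos, bit in zip((3, 5, 6, 7), data):
        word[pos] = bit
    for p in (1, 2, 4):
        # Parity bit p covers every position whose number has bit p set,
        # and is chosen so that each covered group has even parity.
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    return word[1:]

def syndrome(code):
    """XOR of the positions (1..7) that hold a 1: zero for every valid code word."""
    s = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            s ^= pos
    return s

def hamming74_decode(code):
    """Correct at most one flipped bit, then extract the 4 data bits."""
    code = list(code)
    s = syndrome(code)
    if s:                               # assuming a single error, s is its position
        code[s - 1] ^= 1
    return [code[i - 1] for i in (3, 5, 6, 7)]

data = [1, 0, 1, 1]
code = hamming74_encode(data)
print(code)                             # [0, 1, 1, 0, 0, 1, 1]
garbled = code.copy()
garbled[5] ^= 1                         # flip the bit at position 6
print(syndrome(garbled))                # 6
print(hamming74_decode(garbled) == data)   # True
```

The syndrome is just a fast way of finding the nearest code point without comparing against all of them.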
The code works by evenly spacing out the code points in "Hamming space", which in mathematical terms is the metric space consisting of all values of the relevant word size, with Hamming distance as the metric. Imagine that each code point is surrounded by a little "buffer zone" of invalid values. If a value is received that isn't a code point, then an error must have occurred, because only valid code points are ever transmitted.
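You can check this "spread out code points" picture by brute force for Hamming(7,4). This sketch uses a standalone helper, `encode74`, which follows the same assumed layout (parity at positions 1, 2, 4). It lists the 16 code points among the 128 possible 7-bit values and confirms that any two code points differ in at least 3 bits. It also confirms that every 7-bit value is either a code point or in the buffer zone of exactly one code point.

```python
from itertools import combinations, product

def encode74(data):
    """Hamming(7,4) code word as a tuple: parity at positions 1, 2, 4; data at 3, 5, 6, 7."""
    word = [0] * 8                     # index 0 unused, so index == bit position
    for pos, bit in zip((3, 5, 6, 7), data):
        word[pos] = bit
    for p in (1, 2, 4):
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    return tuple(word[1:])

def distance(a, b):
    """Hamming distance between two equal-length bit sequences."""
    return sum(x != y for x, y in zip(a, b))

codewords = [encode74(d) for d in product((0, 1), repeat=4)]   # the 16 code points
all_words = list(product((0, 1), repeat=7))                    # all 128 possible values

# Code points are spread out: any two differ in at least 3 bits.
min_dist = min(distance(a, b) for a, b in combinations(codewords, 2))
print(min_dist)                                   # 3

# Every value is a code point or lies in exactly one code point's buffer zone.
for w in all_words:
    assert sum(distance(w, c) <= 1 for c in codewords) == 1
print(len(codewords), "valid out of", len(all_words))   # 16 valid out of 128
```

Since 16 × (1 + 7) = 128, the buffer zones of Hamming(7,4) cover every possible value with no overlap. This is why Hamming codes are called "perfect" codes.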
If a value in the buffer zone is received, then on the assumption that a 1-bit error occurred, the value which was transmitted must be at distance 1 from the value received. But because the code points are spread out, there is only one code point that close. So it's "corrected" to that code point, on the grounds that a 1-bit error is more likely than the larger error that would be needed for any other code point to produce the value received. In probability terms, given that I received the value I did, the conditional probability that you sent the nearby code point is greater than the conditional probability that you sent any other code point. So I guess that you sent the nearby one, with a certain confidence based on the reliability of the transmission and the number of bits in each word.
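Here's that probability argument as a sketch. If each bit flips independently with probability p < 0.5, the chance of receiving r given that code point c was sent is p^d * (1 - p)^(n - d), where d is the distance between them. That chance shrinks as d grows, so the most likely code point is the nearest one. The value p = 0.01 and the function names are illustrative assumptions. In practice a syndrome calculation avoids searching every code point like this.

```python
from itertools import product

def encode74(data):
    """Hamming(7,4) code word as a tuple: parity at positions 1, 2, 4; data at 3, 5, 6, 7."""
    word = [0] * 8                     # index 0 unused, so index == bit position
    for pos, bit in zip((3, 5, 6, 7), data):
        word[pos] = bit
    for p in (1, 2, 4):
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    return tuple(word[1:])

def distance(a, b):
    """Hamming distance between two equal-length bit sequences."""
    return sum(x != y for x, y in zip(a, b))

CODEWORDS = [encode74(d) for d in product((0, 1), repeat=4)]

def likelihood(received, sent, p):
    """P(received | sent) when each bit flips independently with probability p."""
    d = distance(received, sent)
    return p ** d * (1 - p) ** (len(sent) - d)

def decode_nearest(received, p=0.01):
    """Pick the code point most likely to have been sent; for p < 0.5 that's the nearest one."""
    return max(CODEWORDS, key=lambda c: likelihood(received, c, p))

sent = encode74((1, 0, 1, 1))
received = list(sent)
received[4] ^= 1                        # a 1-bit error at position 5
print(decode_nearest(tuple(received)) == sent)   # True
```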
If an invalid value is received which is equally close to two or more code points, then I can't say that any one of them is more likely to be the true value than the others. So I detect the error, but I can't correct it.
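For SECDED, the extended Hamming(8,4) code appends one more bit: the overall parity of the 7-bit word. This pushes the minimum distance between code points up to 4. The sketch below uses the usual decision rule, and its function names are my own. Odd overall parity means an odd number of flips, which is assumed to be one and corrected. Even overall parity with a non-zero syndrome means at least two flips, which are detected but not corrected. The last lines show that a 2-bit error leaves more than one code point tied as "nearest".

```python
from itertools import product

def encode74(data):
    """Hamming(7,4) code word as a tuple: parity at positions 1, 2, 4; data at 3, 5, 6, 7."""
    word = [0] * 8                     # index 0 unused, so index == bit position
    for pos, bit in zip((3, 5, 6, 7), data):
        word[pos] = bit
    for p in (1, 2, 4):
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    return tuple(word[1:])

def encode84(data):
    """Extended Hamming(8,4): the 7-bit code word plus its overall parity bit."""
    code = encode74(data)
    return code + (sum(code) % 2,)

def decode84(received):
    """SECDED decoding: returns (status, data bits or None)."""
    code = list(received)
    s = 0
    for pos, bit in enumerate(code[:7], start=1):
        if bit:
            s ^= pos                   # syndrome over positions 1..7
    overall = sum(code) % 2            # parity over all 8 bits; 0 for every code word
    if overall == 1:
        # Odd number of flips: assume exactly one. A zero syndrome means the
        # flipped bit was the extra parity bit itself.
        code[s - 1 if s else 7] ^= 1
        status = "corrected"
    elif s != 0:
        # Even number (at least 2) of flips: detected, but not correctable.
        return "detected", None
    else:
        status = "ok"
    return status, tuple(code[i - 1] for i in (3, 5, 6, 7))

def distance(a, b):
    """Hamming distance between two equal-length bit sequences."""
    return sum(x != y for x, y in zip(a, b))

CODEWORDS = [encode84(d) for d in product((0, 1), repeat=4)]

sent = encode84((1, 0, 1, 1))
one_err = list(sent)
one_err[2] ^= 1                        # 1-bit error
two_err = list(sent)
two_err[2] ^= 1                        # 2-bit error
two_err[5] ^= 1
print(decode84(one_err))    # ('corrected', (1, 0, 1, 1))
print(decode84(two_err))    # ('detected', None)

# The 2-bit error has no unique nearest code point.
best = min(distance(two_err, c) for c in CODEWORDS)
print(best, sum(distance(two_err, c) == best for c in CODEWORDS))   # 2 4
```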
Obviously 3-bit errors are not corrected by a SECDED Hamming code. The received value is further from the value you actually sent than it is from some other code point, so I erroneously "correct" it to the wrong value. So you either need transmission reliable enough that you don't care about them, or else you need higher-level error detection as well (for example, a CRC over an entire message).
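Here's a sketch of both halves of that point, again using hypothetical SECDED (8,4) helpers. A 3-bit error in one code word is silently "corrected" to the wrong data, but a CRC-32 computed over the whole message with Python's `zlib.crc32` exposes the mismatch. Splitting bytes into 4-bit nibbles and assuming the checksum itself arrives intact are simplifications for the example.

```python
import zlib

def encode74(data):
    """Hamming(7,4) code word as a tuple: parity at positions 1, 2, 4; data at 3, 5, 6, 7."""
    word = [0] * 8
    for pos, bit in zip((3, 5, 6, 7), data):
        word[pos] = bit
    for p in (1, 2, 4):
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    return tuple(word[1:])

def encode84(data):
    """Extended Hamming(8,4): the 7-bit code word plus its overall parity bit."""
    code = encode74(data)
    return code + (sum(code) % 2,)

def decode84(received):
    """SECDED decoding: returns (status, data bits or None)."""
    code = list(received)
    s = 0
    for pos, bit in enumerate(code[:7], start=1):
        if bit:
            s ^= pos
    if sum(code) % 2 == 1:             # odd number of flips: assume one, correct it
        code[s - 1 if s else 7] ^= 1
        status = "corrected"
    elif s != 0:                       # even number of flips: detect only
        return "detected", None
    else:
        status = "ok"
    return status, tuple(code[i - 1] for i in (3, 5, 6, 7))

def to_nibbles(message):
    """Split each byte into two 4-bit tuples, high nibble first."""
    out = []
    for byte in message:
        for shift in (4, 0):
            n = (byte >> shift) & 0xF
            out.append(tuple((n >> i) & 1 for i in (3, 2, 1, 0)))
    return out

def from_nibbles(nibbles):
    """Reassemble bytes from consecutive (high, low) 4-bit tuples."""
    vals = [int("".join(map(str, n)), 2) for n in nibbles]
    return bytes((hi << 4) | lo for hi, lo in zip(vals[0::2], vals[1::2]))

message = b"Hi"
checksum = zlib.crc32(message)          # assumed to arrive intact alongside the message
codewords = [encode84(n) for n in to_nibbles(message)]

corrupted = [list(c) for c in codewords]
for pos in (0, 1, 2):                   # a 3-bit error inside the first code word
    corrupted[0][pos] ^= 1

decoded = [decode84(c) for c in corrupted]
print([status for status, _ in decoded])         # ['corrected', 'ok', 'ok', 'ok']
received_message = from_nibbles([d for _, d in decoded])
print(received_message == message)               # False: miscorrected by SECDED
print(zlib.crc32(received_message) == checksum)  # False: the CRC catches it
```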