A checksum can be generated simply by adding bits. How is the extra step of taking the 1s complement useful?
I understand the theory. I know how to calculate 1s complement and I know about how adding the complements makes the result all 1s.
I would like to see a simple example of how an error is detected.
I believe the example you're looking for can be found here.
The reason we do 1's complement is that when the 1's complement is added to the sum of all the values, and the result is trimmed to the bit-length of the machine (16 bits in the example above), it is all 1's. CPUs have a feature to take 1's complement of numbers, and taking the 1's complement of all-1 is all-0.
The reason: CPUs hate to work with bits except in chunks of however many it normally use. So adding two 64-bit numbers may take 1 cycle, but checking all the bits of that number individually will take many more (in a naive loop, perhaps as high as 8x64 cycles). CPUs also have capability to trivially take 1's complements, and detect that the result of the last calculation was zero without inspecting individual bits and branch based on that. So basically, it's an optimization that lets us check checksums really fast. Since most packets are just fine, this lets us check the checksum on the fly and get the data to the destination much faster.