My reading of this article suggests that a benefit of ReCAPTCHA is that it can have humans verify words not recognised in the OCR/digitization of books. It does this by using these words in "Are you human?" tests. So ReCAPTCHA kills two birds with one stone. Great!
But I dont get it. If the word can't be recognised by the digitization process then what is the input entered, by the supposed human being, verified against? How does this work?
It shows two words. One of them the computer already knows, the other, it doesn't. It assumes that if you get the known one right, that you must know the other.
You don't know which of the two is already known so you, theoretically can't trick it. Additionally, it will replay a word with multiple people to get independent confirmation before sending it back to the source (newspaper company, book scanning group) as a valid answer.
But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.