Thursday, December 23, 2010
The Death of Captcha
It's an interesting thing: websites use Captcha because it's seen as the most reliable computer-implemented Turing Test out there, the best way for a computer to determine whether there's an actual person or another computer on the other end. The principle is great: a human will find it easy to read a graphical rendering of text, while a computer will trip up.
Well, that was then and this is now. If you think about it, character recognition requires no more intelligence than any of the other things computers excel at. It merely requires ever more complex algorithms for converting images to ASCII text. 'Ever more complex' because Captcha designers and hackers are playing a cat-and-mouse game: the designers find a way to make their graphical representations of text more complex, and the hackers beat them. Captcha designers ought to be working within the bounds of 'more complex while still easy for the discerning human eye to read', but they're clearly running out of ways to do that.
They will lose. Sooner or later, hackers will devise algorithms that can decode Captchas too complex for the human eye to decode. Sooner or later, the only practical value of a Captcha will be as a reverse Turing Test, one where failure is a more reliable indicator of a human presence than success.

After all, hackers are providing a valuable public service: what looks like a destructive security breach is little more than very advanced OCR (optical character recognition), a tool that is highly useful elsewhere in society. OCR helps blind people read books, allows mass data entry from old printed documents, and makes all kinds of electronic 'scanning' possible. The other day I saw an iPhone app that did this cool thing: you pointed your iPhone camera at some text and it translated the text for you. Again, OCR. I can remember playing around with OCR just five years ago and finding it buggy as hell: about one word in ten had a typo, and the punctuation was all over the place. Anything more offbeat than the most basic sans-serif font would confuse it like mad. It was one of those technologies that seemed cool in theory but iffy in practice.
I have to assume that it works way better now, and if the manufacturers of OCR software aren't working in cahoots with Captcha hackers, they probably should be. What hinders one industry helps another.
But once the makers of Captcha throw in the towel and admit that their technology no longer works, what will come next? Voiceprints? Clever puns? I wonder, but I bet we'll find out soon enough.