Tuesday, May 29, 2007

reCaptcha and human computation

reCaptcha is a cute idea, trying to turn all the "prove you are a human" tests on the Web into useful work.

From their "What is reCaptcha?" page:
About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that's not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day.

What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into "reading" books.

reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher ... Each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA.
Very clever and very fun.
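
As a quick sanity check on the quoted figures, here is a minimal sketch of the arithmetic. The two inputs come from the quote above; the variable names are my own, chosen for illustration.

```python
# Figures quoted from the "What is reCaptcha?" page; names are illustrative.
captchas_per_day = 60_000_000   # "about 60 million CAPTCHAs" solved daily
seconds_per_captcha = 10        # "roughly ten seconds" each

# Convert total seconds of human effort per day into hours.
total_hours = captchas_per_day * seconds_per_captcha / 3600
print(f"{total_hours:,.0f} hours per day")  # prints "166,667 hours per day"
```

That is consistent with the page's claim of "more than 150,000 hours of work each day."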

For more on Luis von Ahn's work, see the discussion and links to talks and papers in my earlier post, "Human computation and playing games".

[reCaptcha found via O'Reilly Radar]

4 comments:

Anonymous said...

Luis von Ahn's work is indeed very clever.

However, his human computation games actually lie to players, though this doesn't seem to bother many people. The deception is clever, after all.

In particular, when playing a two-player game, there is a clear expectation that you will be playing live with another human. But this is not always the case. See, for example, the sections on pre-recorded gameplay and cheating here:

http://www.cs.cmu.edu/~biglou/ESP.pdf#page=3

As for reCaptchas, you may be getting free labor out of users without their permission. And even if users were informed of what is going on, I would expect that many of them would not really understand it. But you would be getting free labor out of them anyway.

KwangErn Liew said...

There are two notable sites that work on cracking captchas: http://sam.zoy.org/pwntcha/ and http://www.ocr-research.org.ua/list.html

From the guidelines on both sites, one may deduce that reCaptcha's technology isn't anything new.

Funnily enough, I am working on a captcha, soon to be released. Strictly following the guidelines that those websites have given, I think I might have beaten it, for now. ;)

Ballard said...

I don't know whether it would be deceptive or not, but I wonder if something like this could be randomly substituted for the "word verification" field that safeguards blog comments from 'bots. That could generate a lot of OCR corrections, while disguised as a "normal" part of daily Internet life.

Michael Fokken said...

Wow, I didn't know that, but that's pretty clever.