About Google's reCAPTCHA

First, let me stress this: the CAPTCHA is an absolutely anti-human invention!

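But dialectical materialism teaches us to look at both sides of everything. Some CAPTCHAs, while pestering ordinary web users, have also made little-known contributions in other areas. Google's reCAPTCHA service is the example used here.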

Many download sites abroad use the kind of CAPTCHA shown below. It is in fact the reCAPTCHA service provided by Google, which is free to sign up for and use.
[Image: about reCAPTCHA]

Thanks to Jimmy Liye for translating part of Google's reCAPTCHA documentation (original post: "放了这只验证码吧", roughly "Let This CAPTCHA Go"):

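Everyone sees CAPTCHAs every day, but has it ever occurred to you that each time you impatiently type in a word, you are making a small contribution to humanity?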

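A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is something we web users run into every day. Does it do anything besides blocking spam registrations and spam comments? Google's reCAPTCHA (pictured above) says it does: you can also contribute to humanity.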

An aside: OCR today still reads text rather poorly, as the image below shows.
[Image: about reCAPTCHA]

The scenario goes like this: one day, a machine scans a book in order to turn it into an electronic edition:
[Image: about reCAPTCHA]

The processed output looks like this:

The Hreckinridge’ and Lane Democrats, having taken courage at the recent eastern advises, are [xxxxxxxxxx] energetically for the campaign: Several prominent Democrats who at first favoredDonoLea, are coming out. for the other aide, apparently under the [xxxxxxxx] of Federal [xxxxxxxxx]. An address to the National Democracy of ,1ifornia, urging the party to supportHaeeslipslDas, has recently been published, which manifestlybss strengthened that aide of the [xxxxxxxxx]: It is signed by 65 Democrats, many of whom occupy respectab e and prominent positions in the party, 22 of them are Federal office-holders,[xxxxx] more are recipients of Federal patronage, and the others represent a mass of politicians giving the document [xxxx][xxxxxx] mTheDcu8las Democrats are also active The Irish and German vote will mostly go with ths# branch of the party, but it is[xxxxxxxxx] to [xxxxxxxx] [xxxxx] [xxxx] [xx] the stronger. Thus far 17 IT newspapers have declared for DonGres, 13 for Base$- IaaIDGS and 9 remain non-committal, with even chances of going either way. Under these circumstances the Republicans entertain not unjustifiable hopes that the Democratic divisions may be so equal,- ly balanced as to give the State [xx] LIaCOLV.Same very [xxxxxxx] Bell and Everett meetings have been held in different parts of the State, bat thus far that party does not exhibit much rank sad ale air en.

And here is one where the original book is in rather poor condition:
[Image: about reCAPTCHA]

Faced with this, the computer is stumped and spits out a pile of stuff like this:

‘ letz-1- rrk fit: 1′ . on its to Vc ,rt, cann into tlm yc H_ tcr,la, .n. ‘l l; , arc ti:( h of thc 1″,ats that to ltc rc: ,;. , I; ., l: rel!;n. tani., , ./olio, IJuteilu, . 1!’i./_ ;lr”n. Iiam! Jr.r. F’l,nr_.Z.._%i;;, ,, : rt-Irn: am/ tf.rri.:, t?m steamer as a tr nW r. Uu ,tin;t, c ac?1 1″,at firm/ a t;nn, accor.liu; to .t rn. ‘Cl.w r. wu ru lm:nui MistinW /y in u;th, -. ink ;:,k as to “what w ax 1111, :111(I vle:iR a of ;: (,am( into, mnr r-, tm if tlm wo r( uu.i n:’ of t?u : la?:Iv. \ ‘c : ol in thc , ucr:atic , , Tlau :; will h:aw tu-li.r \. ’1′Im yap?tts Il ,,n an,/ I, ,rr:l. r, (,t tf,is r:ity, start witli it, with lu:rtic: ol \ 1- e:l.k.

Can you read that? I certainly can't. reCAPTCHA exists to change exactly this situation. This image explains how it works quite well:
[Image: about reCAPTCHA]

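1. First, we scan a book containing a great many words.
2. We pick out the words the computer could not recognize. These may be set in unusual typefaces, and how damaged they are also depends on the age of the book and the quality of the scan.
3. To make these CAPTCHAs more secure, we distort the words further and draw lines across them.
4. One CAPTCHA is made up of two such words.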
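
As a rough illustration of steps 2 and 4, here is a minimal Python sketch. It is not Google's implementation: the `ScannedWord` and `Challenge` types, the OCR confidence score, the 0.8 threshold, and the random display order are all assumptions made up for this example, and the image distortion in step 3 is left out entirely.

```python
import random
from dataclasses import dataclass


@dataclass
class ScannedWord:
    image_id: str      # hypothetical ID of the cropped word image
    ocr_text: str      # what the OCR engine guessed
    confidence: float  # hypothetical OCR confidence in [0, 1]


@dataclass
class Challenge:
    control_id: str      # word whose correct reading is already known
    unknown_id: str      # word the OCR engine could not read
    control_first: bool  # assumed: random display order hides which is which


def find_unreadable(words, threshold=0.8):
    """Step 2: keep the words whose OCR confidence is below the threshold."""
    return [w for w in words if w.confidence < threshold]


def build_challenge(control_id, unknown, rng=random):
    """Step 4: pair one known word with one unreadable word."""
    return Challenge(control_id, unknown.image_id, rng.random() < 0.5)


words = [
    ScannedWord("w1", "yacht", 0.97),
    ScannedWord("w2", "tlm", 0.31),
    ScannedWord("w3", "harbor", 0.92),
    ScannedWord("w4", "IJuteilu", 0.12),
]
unreadable = find_unreadable(words)
challenge = build_challenge("known-42", unreadable[0])
```

Here `unreadable` holds the two low-confidence words, and `challenge` pairs the first of them with a control word whose ID (`known-42`) is likewise invented for the example.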

With its help, the text in the second image becomes much clearer (though a few small errors remain):

The New-York State yacht Squadron, on its annual cruise to Newport came into the harbor yesterday afternoon. The following are the names of the boats that came to anchor here: Jessie, gera loliv erelun Annie, Mannering, Julia, Bonita, Magic wut, Rambler, floumblie, Henrietta, Sea-Drift and Maria, with the steamer America as a tender. On anchoring each boat fired a gun, according to custom. The reports were heard distinctly in the city, causing considerable inquiry as to “what was up,” and quite a number of sanguine individuals came into our office to inquire if the guns were not annunciatory signals of the successful laying of the Atlantic Cable. We invariably replied in the negative. The squadron will leave to-day for Newport. The yachts Washington and buub r of this city, start with it, with parties of New Haven people.

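Some people may ask: if the machine cannot read the word, how does it judge whether you typed it correctly? I had the same question at first. Google explains it this way:
Of the two words in a CAPTCHA, one is known: its reading has already been verified by people. The other is unknown: the machine could not read it. Once you type the known word correctly, we assume your reading of the other word is correct too. So every time you solve a CAPTCHA, you add one word to humanity's store of knowledge.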
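
The checking rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the in-memory `votes` store, and the case-insensitive comparison are invented for the example. The tally of guesses is also an addition of mine; the explanation above only says that the second word is taken as correct once the known word matches.

```python
from collections import Counter, defaultdict

# Hypothetical store: guesses collected for each unknown word image.
votes = defaultdict(Counter)


def check_answer(control_answer, typed_control, unknown_id, typed_unknown):
    """Return True if the user passes; record their guess for the unknown word."""
    if typed_control.strip().lower() != control_answer.lower():
        return False  # known word is wrong: reject and discard the other guess
    votes[unknown_id][typed_unknown.strip().lower()] += 1
    return True


def best_guess(unknown_id):
    """Most frequently typed reading of an unknown word, or None if no votes."""
    counts = votes[unknown_id]
    return counts.most_common(1)[0][0] if counts else None


passed = check_answer("harbor", "Harbor", "w2", "the")
failed = check_answer("harbor", "harbour", "w2", "tlm")
```

Only the first call records a guess, because only there does the known word match; the second user's reading of the unknown word is thrown away along with the failed attempt.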

Learn more about reCAPTCHA and sign up: www.google.com/recaptcha

 
