
CAPTCHAs: Accessibility vs Security
January 8, 2009

Posted by eingang in Interesting.
Sample CAPTCHA image
Image by BMauer
Public Domain


You have probably signed up at a web site where you were presented with a graphic showing some combination of letters, numbers, or words that you had to type in to prove that you are a real human being and not some kind of spam bot. The “following/finding” image above is an example of a word-based version of that task. These images are known as CAPTCHAs (Completely Automated Public Turing Test to tell Computers and Humans Apart). My earliest recollection of seeing them in wide use was on blog sites with open commenting. Automated programs would submit “comments” consisting of links to pornographic web sites or pharmacy sites. For popular bloggers, even if they had a system to moderate comments before making them publicly visible, the overhead of managing their blog could quickly become unreasonable. For similar reasons, sites like Yahoo Directory, Google Mail, and Hotmail were also fairly quick to adopt CAPTCHAs.

For most people, the main issue with CAPTCHAs was whether they were effective or not. As with anti-virus efforts, it is an ongoing fight between the guys in the white hats, who are trying to protect their systems, and the guys in the black hats, who want to pervert the protected systems to their own ends. From an accessibility point of view, though, that issue was small potatoes. Even users with perfect vision often have trouble with CAPTCHAs because of the level of distortion involved in obscuring the letters or words. The solution to that was to add a “refresh” or “recycle” button to the CAPTCHA so it would give you a new one.

However, if you were blind or had poor vision, it was pretty much impossible to get past the graphic. What the initial CAPTCHA developers had failed to consider was how users relying on assistive technology to surf the web were going to be able to use a CAPTCHA graphic. Why was that? Consider the usual way of making graphical content accessible: add a description to the image. If the task for our sample CAPTCHA above is to type out the words in the picture, putting “CAPTCHA image with the words ‘following’ and ‘finding’” as the description is going to help those not using images, yes, but it is also one hundred percent accessible to automated programs. While we obviously like to endorse accessibility for all, there is a tension between accessibility and security; it is completely undesirable for automated programs to be able to circumvent a security system so easily.

reCAPTCHA sample with refresh and audio components
Figure 1: “overlooks/inquiry” reCAPTCHA Example

One solution to the accessibility issue was to add an audio component to the CAPTCHA. The “overlooks/inquiry” image shows a reCAPTCHA example that incorporates the refresh button (the recycle-like symbol at the top of the column of icons), a help icon (at the bottom of the icon column), and the audio CAPTCHA icon (middle of the icon column). When you click the audio icon, the large word area of the CAPTCHA is replaced with a mini audio player and you are instructed to type what you hear. The audio in most examples I have tried is not the words in the graphical version. The audio quality is usually poor and may, on purpose, be distorted with additional people speaking or background noise in order to make it difficult for automated speech recognition programs to function. I often have trouble with the audio because of my own neurological hearing problems and the interference caused by background noise and lack of context. Try it yourself on a few examples at the reCAPTCHA site.

You might be thinking that the audio reCAPTCHA is a good compromise, ensuring accessibility for human beings while denying it to automated programs. Unfortunately, recent research has revealed that all of the common audio CAPTCHAs in use are vulnerable to automated speech processing techniques, which solve them with anywhere from roughly 50 to 70 percent accuracy. This excerpt from the December 8, 2008 Ars Technica article “Computer scientists find audio CAPTCHAs easy to crack” summarizes the important results:

The work involved gathering 1,000 audio CAPTCHAs from Google, Digg, and the reCAPTCHA service. 900 of these were used as a training set and the remaining 100 were set aside to test the system when done. The software first did a rough audio analysis, dividing each item into equal-sized chunks, each sufficiently long to fit any spoken character. Those segments with the highest energy peaks, which are considered most likely to contain actual letters, were set aside for analysis.

The authors tested a number of methods used to extract features from recordings of speech (for the curious, these are mel-frequency cepstral coefficients and two forms each of perceptual linear prediction and relative spectral transform-PLP). These features were then subjected to analysis using machine learning programs, which were trained on the identification of individual characters. Three methods—AdaBoost, support vector machines (SVM), and k-nearest neighbor (k-NN)—were trained using the 900 audio CAPTCHAs that had been processed manually. The result of this pairing of processing and analysis methods was a total of 15 different attempts at cracking each of the 100 test audio CAPTCHAs.

Google’s audio CAPTCHAs consist of a series of the digits 0 through 9 recited over background noise of speech played backwards. That was nowhere close to enough to consistently fool the researchers’ software; the SVM technique got the CAPTCHA right about two-thirds of the time, and AdaBoost wasn’t far behind (k-NN performed badly in this test). Digg uses both digits and letters, but plays them over a less complex background that sounds like flowing water. AdaBoost failed this test entirely, but SVM was able to clear 70 percent accuracy with several of the processing techniques; k-NN trailed it by a significant margin.

reCAPTCHA’s own audio version was similar to Google’s but used different speakers for different digits. This proved to be a significant barrier to the learning algorithms, which, at best, got it right a bit less than half the time (again, SVM was the star). As the authors point out, however, getting it right half the time would be more than worth the effort for spammers that may have hundreds or thousands of computers at their disposal. Some sites also allow the answer to be off by one digit, which would significantly increase the success rate.

[From Computer scientists find audio CAPTCHAs easy to crack]
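
The first step described in the excerpt, cutting the audio into equal-sized chunks and keeping the ones with the most energy, can be sketched in a few lines of Python. This is only an illustration of the idea, not the researchers' code: the function name, the chunk size, and the toy signal are all assumptions made for the example, and "energy" is taken here to mean the sum of squared sample values.

```python
def top_energy_chunks(samples, chunk_size, k):
    """Split samples into equal-sized chunks and return the start indices
    of the k chunks with the highest energy (sum of squared amplitudes)."""
    chunks = []
    for start in range(0, len(samples) - chunk_size + 1, chunk_size):
        window = samples[start:start + chunk_size]
        energy = sum(x * x for x in window)
        chunks.append((energy, start))
    # Pick the k highest-energy chunks, then put them back in time order.
    best = sorted(chunks, reverse=True)[:k]
    return sorted(start for _, start in best)


# Toy signal (hypothetical): quiet background with two loud bursts
# standing in for spoken characters.
signal = [0.1] * 4 + [0.9] * 4 + [0.1] * 4 + [0.8] * 4 + [0.1] * 4
print(top_energy_chunks(signal, chunk_size=4, k=2))  # [4, 12]
```

In a real attack the selected chunks would then go through feature extraction and a trained classifier, as the excerpt describes; that part is left out here.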

We again have that tension between accessibility for people and inaccessibility for automated programs. A 50 percent success rate is not low enough to deter the bad guys. What can be done? The researchers did conclude that “more of just about everything is better: more speakers, more characters, more distortion, and longer strings of tokens all seem to make a difference. As a result, they have expanded their own service to include all numbers from 0 to 99.” Time will tell how that pans out. I still wish we did not have to rely on different speakers, distortion, and entire sentences for audio CAPTCHAs, as that too poses its own accessibility issues for those with physical or neurological hearing problems.
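
To see why a success rate of around 50 percent is no deterrent, a little arithmetic helps. The sketch below assumes each attempt succeeds independently with the same probability, which is a simplification; the function names and the numbers of attempts are made up for illustration.

```python
def expected_successes(attempts, p):
    """Average number of CAPTCHAs solved out of `attempts` tries."""
    return attempts * p


def chance_of_at_least_one_success(p, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - p) ** attempts


print(expected_successes(1000, 0.5))           # 500.0
print(chance_of_at_least_one_success(0.5, 5))  # 0.96875
```

Under these assumptions, a bot that is right only half the time still gets through about 500 times per 1,000 attempts, and it is almost certain to succeed at least once within five tries.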

Perhaps there is mileage in some of the lesser-used systems that ask people to do simple mathematics or answer common-sense questions like “What colour is grass?” I suspect those too will be quite vulnerable to automated systems, as the number of questions will be limited. Unsatisfactory as it is, we may have to settle for the situation as it currently stands until someone cleverer than me has a bright idea. If you had to solve the problem of making CAPTCHA technology accessible but secure, how would you do it? Or is there a better way to separate the people from the programs?

