Technology
Danny Bradbury
War of the words
Captchas are designed to thwart software bots, but how can you stop them thwarting humans too?
The seedy world of spam is all about one-upmanship. Spammers invent increasingly sophisticated ways of getting junk mail into our inboxes, while anti-spam software writers come up with algorithms to spot and delete it. They are locked in an endless coding Cold War.
Thanks to the advent of webmail accounts and blogs, the same is now true of web developers. Spammers love to create webmail accounts from which to send junk mail automatically, and they like to leave spam comments on blogs. Web developers have been trying to find ways to stop spam bots from accessing these pages en masse.
Captchas were developed to solve the problem. They are challenge and response mechanisms that present a puzzle to be solved before they grant access to an online resource. They rely on problems that are relatively easy for humans to solve, but very hard for computers. The aim is to prove that the entity behind the browser has a pulse, and is not an automated bot.
A popular captcha takes a random sequence of text and distorts it. This creates an image that most humans can read, but which optical character recognition (OCR) programs find difficult. Captchas often have colourful backgrounds to make OCR even more difficult. Companies such as Yahoo have had some success using captchas to stop automated bots from signing up for thousands of free email accounts per hour and then using them to post spam.
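To make the challenge-and-response idea concrete, here is a minimal Python sketch rather than any particular site's implementation. The function names `issue_challenge` and `verify_response` and the in-memory `_pending` store are assumptions made for illustration, and the step that turns the text into a distorted image is left out.

```python
import hmac
import secrets
import string

# Hypothetical in-memory store mapping challenge tokens to expected answers.
# A real site would more likely use a session or database with expiry.
_pending = {}

ALPHABET = string.ascii_uppercase + string.digits


def issue_challenge(length=6):
    """Create a random text challenge and return (token, text).

    The text is what would be rendered as a distorted image;
    the rendering itself is outside this sketch.
    """
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    token = secrets.token_urlsafe(16)
    _pending[token] = text
    return token, text


def verify_response(token, response):
    """Check a response once; the challenge is discarded either way."""
    expected = _pending.pop(token, None)
    if expected is None:
        return False
    # Case-insensitive, constant-time comparison of the typed answer.
    typed = response.strip().upper()
    return hmac.compare_digest(expected.encode(), typed.encode())
```

Discarding the challenge after one attempt is a design choice in this sketch: it stops a bot from replaying a token or guessing repeatedly against the same puzzle.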
Captcha researcher Luis Von Ahn, an assistant professor at Carnegie Mellon University, says that captchas could also be useful to stop the automated gaming of online polls, and to stop bots from disobeying HTML instructions and penetrating restricted areas of a web site.
Common sense has proved remarkably resistant to programming, so creating problems that exploit human rather than machine intelligence should make it impossible to access protected resources using software algorithms.
Infosecurity Today September/October 2006
The art of captcha
On the face of it, captchas seem to tie-break the competition between security researchers and spammers. But spammers could try another line. To overcome this, Von Ahn suggests, spammers may need to get human beings to solve captchas in exchange for a reward that is more than the information or process they seek.
If a spammer wants to use a captcha-protected system, in theory he could open, say, a captcha-protected porn site and copy the original captcha to his own site. Surfers who want to access the porn have to solve the captcha, and when they do, the spammer uses the answer to gain access to the protected page. The surfer then gets access to the porn, the spammer gets to send his spam, and everyone is happy (except the owner of the captcha-protected page).
However, there are no known examples of this happening. Why not? “If they aren't doing it, it's probably because it's simply not worth it to them,” Von Ahn says. “If some site, say Yahoo, uses a captcha that they cannot beat, they can just go to the next site that doesn't use a captcha.”
Luis Von Ahn, assistant professor, Carnegie Mellon: captchas can counter bots
Another reason could be that spammers are simply getting better at solving image-based captchas. Several research groups have written OCR mechanisms that recognize captchas with over 80% success. And there's a limit: if captchas become too hard, then humans also have difficulties solving them, making them counter-productive. Daniel Lopresti, an associate professor at Lehigh University who specializes in human intelligence verification, thinks that image-based captchas will be useless within a few years because of this. Legitimate researchers are not the only ones working to solve captchas; spammers (who often have excellent technical resources) are too. “The good news is that when we can say they're no longer useful, it's because machines have become so good at the problems that there's a spin-off benefit for people as a whole. So we're forcing spammers to solve the pattern recognition problem,” he says.
But OCR on distorted images isn't the problem that Lopresti would choose for spammers to solve. “If we could generate captchas that would inspire people to help solve the handwriting recognition problem, that would have a strong benefit for society.”
Spammers are already using for their own ends the same types of image obfuscation techniques employed by captchas. For example, they have used images of text instead of the actual ASCII text, to avoid their messages being read by machines. Even then, it was still possible for OCR engines in anti-spam tools to spot the offending text. Spammers have now started to distort the image of the text using electronic 'noise'. The image is still readable by humans, but current OCR engines are thwarted, at least until anti-spam firms update them.
As researchers get closer to cracking the captcha OCR problem once and for all, captcha developers are turning to other techniques. One promising solution is KittenAuth, a system that Oli Warner is developing. KittenAuth presents nine pictures, and asks the user to select a group with something in common. The default is to select all the kittens in the group (other images would show different types of animal including, perhaps, animals which are visually kitten-like, such as tigers and foxes). Website administrators can change the images and the questions, making it more difficult for spammers to game the system.
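KittenAuth, as described in this article, shows nine pictures and asks the user to pick the ones that share a property. The Python sketch below shows how such a selection challenge might be built and checked. It is not KittenAuth's actual code: the `CATALOGUE` contents, the file names, the choice of three target images and both function names are assumptions made for illustration.

```python
import secrets

# Hypothetical image catalogue: file name -> label. Entries are illustrative.
CATALOGUE = {
    "a1.jpg": "kitten", "a2.jpg": "kitten", "a3.jpg": "kitten", "a4.jpg": "kitten",
    "b1.jpg": "tiger", "b2.jpg": "fox", "b3.jpg": "puppy", "b4.jpg": "rabbit",
    "b5.jpg": "horse", "b6.jpg": "owl", "b7.jpg": "fox",
}


def build_challenge(target="kitten", grid_size=9, n_targets=3, rng=None):
    """Return (grid, answer): a shuffled list of images and the target positions."""
    rng = rng or secrets.SystemRandom()
    targets = [name for name, label in CATALOGUE.items() if label == target]
    decoys = [name for name, label in CATALOGUE.items() if label != target]
    chosen = rng.sample(targets, n_targets)
    grid = chosen + rng.sample(decoys, grid_size - n_targets)
    rng.shuffle(grid)
    # The answer stays on the server; only the grid is shown to the user.
    answer = frozenset(grid.index(name) for name in chosen)
    return grid, answer


def check_selection(answer, selected_positions):
    """Accept only an exact match with the set of target positions."""
    return frozenset(selected_positions) == answer
```

Under these assumptions, a bot that knows three of the nine pictures are targets and guesses at random would pass 1 time in 84, so a real deployment would also need to limit retries and avoid leaking labels through file names.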
“The point of KittenAuth is to bring the balance between difficulty for people and computers back into perspective. The latest text captchas are nearing impossibility for humans, so they really are on their last legs as a viable option,” says Warner.
However, text captchas and KittenAuth have one major drawback: they are not accessible by the blind. That led Peter Schmalfeldt to develop HumanAuth, a variation on KittenAuth that follows the same basic principle, but which loads instructions for screen readers to tell the user what each picture represents.
Von Ahn is sceptical. Telling potential spammers what the image is makes it easier to break the captcha, he says. Schmalfeldt contends that asking indirect questions about the images (e.g. 'Choose the three images that represent nature', or 'Choose the three images that represent hard objects') helps to get around this problem.
KittenAuth's Warner doesn't buy it. “That's the problem when you try and chuck accessibility at a problem like this. Blind people are not supposed to perform well on a picture-based authentication/verification method. That's the point.”
That won't please visually-impaired web surfers. So, what are the alternatives? Henry Baird, a professor who is studying human intelligence verification and is Lopresti's colleague at Lehigh, says a lot of research has gone into audio captchas. “You would choose a word at random, or a phrase, and then synthesize an auditory spoken version of that, and then you'd add [audio] noise,” he says. Offering a choice of visual or audio captchas would help to solve the accessibility problem.
The other idea is to do away with audio and visual captchas altogether and simply rely on text. Why not have a large database of simple questions and answers, such as 'What is the capital of France?', or 'Spell out the number 9'?
Daniel Lopresti, associate professor, Lehigh University: captchas could escape human cognition
Lopresti is doubtful. “You could plug them into a search engine, and you have a decent chance of getting the answer back,” he points out. Using more complex questions that are harder to parse would start another arms race between security developers and spammers. But it could open some doors on to how to encode semantic meaning, which has been a big problem for the artificial intelligence community.
Finding a captcha system that outwits spammers will become increasingly important as more people begin to use them and as spammers find ways to defeat them. As the arms race between spammers and captcha writers develops, the winning solution will be one that maintains accessibility but also makes the authentication challenge as hard as possible for computers to solve.
Yet, this must be done in a way that doesn't become too difficult for real human users. To solve that problem, the anti-spam community will need as much human (and artificial) intelligence as it can muster.
Danny Bradbury is a technology journalist who writes for the Evening Standard, Computing, Computer Weekly and Microscope