A Very Simple and Effective Captcha

Frank Marion

User since: January 23, 2000

Last login: March 25, 2008

Articles written: 2

I have a very simple anti-bot-spam technique that works extremely well. It requires no JavaScript, no cookies, no hidden fields and no complicated server weirdness; it has negligible overhead, is fully accessible, is easy to use and is trivial to implement. Sound good? It's dead simple, so read on.

In order for spam to work, it has to be cost effective. In order for it to be cost effective, it has to be automated. In order for something to be automated, it requires a predictable pattern. The fundamental approach of this technique is to deny the bot a usable pattern, yet make it easy for even the most inexperienced user to use. Since spam bots rely on our forms and on our predictable form-naming conventions (the pattern), we simply add a key that only a human can turn: we require the user to fill in one field with a number that is displayed in plain text. This makes it effective against automated solutions. That's the fundamental idea of a CAPTCHA. Note, however, that it will not prevent humans from manually submitting spam to your form.

In the three years that I've used this technique, on some 20-odd sites, it has been 100% effective against spam bots. Sites that were being bombarded with hundreds of spam messages daily suddenly became quiet, and the good emails still get through. I'm a ColdFusion coder, so my example is in ColdFusion, but the technique works in any language. I hope it works as well for you.

Outline: The whole technique in a nutshell.

  1. Generate a random number
  2. Set it to a session variable
  3. Display the random number (session.variable) next to a field in the form
  4. Get the user to copy it over into the field.
  5. On submission, verify that the form.value equals the session.random number
  6. On a pass, allow the submission.
  7. On a fail, exit, abort, show a message: do as you wish.
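For readers who don't use ColdFusion, here is a minimal Python sketch of the seven steps above. It assumes a plain dict stands in for the server-side session store and reuses the field name `spmchck` from the ColdFusion sample; `new_challenge` and `check_submission` are hypothetical helper names, not part of any framework. As a design choice of this sketch, each number is discarded after every check, pass or fail, so it can be used only once.

```python
import secrets
from typing import Dict, Mapping

def new_challenge(session: Dict[str, str]) -> str:
    """Steps 1-3: generate a 4-digit number, store it in the session, return it for display."""
    session.pop("chk_rand", None)  # discard any previous number: no re-use of a cached value
    number = f"{secrets.randbelow(10000):04d}"  # zero-padded, e.g. "0042"
    session["chk_rand"] = number
    return number

def check_submission(session: Dict[str, str], form: Mapping[str, str]) -> bool:
    """Steps 5-7: True only if the submitted value matches the number in the session."""
    expected = session.pop("chk_rand", None)  # single use, whether the check passes or fails
    if expected is None:
        # No number in the session: e.g. a bot that never loaded the form,
        # or one that doesn't send the session cookie back.
        return False
    return form.get("spmchck", "").strip() == expected
```

A page handler would call `new_challenge(session)` when rendering the form and print the returned number next to the field, then call `check_submission(session, form)` before doing any processing.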

Sample code: The very bare bones version.

<!--- Before the form: generate a random number (I use 4 digits) --->
<cfscript>

   // Delete the previous random number if it exists (no re-use of a cached number)
   if(StructKeyExists(session, "chk_rand")) {
      StructDelete(session, "chk_rand");
   }

   // Assign a new random number, zero-padded to 4 digits
   session.chk_rand = NumberFormat(RandRange(0, 9999),'0000');

</cfscript>


<!--- In the form (inside a cfoutput block, so that #session.chk_rand# is evaluated):
      place this text input field, presumably near the submit button --->
   <label for="spmchck">#session.chk_rand# enter this number here -></label>
   <input id="spmchck" name="spmchck" type="text" size="4" maxlength="4"/>


<!--- After the form submission, before your form processing code (the validate/send/insert function) --->
<cfparam name="form.spmchck" default="">

<!--- Fail if there is no number in the session (e.g. a bot that sends no session cookie) or if it doesn't match --->
<cfif NOT StructKeyExists(session, "chk_rand") OR Trim(form.spmchck) NEQ session.chk_rand>

   <!--- Show a human-readable validation message (or call another validation function),
         e.g. "Sorry, you need to fill in the following fields..." --->

   <!--- Delete the random number that was just used --->
   <cfscript>StructDelete(session, "chk_rand");</cfscript>
   <cfexit>

</cfif>

<!--- Passed: delete the number here too, so the same value can't be submitted twice --->
<cfscript>StructDelete(session, "chk_rand");</cfscript>

Explanation

If the form value equals the session's random number, the submission passes and you process it normally. If not, the session variable holding the random number is deleted (so the next attempt gets a new random number) and the template is exited. That's it. All of it.

Most bots (perhaps all) won't be able to recognize the system, because the random number is plain old page text. The user has only a very simple task to perform, and the number changes every time the form is accessed, so even if the spammer submits manually, it's labour-intensive to keep re-entering a random number. It would be even more so when combined with techniques such as permitting only one submission every 30 seconds. If you also check the field with JavaScript, users with JavaScript enabled will never see the server-side response, and those without it (such as Braille reader users) will still get a meaningful message.
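The "one submission every 30 seconds" idea could be sketched as follows. This assumes an in-memory dict keyed by client IP address and an injectable clock so it can be tested; `SubmissionThrottle` and its parameters are hypothetical names, and a real site would keep this state in shared storage (a database or cache) rather than in process memory.

```python
import time
from typing import Callable, Dict, Optional

class SubmissionThrottle:
    """Accept at most one submission per client within `interval` seconds."""

    def __init__(self, interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.clock = clock
        # client key (e.g. IP address) -> time of its last accepted submission
        self._last: Dict[str, float] = {}

    def allow(self, client: str) -> bool:
        now = self.clock()
        last: Optional[float] = self._last.get(client)
        if last is not None and now - last < self.interval:
            return False  # too soon: reject without updating the timestamp
        self._last[client] = now
        return True
```

Rejected attempts don't reset the timer here, so a client is let through again 30 seconds after its last accepted submission; resetting on every attempt would be a stricter variant.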

Variations and additions

One could, if so inclined, add logging, honeypot links or email addresses, or a function that counts how many submissions an IP address has made in a short period to decide whether to ban it for 24 hours. Additions to this technique are limited only by your creativity.
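Counting submissions per IP address and banning repeat offenders for 24 hours could look like this minimal sketch. The class name `IpBanList`, the default thresholds (more than 5 submissions within 60 seconds) and the in-memory storage are all assumptions for illustration, not tuned values.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

class IpBanList:
    """Ban an IP for `ban_seconds` if it submits more than `limit` times within `window` seconds."""

    def __init__(self, limit: int = 5, window: float = 60.0,
                 ban_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window = window
        self.ban_seconds = ban_seconds
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._banned_until: Dict[str, float] = {}

    def record(self, ip: str) -> bool:
        """Record a submission; return True if it may be processed, False if the IP is banned."""
        now = self.clock()
        if self._banned_until.get(ip, 0.0) > now:
            return False
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have slid out of the counting window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) > self.limit:
            self._banned_until[ip] = now + self.ban_seconds
            hits.clear()
            return False
        return True
```

Logging each `False` result alongside the IP address would cover the logging suggestion as well.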

When the spammers catch up with this notion, simple responses applied across many websites will create too much randomness to define a pattern: randomising the "put this number here" message, or its location (before, after, above or below the field, in a randomly selected P, SPAN or DIV, radio button or select menu), or adding alphabetic characters. If you wanted to get fancy about it, you could even randomise the name of the field and store it in a session variable. The increase in complexity makes finding a pattern a lot of work, and therefore makes it less cost-effective for a spammer to write a parser that can handle all the variations. The key is to get the human to do something that humans do easily and naturally, but that evades the predictability a bot programmer needs, with patterns that they simply cannot foresee.
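One way to apply this randomisation is to draw the field name, the prompt wording, the wrapper tag and the prompt's position fresh for each form view, and to remember the field name in the session. In this sketch the prompt texts, the `chk_field` session key and the function names are hypothetical.

```python
import html
import secrets
from typing import Dict, Mapping

# Hypothetical prompt wordings; {n} is replaced by the number to copy.
PROMPTS = [
    "{n} enter this number here",
    "Please type {n} in the box",
    "Copy the number {n} here",
    "Type {n} to show you are human",
]

def render_challenge(session: Dict[str, str]) -> str:
    """Return an HTML fragment whose field name, wording, wrapper tag and order vary per view."""
    number = f"{secrets.randbelow(10000):04d}"
    field = "f" + secrets.token_hex(4)  # random field name, e.g. "f3a9c01be"
    session["chk_rand"] = number
    session["chk_field"] = field
    text = html.escape(secrets.choice(PROMPTS).format(n=number))
    label = f'<label for="{field}">{text}</label>'
    box = f'<input id="{field}" name="{field}" type="text" size="4" maxlength="4"/>'
    tag = secrets.choice(["p", "span", "div"])
    # Randomly place the prompt before or after the input box.
    inner = f"{label} {box}" if secrets.randbelow(2) else f"{box} {label}"
    return f"<{tag}>{inner}</{tag}>"

def check_randomized(session: Dict[str, str], form: Mapping[str, str]) -> bool:
    """Read the answer from the field name stored for this session; single use."""
    field = session.pop("chk_field", None)
    expected = session.pop("chk_rand", None)
    if field is None or expected is None:
        return False
    return form.get(field, "").strip() == expected
```

A scraper keyed on one fixed label string or one fixed field name now matches only some of the time. It won't stop a bot written to handle every variant, so the variants need to be numerous enough to make that uneconomical.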

Bonus trick

I sometimes make a page containing a submittable form redirect to the site's index page if REFind() cannot find my full domain name (or one of a specific list of pages) in the referrer. Users can still reach the form, but they must follow a link from the site. This makes that particular document slightly less indexable but improves overall security; the tradeoff is yours to judge.
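The referrer check could be sketched like this, with Python's `re.search` standing in for ColdFusion's `REFind()`. The domain `example.com` and the function name `should_redirect` are placeholders. Keep in mind that the Referer header is supplied by the client: a bot can forge it, and some browsers or privacy tools omit it, which would redirect genuine users too.

```python
import re
from typing import Optional

# Placeholder domain: matches http(s)://example.com/... or http(s)://www.example.com/...
SITE_REFERRER = re.compile(r"^https?://(www\.)?example\.com(/|$)", re.IGNORECASE)

def should_redirect(referrer: Optional[str]) -> bool:
    """Return True (send the visitor to the index page) unless the referrer is on our site."""
    if not referrer:
        return True  # no Referer header at all
    return SITE_REFERRER.search(referrer) is None
```

Anchoring the pattern at the start and requiring a `/` or end of string after the domain keeps look-alikes such as `example.com.evil.net` from passing.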

Conclusion

You can make these approaches as sophisticated, or keep them as simple, as you wish. This is basically a low-tech, highly effective version of a CAPTCHA that is as simple a concept as it gets, and that works well. I hope that the simplicity of this technique and its variations will let many of us implement it across the net and, if we are lucky, that it may have the same kind of impact that Bayesian filtering had on spam, and in so doing bring one of Evolt's guidelines to the net: Keep the signal high.

Let me know how it works for you, or additional variations that you might come up with.

Thanks to Martin

Submitted by Frank Marion on March 7, 2008 - 11:29.

Thanks to Martin for helping me with some of the posting issues here.

Session without cookies?

Submitted by edwinm on March 8, 2008 - 21:26.

You write that cookies are not required, but you use session variables.

Sessions are usually done with cookies. Not yours? Please explain.

I guess you'd be right about

Submitted by Frank Marion on March 9, 2008 - 16:25.

I guess you'd be right about that. What I meant was that there is no need to explicitly create a cookie to make it work.

I've been seeing a lot of extremely complicated solutions to form spam, ranging from getting the user to do math, to complicated redirects and .htaccess schemes, to, well, all kinds of complicated, burdensome weirdness. Then it hit me that this little toy of an idea has worked flawlessly for me for years, so I thought I'd share it. Remember the story about how, in the early years of space travel, NASA spent an enormous amount of time, energy, resources and brainpower trying to create a pen that worked in zero gravity? The Russians solved the problem by using a pencil. This is a pencil.

As for why this works anyway: if a bot isn't using cookies and is just making one-off hits each time, I can only suppose that it fills out all the form elements and "xzzgggqxxgq" doesn't match a non-existent session variable. It's a good question, and it'll be interesting to give it more thought.

?

Submitted by Hoff on March 17, 2008 - 04:25.

Ummm, I think it'll take about 15 seconds to write a script that circumvents this. It works now because you're using it on your personal sites, but as soon as it's on any sort of target of value, the floodgates will open.

Let's be serious here.

Submitted by Draicone on March 17, 2008 - 07:48.

<?php
preg_match('/([0-9]+) enter this number here/', $html, $matches);
echo $matches[1];
?>

Cracked in - oh, a grand total of 12 seconds. And not by me - by the nine year old girl next door.

You miss a fundamental concept of building a spam bot. In order for a spam bot to work, it has to be economical. That is, the cost of developing the spam bot has to come in at less than the profit gained from operating it.

Now, your personal sites are not worth cracking. They're small scale. At this point in time, nobody has yet bothered to build a spam bot to take them on.

However, on any site of reasonable scale - evolt, for instance - a spam bot would have been built within a day of this CAPTCHA approach being implemented. That said, now that you've published this on evolt, amateur developers are going to copy it straight into their CF apps - or port it to PHP/Ruby/Python/Perl/etc. - and it will suddenly become economical to build a spam bot to take on these sites. And people will. Quickly.

This will then create even further problems for such sites, as opposed to if they had developed their own implementation. In addition, in future people will build visual CAPTCHA crackers. The whole "enter a number" or "enter a letter" CAPTCHA trend is all very well when the approach is random, individual and implemented on a low scale. The moment it goes big time, however, it's infinitely less effective than a visual CAPTCHA.

Once RAD CAPTCHA cracking becomes a reality, the boost in this captcha trend created by your article will further assist spammers in developing malicious scripts to spam.

Now, don't get me wrong. The suggestions you mention in this article are fantastic. One submission per 30 seconds? Brilliant. Count submissions in a 24 hour period? If everyone did this, we could tip the scales and change the economics of form spamming entirely. Randomly adding characters or changing the location within the page? Would totally negate our 12 second crack. Yet your example (and associated code) reflects none of this, and the majority of web developers will not actually read your article in its entirety and implement most (if not all) of your suggestions.

Given this, it is really your responsibility to warn readers of the perils of using this code sample line-for-line, taking sections of code and reusing them in their entirety. Please do so, for the sake of the web.

"For the sake of the web"

Submitted by mcox on March 17, 2008 - 12:53.

Too funny, to think that the world is full of copy & pasters that believe anything on the web is true and 100% accurate.

Perspective is good.

Submitted by Frank Marion on March 17, 2008 - 21:54.

>You miss a fundamental concept of building a spam bot.
>In order for a spam bot to work, it has to be economical.
>That is, the cost of developing the spam bot has to come
>in at less than the profit gained from operating it.

See paragraph 2

The example is trivial by design. Hello World!

It took you a trivial 12 seconds to "crack" a trivial example. It took me 5 minutes to write and implement it, and it has stopped spambots dead in their tracks for three years and counting.

Perspective is good. This article is about an idea, not an implementation. It's not intended to permanently thwart the focussed intent of all the world's spammers for the future of all mankind. When visual RAD spambots with evil artificial intelligence become popular we'll have a new idea (links to evil spambot porn sites?). But this one works well. Now.

With that, I'll point you to the concluding paragraphs and encourage you to offer up a non-trivial example.

Please do so, for the sake of the web.

Don't take yourself so seriously, have fun and have faith in the Evolt community. There's a smart cookie or two here.

Suggestion

Submitted by erpa1119 on March 23, 2008 - 22:52.

Why hasn't anyone implemented a simple CAPTCHA based on a set of questions and answers: things that computers/programs cannot complete just yet?

Things like "complete this sentence", with the answer chosen from a list of, say, 10 words where most humans would choose correctly every time, such as:

Q: Most people sleep on a?
A: Bed
A: Website
A: Picture
A: Guitar
A: Race Track
A: Star Trek
Etc.

OR

Q: During the day, when you look up, you see the?

A: Sun
A: Grass
A: A Wrench
A: Door
A: Salvador Dali
Etc.

OR

Q: Please choose the joke from the below answers.

A: Why did the chicken cross the road
A: Science is fun
A: Click this button
Etc.

We could have a database of hundreds of thousands of really easy questions, and if we increased the number of answers to, say, 20 per question, that would make it even harder for a program to guess the right answer, while most humans would still choose correctly every time. I have not heard of any program or simulation that has been able to answer any of the questions above.
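The question-and-answer idea could be sketched as below: pick a question at random, shuffle its answer choices, and keep only the correct answer in the session. The two-question bank is a stand-in for the large database described above, and the data layout, the `qa_answer` session key and the function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Each entry: (question, correct answer, wrong answers). A real bank would be far larger.
QUESTION_BANK: List[Tuple[str, str, List[str]]] = [
    ("Most people sleep on a?", "Bed",
     ["Website", "Picture", "Guitar", "Race Track", "Star Trek"]),
    ("During the day, when you look up, you see the?", "Sun",
     ["Grass", "A Wrench", "Door", "Salvador Dali"]),
]

def ask(session: Dict[str, str],
        rng: random.Random = random.SystemRandom()) -> Tuple[str, List[str]]:
    """Choose a question, store its answer in the session, and return the shuffled choices."""
    question, correct, wrong = rng.choice(QUESTION_BANK)
    choices = [correct] + list(wrong)
    rng.shuffle(choices)
    session["qa_answer"] = correct
    return question, choices

def verify(session: Dict[str, str], submitted: str) -> bool:
    """Single-use check: the stored answer is removed whatever the outcome."""
    expected = session.pop("qa_answer", None)
    return expected is not None and submitted == expected
```

The choices would be rendered as radio buttons or a select menu, so the user only picks one; with 6 choices a blind guess still succeeds about one time in six, which is why a larger answer list matters.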


EC

It's the human factor

Submitted by Frank Marion on March 25, 2008 - 21:44.

The ideas that you propose have merits, and along with them downsides. Among the issues are language, culture, comprehension. One would need a database of questions and answers for each language the site is offered in. The creation of a unique database of Q and A's for each site would be costly in time, and a natural response would be to use pre-existing databases. This leads to patterns of use.

The second issue is that of interpretation. When I look up in the sky, I see clouds, which are not in the list. Worse, I see nothing, because I'm blind! The same goes for the joke: too much room for interpretation, especially if the user speaks only a rudimentary version of the language you are offering it in. One would also have to account for cultural variations. Imagine: "What do you do with a television?" a) watch it, b) listen to it, c) eat it. In the English-speaking world people watch television; French-speaking people "Écoutent la télévision" (listen to the television).

Copying a number or a set of characters is a simpler (read: harder to screw up) task than interpreting jokes or answers that might have cultural differences.

Similar technique

Submitted by dturover on April 1, 2008 - 00:49.

An even simpler technique, albeit less effective, is to create an empty textbox with the label "Leave this empty" and then style it to display: none so that most users wouldn't see it. On the backend, drop the submission if anything is in the box. Most of the spambots hitting us would fill in every textbox in the web page, so this cut spam down a great deal.
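The hidden "Leave this empty" field can be checked server-side in a couple of lines. This sketch assumes the honeypot field is named `website` (a hypothetical name picked to look tempting to a form-filling bot) and is hidden with CSS `display: none`, as described above; the constant names are also illustrative.

```python
from typing import Mapping

HONEYPOT_FIELD = "website"  # hypothetical field name; hidden from people with CSS

# Markup the form would include; the label text comes from the comment above.
HONEYPOT_HTML = (
    '<div style="display: none">'
    f'<label for="{HONEYPOT_FIELD}">Leave this empty</label>'
    f'<input id="{HONEYPOT_FIELD}" name="{HONEYPOT_FIELD}" type="text"/>'
    "</div>"
)

def is_probably_bot(form: Mapping[str, str]) -> bool:
    """Drop the submission if anything at all was typed into the hidden field."""
    return bool(form.get(HONEYPOT_FIELD, ""))
```

Any submission for which `is_probably_bot(form)` is true would simply be discarded before the normal processing runs.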

It also helped to rewrite the backend to concatenate all of the input fields and pipe the result through spamc, since SpamAssassin was running on the same machine. Unfortunately, we still got spammed at least once a week with both of these techniques implemented. I can't give a percentage of how well they worked because I was not receiving the messages.

Some commenters make the valid point that these sorts of techniques are not logically insurmountable, but they are not supposed to be. The subject here is quick hacks to change the protocol of the form transaction to make it incompatible with the majority of spambots. You will need stronger defenses if you have something behind them that the spammers will find valuable, like if you are a free web-based email provider or a widely used webforum package. For sites that don't need anything better and people whose bosses are not going to authorize the time to develop something better, this sort of thing is fine. Spammers are not going to rewrite their bots to target Bob's goldfish's web site unless Bob has a very popular goldfish.

Where I worked, a malicious attacker would not need to rewrite anything: anybody who had it in for us could simply DoS the form with endless submissions.

An even simpler technique is a no-go: already cracked.

Submitted by notabene on April 14, 2008 - 11:03.

>An even simpler technique, albeit less effective, is to create an empty textbox with the label "Leave this empty" and then style it to display: none so that most users wouldn't see it. On the backend, drop the submission if anything is in the box. Most of the spambots hitting us would fill in every textbox in the web page, so this cut spam down a great deal.

They already know how to handle that. The first person who programs a bot and finds out that it should skip this field will do so. Some already have, as some people found out (I heard it from fellow SPIP contributors).

Evolt.org is an all-volunteer resource for web developers made up of a discussion list, a browser archive, and member-submitted articles. This article is the property of its author; please do not redistribute or use it elsewhere without checking with the author.