Preventing bot form submission

Mike picture Mike · Mar 10, 2013 · Viewed 47.4k times · Source

I'm trying to figure out a good way to prevent bots from submitting my form, while keeping the process simple. I've read several great ideas, but I thought about adding a confirm option when the form is submitted. The user clicks submit and a Javascript confirm prompt pops up which requires user interaction.

Would this prevent bots or could a bot figure this out too easy? Below is the code and JSFIddle to demonstrate my idea:

JSFIDDLE

$('button').click(function () {
  if(Confirm()) {
    alert('Form submitted');
    /* perform a $.post() to php */
  }
  else {
    alert('Form not submitted');
  }
});

function Confirm() {
  var _question = confirm('Are you sure about this?');
  var _response = (_question) ? true : false;
  return _response;
}

Answer

Patrick M picture Patrick M · Mar 10, 2013

This is one problem that a lot of people have encountered. As user166390 points out in the comments, the bot can just submit information directly to the server, bypassing the javascript (see simple utilities like cURL and Postman). Many bots are capable of consuming and interacting with the javascript now. Hari krishnan points out the use of captcha, the most prevalent and successful of which (to my knowledge) is reCaptcha. But captchas have their problems and are discouraged by the World-Wide Web compendium, mostly for reasons of ineffectiveness and inaccessibility.

And lest we forget, an attacker can always deploy human intelligence to defeat a captcha. There are stories of attackers paying for people to crack captchas for spamming purposes without the workers realizing they're participating in illegal activities. Amazon offers a service called Mechanical Turk that tackles things like this. Amazon would strenuously object if you were to use their service for malicious purposes, and it has the downside of costing money and creating a paper trail. However, there are more erhm providers out there who would harbor no such objections.

So what can you do?

My favorite mechanism is a hidden checkbox. Make it have a label like 'Do you agree to the terms and conditions of using our services?' perhaps even with a link to some serious looking terms. But you default it to unchecked and hide it through css: position it off page, put it in a container with a zero height or zero width, position a div over top of it with a higher z-index. Roll your own mechanism here and be creative.

The secret is that no human will see the checkbox, but most bots fill forms by inspecting the page and manipulating it directly, not through actual vision. Therefore, any form that comes in with that checkbox value set allows you to know it wasn't filled by a human. This technique is called a bot trap. The rule of thumb for the type of auto-form filling bots is that if a human has to intercede to overcome an individual site, then they've lost all the money (in the form of their time) they would have made by spreading their spam advertisements.

(The previous rule of thumb assumes you're protecting a forum or comment form. If actual money or personal information is on the line, then you need more security than just one heuristic. This is still security through obscurity, it just turns out that obscurity is enough to protect you from casual, scripted attacks. Don't deceive yourself into thinking this secures your website against all attacks.)

The other half of the secret is keeping it. Do not alter the response in any way if the box is checked. Show the same confirmation, thank you, or whatever message or page afterwards. That will prevent the bot from knowing it has been rejected.

I am also a fan of the timing method. You have to implement it entirely on the server side. Track the time the page was served in a persistent way (essentially the session) and compare it against the time the form submission comes in. This prevents forgery or even letting the bot know it's being timed - if you make the served time a part of the form or javascript, then you've let them know you're on to them, inviting a more sophisticated approach.

Again though, just silently discard the request while serving the same thank you page (or introduce a delay in responding to the spam form, if you want to be vindictive - this may not keep them from overwhelming your server and it may even let them overwhelm you faster, by keeping more connections open longer. At that point, you need a hardware solution, a firewall on a load balancer setup).

There are a lot of resources out there about delaying server responses to slow down attackers, frequently in the form of brute-force password attempts. This IT Security question looks like a good starting point.

Update regarding Captcha's

I had been thinking about updating this question for a while regarding the topic of computer vision and form submission. An article surfaced recently that pointed me to this blog post by Steve Hickson, a computer vision enthusiast. Snapchat (apparently some social media platform? I've never used it, feeling older every day...) launched a new captcha-like system where you have to identify pictures (cartoons, really) which contain a ghost. Steve proved that this doesn't verify squat about the submitter, because in typical fashion, computers are better and faster at identifying this simple type of image.

It's not hard to imagine extending a similar approach to other Captcha types. I did a search and found these links interesting as well:

Is reCaptcha broken?
Practical, non-image based Captchas
If we know CAPTCHA can be beat, why are we still using them?
Is there a true alternative to using CAPTCHA images?
How a trio of Hackers brought Google's reCaptcha to its knees - extra interesting because it is about the audio Captchas.

Oh, and we'd hardly be complete without an obligatory XKCD comic.