BestWeb Nashville

Form Spam, Prevention

Why Form Spam?

A good question we often receive is simply: why? Why form spam? Why would a person (or an automated process) bother submitting a bunch of junk on whatever web forms they can find?

To understand the answer, consider email spam, since both types of spam are perpetrated for similar reasons. The goals behind email spam are familiar to anyone who has received it: to drive traffic to the spammer’s websites, to phish for personal and/or financial information, to install malicious code on the victim’s computer, and so on.

Form spam has the same goals in general; spammers just go about it in a slightly different way. Consider search engine results: one of the best ways to increase the visibility of a given website is to increase the number of backlinks, that is, the number of links on other websites pointing to it. Since form submissions are often published on the related website (think blog comments), form spam is an easy way to plant those links. Like email spam, it is a numbers game, which explains why your web form is being targeted: like an email address, it is targeted simply because it exists.

Form Spam: An Ongoing Topic of Conversation

There has been much discussion in the last few years about how to prevent spambots from submitting forms on websites. We expect this conversation to continue, because the problem of spam does not appear to be a short-term one. Many different solutions have been presented, ranging from the simple to the complex. A number of anti-form-spam solutions actually hurt the usability and accessibility of the web page; the use of CAPTCHAs (or reCAPTCHAs), for example, is a classic case where both usability and accessibility are directly impacted.

Here are a few common techniques and terms relating to the prevention of form spam that website owners should be aware of.

Hidden Field

One of the simplest ways to avoid form spam uses CSS. Non-human spammers usually fill out every available input field in a form before submitting it. The basic idea is to add an extra dummy text input field to your form and then use CSS to make that field invisible to your human visitors. If a submitted form contains any information at all in the dummy field, you can safely bet it is form spam, and it can be trashed accordingly.
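
As a rough sketch of how this might look (the field name "website_url", the CSS class "hp-field", and the handler file contact.php are placeholder examples, not part of any standard):

    <style>
        /* Hide the dummy field from human visitors */
        .hp-field { display: none; }
    </style>

    <form action="contact.php" method="post">
        <input type="text" name="email" />
        <!-- Humans never see this field; most spambots will fill it in anyway -->
        <p class="hp-field">
            <label for="website_url">Leave this field empty</label>
            <input type="text" name="website_url" id="website_url" value="" />
        </p>
        <input type="submit" value="Send" />
    </form>

Then, in contact.php:

    <?php
    // If the dummy field contains anything at all, treat the submission as spam.
    if (!empty($_POST['website_url'])) {
        exit; // silently discard (or log) the spam submission
    }
    // ...otherwise, process the form as usual.
    ?>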

The downside to such a simple method is that more sophisticated form spambots can detect that the field is hidden and thus “know” to avoid it. In fact, this was probably one of the first methods form spammers learned to circumvent automatically. In our opinion, however, it is still much better than using no form spam avoidance technique at all on a given form.

Turing Tests

The “Turing test” is a proposal for a test of a machine's ability to demonstrate intelligence. Described by Alan Turing in the 1950 paper "Computing Machinery and Intelligence," it proceeds as follows: a human judge engages in a natural language conversation with one human and one machine, each of which tries to appear human. All participants are placed in isolated locations. If the judge cannot reliably tell the machine from the human, the machine is said to have passed the test. In order to test the machine's intelligence rather than its ability to render words into audio, the conversation is limited to a text-only channel such as a computer keyboard and screen.

CAPTCHA Code

The term "CAPTCHA" was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper (all of Carnegie Mellon University), and John Langford (then of IBM). It is a contrived acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart." Carnegie Mellon University attempted to trademark the term, but the trademark application was abandoned on 21 April 2008. Currently, CAPTCHA creators recommend use of reCAPTCHA as the official implementation.

ReCAPTCHA Code

reCAPTCHA is a system developed at Carnegie Mellon University that uses CAPTCHA to help digitize the text of books whilst protecting websites from bots attempting to access restricted areas. reCAPTCHA is currently digitizing text from the Internet Archive and the archives of the New York Times.

reCAPTCHA supplies subscribing websites with images of words that optical character recognition (OCR) software has been unable to read. The subscribing websites (whose purposes are generally unrelated to the book digitization project) present these images for humans to decipher as CAPTCHA words as part of their normal validation procedures, then return the results to the reCAPTCHA service, which passes them on to the digitization projects. This amounts to the equivalent of about 160 books per day, or 12,000 man-hours of free labor per day (as of September 2008).
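
As a rough sketch only, assuming the classic recaptchalib.php PHP plugin that the reCAPTCHA project distributed to subscribing websites (its recaptcha_get_html() and recaptcha_check_answer() helpers, with placeholder keys), a form handler might look something like this:

    <?php
    require_once('recaptchalib.php');

    $publickey  = 'your-public-key';   // placeholder: issued when you sign up for reCAPTCHA
    $privatekey = 'your-private-key';  // placeholder

    if ($_SERVER['REQUEST_METHOD'] == 'POST') {
        // Ask the reCAPTCHA service whether the visitor solved the challenge.
        $resp = recaptcha_check_answer($privatekey,
                                       $_SERVER['REMOTE_ADDR'],
                                       $_POST['recaptcha_challenge_field'],
                                       $_POST['recaptcha_response_field']);
        if (!$resp->is_valid) {
            die('The CAPTCHA was not entered correctly. Please go back and try again.');
        }
        // ...process the legitimate form submission here.
    }

    // Render the form with the reCAPTCHA widget embedded.
    echo '<form method="post" action="">';
    echo recaptcha_get_html($publickey);
    echo '<input type="submit" value="Send" />';
    echo '</form>';
    ?>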

Resources: Turing Test Methods for PHP Forms

Wikipedia – Turing Test - http://en.wikipedia.org/wiki/Turing_test

Resources: Form Spam, General

http://klauskjeldsen.dk/2007/07/19/avoid-html-form-spam-using-css/
http://webaim.org/blog/spam_free_accessible_forms/