Note: The anti-form-spam technique discussed here is specifically for developers who use the Ruby on Rails framework. While the concepts are still relevant, you may have to research how to implement them using the architecture and languages your website uses.
Many blogs, such as ours here at Ameravant, allow visitors to contribute to the articles by posting comments. Each comment usually collects a name, an email address and the comment text itself. However, if a blog post receives attention from humans, it will also start getting attention from spam bots and scrapers. This puts both your website content and your visitors at risk.
Spam bots (or scrapers) are scripts or programs that discover forms and email addresses posted on websites and automatically use them to distribute spam. Usually, the intention of the spam is to spread links to websites that the spammers want to drive traffic to (often those dealing with medications like Viagra and other sexually-related drugs). If you've used the Internet for more than a couple of months, you have almost certainly experienced spam.
If web forms are not properly protected, these bots will cause a headache for the website owner, who will have to periodically delete the bogus comments. Visitors who have left comments on your blog also risk having their email addresses scraped and later spammed directly.
It is important to protect both yourself and your visitors from unwanted spam. There are a few good techniques for this. First, let's talk about how to still display email addresses on your site, but keep them hidden from those annoying spammers.
Camouflage your visitors' email addresses
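The Rails mail_to helper can do this for you. Its hex encoding outputs the letters, digits and underscores of the email address as hex values, which simple scrapers are less likely to recognize as an address. (The :encode option was built into mail_to through Rails 3; from Rails 4 on, it is provided by the separate actionview-encoded_mail_to gem.) Here is how you use it:
<%= mail_to "john_smith@mailinator.com", "Email John Smith", :encode => :hex %>
Visitors to your site will still get a working link they can click, but this is what the source code will show: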
<a href="mailto:%6a%6f%68%6e%5f%73%6d%69%74%68@%6d%61%69%6c%69%6e%61%74%6f%72.%63%6f%6d">Email John Smith</a>
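To see where those percent-prefixed values come from, here is a minimal Ruby sketch of the same idea. It is not Rails' actual implementation. It assumes that letters, digits and underscores are replaced with a percent sign plus their two-digit hex code, while "@" and "." are left alone, which is consistent with the output shown above. The method name hex_encode_email is made up for illustration.

```ruby
# Illustrative sketch only; hex_encode_email is a hypothetical helper,
# not part of Rails.
def hex_encode_email(address)
  # Replace each ASCII letter, digit or underscore with "%" plus its
  # two-digit lowercase hex code; "@" and "." pass through unchanged.
  address.gsub(/\w/) { |char| format("%%%02x", char.ord) }
end

puts hex_encode_email("john_smith@mailinator.com")
```

A browser decodes these escapes when the link is clicked, so the mail client still receives the plain address.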
In the browser, this renders as an ordinary link with the text "Email John Smith".
You've now done your part to protect your visitors from spam. If they know you are helping hide their email address from potential spammers, they may be more inclined to provide their address when posting a comment.
Protecting your own site against form spam
CAPTCHAs have become a very popular way to combat spam, but many of them have become so complicated that even I have trouble deciphering them at times. The technique I have always liked uses CSS to present the spam bot with a form field that is never shown to a real human visitor.
The concept is very simple. You put a field in your form that you actually don't expect to be used by human visitors. For example, on our blogs we only ask for name, email address and a comment. So, our fake field could be something like "company", "website", or "city". The spam bot inspects the HTML source code of the site and blindly provides input for every field of a web form that it finds an HTML tag for. Since it will submit our fake field, which we simply hide by using the CSS display:none setting, we can reject any comments that provide data for this field.
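The server-side half of this check can be sketched in plain Ruby. This is only an illustration under a few assumptions: the decoy field is called "website", the submitted form data arrives as a hash, and the method name honeypot_triggered? is hypothetical.

```ruby
# Hypothetical decoy field name; pick anything your real form does not use.
HONEYPOT_FIELD = "website"

# Returns true when the decoy field contains anything other than whitespace,
# which suggests the form was filled in by a bot.
def honeypot_triggered?(params, field = HONEYPOT_FIELD)
  # Accept either string or symbol keys in the submitted data.
  value = params[field] || params[field.to_sym]
  !value.to_s.strip.empty?
end

human = { "name" => "Ann", "email" => "ann@example.com",
          "comment" => "Nice post!", "website" => "" }
bot   = { "name" => "x", "email" => "x@example.com",
          "comment" => "Buy now", "website" => "http://spam.example" }

puts honeypot_triggered?(human)
puts honeypot_triggered?(bot)
```

In a Rails application, a check like this could run in the controller action that creates the comment, before the record is saved. How you respond to a triggered check (silently discarding the comment, for instance) is up to you.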
Since bots may be improved to look for a "style" attribute in your HTML input tags, you should put the CSS display:none setting inside an externally-included CSS file, as opposed to directly in the source code of the page that the form is on.
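As a rough illustration of keeping the hiding rule out of the page source, the sketch below builds the decoy input with only a CSS class and no style attribute. The helper name honeypot_input_html and the class name "extra-info" are assumptions made for this example; in a real Rails view you would more likely write the field directly in your form template.

```ruby
require "erb"

# Hypothetical helper: builds the decoy input with a CSS class and no
# inline style attribute, so the page source never mentions display:none.
def honeypot_input_html(field_name, css_class)
  name  = ERB::Util.html_escape(field_name)
  klass = ERB::Util.html_escape(css_class)
  %(<p class="#{klass}"><input type="text" name="#{name}" autocomplete="off"></p>)
end

# The matching rule would live in an externally included stylesheet, e.g.:
#   .extra-info { display: none; }
puts honeypot_input_html("website", "extra-info")
```

A bot reading only the HTML sees an ordinary text input, while browsers that load the stylesheet never display it.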
Following these two techniques is a great start to fighting against spam, protecting both yourself and your visitors from some headaches.