Reduce Spam to Almost Zero by Simple Blocking
Spambots and badly behaving bots seem to be all the rage this year. While it will be very hard to block all of them, you can do a lot to keep most of the comment spammers away from your blog and scrapers from harvesting your site.
Spambots also have to identify themselves to servers with a user agent, so over time people have collected their “names”. There are many more than the ones listed here, but blocking too many may slow the server down. Here we go. Just follow these simple steps to reduce your spam attacks!
Image by debruehe
How to Block SpamBots
- Create a .htaccess file in your root directory on your server if you don’t have one already.
- Insert the following code
# BLACKLISTED USER AGENTS
SetEnvIfNoCase User-Agent "" keep_out
SetEnvIfNoCase User-Agent "larbin" keep_out
SetEnvIfNoCase User-Agent "heritrix" keep_out
SetEnvIfNoCase User-Agent "ia_archiver" keep_out
SetEnvIfNoCase User-Agent "Jakarta Commons" keep_out
SetEnvIfNoCase User-Agent "Y!OASIS/TEST" keep_out
SetEnvIfNoCase User-Agent "libwww-perl" keep_out
SetEnvIfNoCase User-Agent "MOT-MPx220" keep_out
SetEnvIfNoCase User-Agent "MJ12bot" keep_out
SetEnvIfNoCase User-Agent "Nutch" keep_out
SetEnvIfNoCase User-Agent "cr4nk" keep_out
<Limit GET POST PUT>
order allow,deny
allow from all
deny from env=keep_out
</Limit>
That's it! Save the file and upload it to your (Apache) server. What have we done? Each SetEnvIfNoCase line flags one spambot (like “larbin”) by its user agent. You may have noticed that the very first line names no spambot at all, just "". That's right: an empty user agent. If someone can't be arsed to set a user agent, why should you serve them anything? Some servers cannot handle the empty "". So if you run into problems, delete the whole line:
SetEnvIfNoCase User-Agent "" keep_out
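One possible reason for those problems: the pattern is a regular expression, and an empty regular expression can match every user agent instead of only missing ones. A commonly used alternative, offered here as a sketch rather than something tested against your setup, anchors the pattern so that it matches only an empty value:

```apache
# Flag requests whose User-Agent value is empty.
# ^ and $ anchor the start and the end of the string, so nothing else matches.
SetEnvIfNoCase User-Agent "^$" keep_out
```

If you try this, use it in place of the "" line, not in addition to it.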
The last five lines tell the server to deny every GET, POST and PUT request that was flagged with keep_out and to allow everything else.
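The order, allow and deny directives belong to the Apache 2.2 access-control syntax. On Apache 2.4 they only work through the mod_access_compat module. A minimal sketch of the same rule in 2.4 syntax, assuming mod_authz_core is loaded and your host permits authorization directives in .htaccess (AllowOverride AuthConfig):

```apache
# Apache 2.4 sketch of the allow/deny rule.
# Both conditions must hold: everyone is allowed,
# except requests that carry the keep_out variable.
<RequireAll>
    Require all granted
    Require not env keep_out
</RequireAll>
```

This sketch drops the Limit wrapper, so it applies to every request method, not just GET, POST and PUT.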
Please keep us updated on your experiences, or let us know if you have spotted different spambots!

