Sunday, March 25, 2007

Bad-Bot Trap Revisited

There seems to be an increase of late in bad-bot activity with new ips, and new user-agents. So I thought I would add a couple of ideas to flesh out my original A Simple PHP based Bad-Bot Trap that seems to be rather popular.

I'm flattered that people are posting and collecting the links to this blog. But some scum are stealing the articles and posting them on their own sites: you who do so will receive instant and debilitating bad karma as a result. Furthermore, you do not have permission to do so. May you experience endless server and php errors.

Just link, its nicer. If you have comments or suggests, go for it.

I offer the following with the standard disclaimer: If you don't understand the code, don't use it!

We can notice that the bots tend to follow the links on a page in one of three fairly predictable ways: top down, alphabetically ascending, and alphabetically descending. If we wish to trap a bad-bot early on in its travels through our site, we can easily set traps for each possibility using the original bad-bot trap and a little .htaccess magic.

First add the following rules to the robots.txt under User-agent: *

Disallow: /afile.html
Disallow: /zfile.html
Disallow: /nofile.html

add to .htacess

# set 'RewriteEngine On' if you haven't already
# redirect badbots
RewriteRule ^afile.* /badbots.php [L]
RewriteRule ^zfile.* /badbots.php [L]
RewriteRule ^nofile.* /badbots.php [L]


Now we have three different traps to embed in our pages:

<p style="color:white;background:white;height:0;visibility:collapse;"
onclick="return false" >
<a href="/badbots.php" >.</a>

which can go at the top of the page

<p style="color:white;background:white;height:0;visibility:collapse;"
onclick="return false" >
<a href="/afile.html" >.</a>

and

<p style="color:white;background:white;height:0;visibility:collapse;"
onclick="return false" >
<a href="/zfile.html" >.</a>

which can go pretty much anywhere on a page.

The traps should be self-evident as to their use. The Disallow: /nofile.html exclusion was added for that particular species of bad-bot that uses the robots.txt to find links.

Happy Trapping!

No comments: