Thursday, September 21, 2006

Setting Bad-Bot Traps

In A Simple PHP Based Bot Trap, I presented a fairly simple script for trapping (well, really excluding) robots and site rippers that either ignore a site's robots.txt file or surreptitiously use it.

The trap was set by placing a link on the main index page to a file excluded via robots.txt, as bait for the bad bot. The link is hidden from regular users through CSS styles and by using a single dot (.) as the link text between the anchor tags.
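
As a rough sketch of that setup, assuming the trap script lives at /badbots.php (the path used in this post's examples), the exclusion rule in robots.txt might look like this:

User-agent: *
Disallow: /badbots.php

The bait link on the index page could then look something like the following. The inline styles are only one possible choice, assume a white page background, and are not necessarily the ones used in the original script:

<p>
<!-- assumption: the page background is white (#ffffff) -->
<a href="/badbots.php" style="color: #ffffff; text-decoration: none;">.</a>
</p>

A well-behaved robot reads the Disallow rule and never requests the file, so only bots that ignore robots.txt (or read it looking for targets) should end up in the trap.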

The weakness in this method of camouflage is that the trap can still be tripped by users of text or other non-visual browsers.

There are two variations on this trap that can be used in conjunction with, or instead of, the original.

The first is simply to replace the dot with an image file, 1px by 1px, that is the same colour as the page's background. The alt attribute can help identify the trap to legitimate users of text browsers. For example:

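<p>
<a href="/badbots.php">
<!-- border="0" stops older browsers from drawing a link border around the image -->
<img src="small_image.gif" width="1" height="1" border="0" alt="do not follow" />
</a>
</p>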


The second variation is to use HTML comments, hiding the link from everybody except that particular species of bot that will try to follow anything that even remotely resembles a link:

<!-- <a href="/badbots.php">look, a link!</a> -->

Placement of the traps can also vary. Bots do not necessarily follow links in the order found on the page, nor do they necessarily enter a site through the main index page. Traps can be placed soon after the <body> tag, near the bottom of the page, or within a list of links, such as a navigation bar. If the site has a complicated hierarchy of nested folders, laying traps at different depths may also yield results.
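
To illustrate, here is a minimal sketch of a page with traps in all three positions. The navigation links (/index.html, /about.html) are hypothetical placeholders, and the traps reuse the image and comment variations described above:

<body>

<!-- trap 1: soon after the opening body tag -->
<p>
<a href="/badbots.php"><img src="small_image.gif" alt="do not follow" /></a>
</p>

<!-- trap 2: within a list of links, here a simple navigation bar -->
<p>
<a href="/index.html">Home</a>
<a href="/badbots.php"><img src="small_image.gif" alt="do not follow" /></a>
<a href="/about.html">About</a>
</p>

<p>Page content goes here.</p>

<!-- trap 3: near the bottom of the page, using the comment variation -->
<!-- <a href="/badbots.php">look, a link!</a> -->

</body>

For a site with nested folders, the same markup could be repeated on pages at different depths. Because the href starts with a slash, it points at the same trap script from any depth.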
