A very simple bad-bot trap that catches both bots that ignore your
robots.txt, and site rippers who don't read the
robots.txt. There are many versions of this trap out there. This one is not particularly sophisticated, but it works.
Use with care to make sure you don't shut out visitors that you do want, or worse, shut down your site.
If you don't understand the following code, don't use it!
P.S. you can use it but you cannot post it elsewhere. Copyright 2006
What you need:
- PHP enabled site
- Ability to incorporate robots.txt
- Ability to incorporate .htaccess files on your site
- Ability to send email via PHP
- Stamina to monitor your logs and .htaccess file
The following files are created or edited under your .html directory, typically
public_html/ on shared web hosting services.
robots.txt
.htaccess
badbots.php
bad-bots-script.php
index.php (or index.html)
- Add the following lines (or appropriate version) to your robots.txt:
User-agent: *
Disallow: /badbots.php
- Create the following file: badbots.php
<?php
header("Content-type: text/html; charset=utf-8");
echo '
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
';
?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Bad-Bots and Rippers Denied</title>
<meta name="author" content="seven-3-five.blogspot.com 2006-09-04" />
</head>
<body>
<p>whatever message you would like the scum to see</p>
<?php
include 'bad-bot-script.php';
?>
</body>
</html>
- Create the following file: bad-bot-script.php
<?php
// author: seven-3-five, 2006-09-04, seven-3-five.blogspot.com
//this script is the meat and potatoes of the bot-trap
// 1. It sends you an email when the page /badbots.php is visited.
//The email contains various info about the visitor.
//2. It adds the directive
//'deny from $ip' ($ip being the visitor's ip address)
//to the bottom of your .htaccess file.
// SERVER VARIABLES USED TO IDENTIFY THE OFFENDING BOT
$ip = $_SERVER['REMOTE_ADDR'];
$agent = $_SERVER['HTTP_USER_AGENT'];
$request = $_SERVER['REQUEST_URI'];
$referer = $_SERVER['HTTP_REFERER'];
// CONSTRUCT THE EMAIL MESSAGE
$subject = 'bad-bots';
$email = 'your_email@your_site.com'; //edit accordingly
$to = $email;
$message ='ip: ' . $ip . "\r\n" .
'user-agent string: ' . $agent . "\r\n" .
'requested url: ' . $request . "\r\n" .
'referer: ' . $referer . "\r\n"; // often is blank
$message = wordwrap($message, 70);
$headers = 'From: ' . $email . "\r\n" .
'Reply-To: ' . $email . "\r\n" .
'X-Mailer PHP/' . phpversion();
// SEND THE MESSAGE
mail($to, $subject, $message, $headers);
// ADD 'deny from $ip' TO THE BOTTOM OF YOUR MAIN .htaccess FILE
$text = 'deny from ' . $ip . "\n";
$file = '.htaccess';
add_badbot($text, $file);
// Function add_bad_bot($text, $file_name): appends $text to $file_name
// make sure PHP has permission to write to $file_name
function add_badbot($text, $file_name) {
$handle = fopen($file_name, 'a');
fwrite($handle, $text);
fclose($handle);
}
?>
- Add the following html soon after the <body> tag of your main /index.php (or index.html) page:
<p style="color:white;background:white;height:0;visibility:collapse;">
<a href="badbots.php" >.</a>
</p>
- Test thoroughly
Possible issues depending on your server ...
What Happens:
A bad-bot grabs your robots.txt file and either ignores the file's directives, or uses that information to find stuff. If the bot follows the link to /badbots.php, the bad-bot-script.php fires, writing the visitor's ip to your .htaccess file and sending you an email to the fact. The bad-bot can no longer transverse your site.
Alternately a ripper visits your site and starts to download everything it can find. It will quickly stumble upon the /badbots.php link on your /index.*. Once visiting /badbots.php, it will be unable to download any more of your stuff, just like in the previous example.
Once the bot or ripper discovers it is locked out, it may thrash about a bit, trying to retrieve any largish file it may have an url for, but of course it will just be denied access, getting a 403 code and nothing else, and quickly move on.
Variations: endless