Use with care to make sure you don't shut out visitors that you do want, or worse, shut down your site.
If you don't understand the following code, don't use it!
P.S. you can use it but you cannot post it elsewhere. Copyright 2006
What you need:
- PHP enabled site
- Ability to incorporate robots.txt
- Ability to incorporate .htaccess files on your site
- Ability to send email via PHP
- Stamina to monitor your logs and .htaccess file
robots.txt
.htaccess
badbots.php
bad-bot-script.php
index.php (or index.html)
- Add the following lines (or appropriate version) to your robots.txt:
User-agent: *
Disallow: /badbots.php
- Create the following file: badbots.php
<?php
header("Content-type: text/html; charset=utf-8");
echo '
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
';
?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Bad-Bots and Rippers Denied</title>
<meta name="author" content="seven-3-five.blogspot.com 2006-09-04" />
</head>
<body>
<p>whatever message you would like the scum to see</p>
<?php
include 'bad-bot-script.php';
?>
</body>
</html>
- Create the following file: bad-bot-script.php
<?php
// author: seven-3-five, 2006-09-04, seven-3-five.blogspot.com
//this script is the meat and potatoes of the bot-trap
// 1. It sends you an email when the page /badbots.php is visited.
//The email contains various info about the visitor.
//2. It adds the directive
//'deny from $ip' ($ip being the visitor's ip address)
//to the bottom of your .htaccess file.
// SERVER VARIABLES USED TO IDENTIFY THE OFFENDING BOT
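$ip = $_SERVER['REMOTE_ADDR'];
// the user-agent and referer headers are optional, so fall back to an empty string
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$request = $_SERVER['REQUEST_URI'];
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';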
// CONSTRUCT THE EMAIL MESSAGE
$subject = 'bad-bots';
$email = 'your_email@your_site.com'; //edit accordingly
$to = $email;
$message ='ip: ' . $ip . "\r\n" .
'user-agent string: ' . $agent . "\r\n" .
'requested url: ' . $request . "\r\n" .
'referer: ' . $referer . "\r\n"; // often is blank
$message = wordwrap($message, 70);
$headers = 'From: ' . $email . "\r\n" .
'Reply-To: ' . $email . "\r\n" .
'X-Mailer: PHP/' . phpversion();
// SEND THE MESSAGE
mail($to, $subject, $message, $headers);
// ADD 'deny from $ip' TO THE BOTTOM OF YOUR MAIN .htaccess FILE
$text = 'deny from ' . $ip . "\n";
$file = '.htaccess';
add_badbot($text, $file);
// Function add_badbot($text, $file_name): appends $text to $file_name
// make sure PHP has permission to write to $file_name
function add_badbot($text, $file_name) {
    $handle = fopen($file_name, 'a');
    if ($handle === false) {
        return; // could not open the file; check its permissions
    }
    fwrite($handle, $text);
    fclose($handle);
}
?>
- Add the following HTML soon after the <body> tag of your main /index.php (or index.html) page:
<p style="color:white;background:white;height:0;visibility:collapse;">
<a href="badbots.php" >.</a>
</p>
- Test thoroughly
- You may have to create an empty .htaccess file if your site does not already have one
- You may have to adjust the permissions for .htaccess so that bad-bot-script.php can write to it. If so, try:
touch .htaccess
chgrp www .htaccess
chmod 664 .htaccess
- Your mail server may not like PHP generated mail
- Your mail server may need to be configured
- If you are locking out everyone, try adding the following two lines near the top of your main .htaccess file:
order allow,deny
allow from all
though they should appear in the httpd.conf file on any public Apache server and shouldn't be necessary here (I believe...)
- You have locked yourself out: this will happen every time you test the system, so be prepared to remove your IP from your .htaccess file.
- Your test adds your ip to the .htaccess file, but you still have access - your server may not be configured to allow use of .htaccess files.
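Since every test run locks you out, a small clean-up helper can save some hand editing. The following is only a sketch: remove_denied_ip() is a hypothetical function name, not part of the original script, and the addresses are sample data. It works on a string, so to apply it to the real file you would read the file with file_get_contents() and write the result back with file_put_contents().

```php
<?php
// Hypothetical helper (not part of the original script): returns the
// .htaccess contents with every "deny from <ip>" line for $ip removed.
function remove_denied_ip($contents, $ip) {
    $kept = array();
    foreach (explode("\n", $contents) as $line) {
        // Compare the trimmed line so a trailing "\r" or spaces do not matter.
        if (strcasecmp(trim($line), 'deny from ' . $ip) !== 0) {
            $kept[] = $line;
        }
    }
    return implode("\n", $kept);
}

// Sample .htaccess contents (made-up addresses from documentation ranges).
$before = "order allow,deny\nallow from all\ndeny from 203.0.113.7\ndeny from 198.51.100.9\n";
$after = remove_denied_ip($before, '203.0.113.7');
echo $after;
```

The comparison is on the whole line, so removing 10.0.0.1 will not touch an entry for 10.0.0.11.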
What Happens:
A bad bot grabs your robots.txt file and either ignores the file's directives or uses that information to find stuff. If the bot follows the link to /badbots.php, bad-bot-script.php fires, writing the visitor's IP to your .htaccess file and sending you an email to that effect. The bad bot can no longer traverse your site.
Alternatively, a ripper visits your site and starts to download everything it can find. It will quickly stumble upon the /badbots.php link on your /index.*. Once it has visited /badbots.php, it will be unable to download any more of your stuff, just like in the previous example.
Once the bot or ripper discovers it is locked out, it may thrash about a bit, trying to retrieve any largish file it has a URL for, but it will just be denied access, getting a 403 code and nothing else, and will quickly move on.
Variations: endless
10 comments:
This trap is good and easy, but when I go to badbots.php I get this error message:
Fatal error: Call to unsupported or undefined function wordwrap() in bad-bot-script.php on line 27
What should I do?
Jdy
Jdy's error
call to unsupported function wordwrap() ...
The function wordwrap() should be available starting with PHP 4.0.2. It may be disabled on public servers due to a potential heap-based buffer overflow problem. Check disable_functions in your php.ini file or with phpinfo(INFO_CONFIGURATION); to see if that is the case.
The problem was addressed in PHP 4.4.3 and 5.1.3.
See secunia.com/advisories/19803/ for details.
The statement
$message = wordwrap($message, 70);
is purely aesthetic and not strictly necessary for the script to work.
You could just comment it out or even remove it.
If you really want a wordwrap function, but won't or can't use the included wordwrap(), I believe there are some alternative scripts at php.net, search wordwrap.
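If you would rather keep the wrapping where it is available, one option is to guard the call. This is a rough sketch, not a tested fix for Jdy's server: can_call() is a hypothetical helper name, and the message contents are sample data.

```php
<?php
// Hypothetical helper: true when $name is defined and not listed in the
// disable_functions setting of php.ini.
function can_call($name) {
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return function_exists($name) && !in_array($name, $disabled, true);
}

// Sample message standing in for the one built in bad-bot-script.php.
$message = 'ip: 203.0.113.7' . "\r\n" .
    'user-agent string: ExampleBot/1.0' . "\r\n";

// Wrap only when wordwrap() is usable; otherwise send the message unwrapped.
if (can_call('wordwrap')) {
    $message = wordwrap($message, 70);
}

echo $message;
```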
If you want to flesh out the original bad-bot-script, perhaps write checks that limit/sanitize the $_SERVER variables before constructing $message, and so on.
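A minimal sketch of what such checks might look like, under a few assumptions: clean_header_value() and valid_ip_or_null() are hypothetical helper names, the 200-character cap is arbitrary, the $server array is sample data standing in for $_SERVER, and filter_var() needs the filter extension (enabled by default since PHP 5.2).

```php
<?php
// Hypothetical helper: replace control characters (including CR/LF) with a
// space and cap the length, so a header value cannot smuggle in extra lines.
function clean_header_value($value, $max_length = 200) {
    $value = preg_replace('/[\x00-\x1F\x7F]+/', ' ', (string) $value);
    return substr(trim($value), 0, $max_length);
}

// Hypothetical helper: return $ip only if it is a valid IPv4/IPv6 address.
function valid_ip_or_null($ip) {
    return filter_var($ip, FILTER_VALIDATE_IP) !== false ? $ip : null;
}

// Sample data standing in for $_SERVER; note the injected line break.
$server = array(
    'REMOTE_ADDR' => '203.0.113.7',
    'HTTP_USER_AGENT' => "EvilBot/1.0\r\nBcc: someone@example.com",
);

$ip = valid_ip_or_null(isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '');
$agent = clean_header_value(isset($server['HTTP_USER_AGENT']) ? $server['HTTP_USER_AGENT'] : '');
$referer = clean_header_value(isset($server['HTTP_REFERER']) ? $server['HTTP_REFERER'] : '');

// Only build the message and the 'deny from' line once the IP has passed.
if ($ip !== null) {
    echo 'deny from ' . $ip . "\n";
    echo 'user-agent string: ' . $agent . "\n";
}
```

The point of validating the IP before anything else is that it is the one value written into .htaccess, where a stray line break would become a new directive.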
This script has proved to be surprisingly popular. It seems to be popping up on various PHP sites all over the place.
Most posters have provided links back here, thanks, but have also quoted the post in full. As a courtesy, perhaps you can just provide a link to here with a short description, and please make sure the link works! You could also let me know about your post, rather than me finding out through Google.
If you are using the trap and are having success (or not) with it let me know here and I will publish your comments.
Also, if someone wants to translate the script's commentary into another language (already in French), you can add the translation as a comment.
Thanks
This is a great tip and has solved a problem for me, so thank you for that!
The only thing I would add is to make sure you change the filenames because as this script becomes more commonly used, the scumbags will just configure their scanner to ignore any link to badbots.php.
This is great and thank you.
It would be even better if you could somehow integrate a reverse DNS lookup to tell true search engine bots from ones spoofing their user agent.
Then it would be a gun system
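One way the reverse DNS idea could be sketched is a forward-confirmed lookup: resolve the IP to a hostname, check the hostname's suffix, then resolve the hostname back and confirm it returns the same IP. Everything named here is an assumption for illustration: is_verified_crawler() is a hypothetical function, the suffix list is only an example, and the stub resolvers return made-up sample data so the logic can be tried without touching the network. In real use you would leave the last two arguments off, so that PHP's gethostbyaddr() and gethostbynamel() do the lookups (gethostbynamel() covers IPv4 only).

```php
<?php
// Hypothetical helper: forward-confirmed reverse DNS check.
// $reverse and $forward are injectable so the logic can be tested offline.
function is_verified_crawler($ip, $allowed_suffixes, $reverse = 'gethostbyaddr', $forward = 'gethostbynamel') {
    $host = call_user_func($reverse, $ip);
    // gethostbyaddr() returns the unmodified IP on failure, false on bad input.
    if ($host === false || $host === $ip) {
        return false;
    }
    $host = strtolower($host);
    $matches = false;
    foreach ($allowed_suffixes as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            $matches = true;
            break;
        }
    }
    if (!$matches) {
        return false;
    }
    // Forward lookup must lead back to the same IP, or the PTR record is a fake.
    $addresses = call_user_func($forward, $host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}

// Stub resolvers with made-up sample data (no real DNS queries are sent).
$fake_reverse = function ($ip) {
    return $ip === '66.249.72.147' ? 'crawl-66-249-72-147.googlebot.com' : $ip;
};
$fake_forward = function ($host) {
    return $host === 'crawl-66-249-72-147.googlebot.com' ? array('66.249.72.147') : false;
};

$suffixes = array('.googlebot.com', '.google.com'); // example list, adjust to taste

var_dump(is_verified_crawler('66.249.72.147', $suffixes, $fake_reverse, $fake_forward));
var_dump(is_verified_crawler('203.0.113.7', $suffixes, $fake_reverse, $fake_forward));
```

In bad-bot-script.php, a check like this would sit before the 'deny from' line is written, so that a verified crawler gets the email but not the ban. DNS lookups can be slow, so it only makes sense to run it on the trap page, not on every request.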
This script blocks Google's bot!!
----------
After implementing this, I got the following message:
ip: 66.249.72.147 user-agent string: Mozilla/5.0 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html) requested url: /badbots.php
How do I UNBLOCK wanted bots???
1. Go into your .htaccess file and remove the line blocking Google's IP
2. Make sure your robots.txt exclusion file is configured properly as described in the text above.
Combining PHP, .htaccess and robots.txt can create powerful tools. It is probably better not to use them on live sites until you are comfortable with these tools, have tested them thoroughly, and fully understand the code/markup.
regards,
7
What do you do about crawlers that have cached versions of your robots.txt file at the time you implement the bot trap? If they do not reload the file, some okay bots may fall victim to the trap.
It's an awesome script, but it shouldn't be bad-bots-script.php, it should be bad-bot-script.php