Saturday, December 16, 2006
BadVista.org
Thursday, October 19, 2006
MSIE 7 More Security Problems
Best to use the better browsers like Firefox and Opera.
Monday, October 09, 2006
BBC HoneyPot
The weaknesses in his article are explored on slashdot.org, so I won't rehearse them here.
Perhaps more interesting is the BBC/Microsoft memorandum of understanding "that aims to identify 'common interests' between the BBC and Microsoft. Areas for collaboration include search and navigation, distribution, and content enablement."
To speculate, purely, on the relationship between BBC tech articles and the MS/BBC agreement:
Microsoft is going to have a hard time selling its upcoming release of the Vista system, specifically, getting users of XP to upgrade and winning ex-Microsoft users back to the fold (for example, all the college kids that bought new Apple laptops this year). MS will probably market the new system's "security" features as a main selling point.
Articles like the one produced by the BBC, which begin to explore the all-too-well-known security problems in current Microsoft software, help prepare the marketplace for a new "secure" system and condition consumers to see security as a need. The new Vista OS will then present itself as the only viable solution to the problem.
Again, pure speculation. Nevertheless, when visiting the Vista site on microsoft.com, there is rarely a page that does not mention security in some context. BBC articles on computer technology focus very heavily on the MS OS, almost to the exclusion of others.
Saturday, October 07, 2006
PSI/Cogent yet again
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)
It's getting tempting to just block everything in the range from 38.0.0.0 to 38.255.255.255, because the only visitors I've ever seen from this range are scum spam bots. Typical user-agents include Snapbot, voyager, cfetch, Java, as well as MSIE poser bots. They always run into a trap though, like the one listed here, and that keeps it fun.
Whois:
OrgName: Performance Systems International Inc.
OrgID: PSI
Address: 1015 31st St NW
City: Washington
StateProv: DC
PostalCode: 20007
Country: US
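For anyone who would rather test the range in PHP than in .htaccess, here is a minimal sketch of checking whether a visitor falls inside 38.0.0.0 to 38.255.255.255 (38.0.0.0/8 in CIDR notation). The function name ip_in_cidr() is made up for this example, and the code assumes IPv4 addresses and a 64-bit PHP build.

```php
<?php
// Sketch only: ip_in_cidr() is a hypothetical helper, not part of any trap script.
// Returns true when the dotted-quad IPv4 address $ip lies inside $cidr,
// for example ip_in_cidr('38.99.203.110', '38.0.0.0/8').
function ip_in_cidr($ip, $cidr)
{
    $parts = explode('/', $cidr, 2);
    if (count($parts) !== 2 || !preg_match('/^\d{1,2}$/', $parts[1])) {
        return false; // not in network/bits form
    }
    $bits     = (int) $parts[1];
    $ip_long  = ip2long($ip);
    $net_long = ip2long($parts[0]);
    if ($ip_long === false || $net_long === false || $bits > 32) {
        return false; // not a valid IPv4 address or prefix length
    }
    // Build a mask with the top $bits bits set; /0 matches every address.
    // Assumes 64-bit integers so that 0xFFFFFFFF stays an int.
    $mask = ($bits === 0) ? 0 : ((~0 << (32 - $bits)) & 0xFFFFFFFF);
    return ($ip_long & $mask) === ($net_long & $mask);
}
```

A trap page could call ip_in_cidr($_SERVER['REMOTE_ADDR'], '38.0.0.0/8') and refuse the request when it returns true. In .htaccess the same range can be written as a single deny from 38.0.0.0/8 line.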
The New iPOD WOW!! (or not)
Tetris!!
Sorry Steve, iPod has officially lost its cool.
Perhaps the movie business will save ya.
Tuesday, October 03, 2006
Nusearch Spider
Host: 84.9.136.223
The Nusearch Spider dropped by for a visit. It only followed the first ten top-level html links. It was not interested in going to other directories, and even tried to load directory-names as files by dropping the trailing "/" then ignoring the resulting redirect.
The Spider obeyed some of the directives in the robots.txt, but not all. My guess is that it's a configuration issue with the bot at this time. We will be watching them to see what's up.
I dropped by their site, nusearch.com, and it's yet another search engine promising to be better than the other guys. Ya, whatever, but they need to get their bot under better control and clean up its blacklist status if they want us to allow them to crawl our sites.
Saturday, September 30, 2006
Missigua Spam Bot
Agent: Missigua Locator 1.9
This bot reads links from top to bottom of the page. After visiting the document root on a site, it quickly ran into a bot trap, which was in fact the first link it followed. It then tried all the other links on the page but, of course, was denied access. It did not read robots.txt before trying to crawl the site.
Whois Record
OrgName: ThePlanet.com Internet Services, Inc.
OrgID: TPCM
Address: 1333 North Stemmons Freeway
Address: Suite 110
City: Dallas
StateProv: TX
PostalCode: 75207
Country: US
Thursday, September 21, 2006
Setting Bad-Bot Traps
The trap was set by adding a link to an excluded file (via robots.txt) on the main index page as bait for the bad-bot. The trap is hidden from the regular user through CSS styles and by using a dot (.) in-between the anchor tags.
The weakness in this trap's method of camouflage is that the trap could still be tripped by users of text or non-visual browsers.
There are two variations on this trap that can be used in conjunction with or instead of the original.
The first is simply replacing the dot with an image file, 1px by 1px, that is the same colour as the page's background. Use of the alt attribute can help identify the trap to legitimate users of text-browsers. For example:
<p>
<a href="/badbots.php">
<img src="small_image.gif" alt="do not follow" />
</a>
</p>
The second variation is to use html comments, hiding the link from everybody except that particular species of bot that will try to follow anything that even remotely resembles a link:
<!-- <a href="/badbots.php">look, a link!</a> -->
Placement of the traps can vary also. Bots do not necessarily follow links in the order found on the page, nor do they necessarily enter a site through the main index page. Traps can be placed soon after the <body> tag, near the bottom of the page, or within a list of links, such as a navigation bar. If the site has a complicated hierarchy of nested folders, laying traps at different depths may also yield results.
Tuesday, September 19, 2006
Performance Systems International Inc. Bot
Host: 38.99.203.110
Whois Record
OrgName: Performance Systems International Inc.
OrgID: PSI
Address: 1015 31st St NW
City: Washington
StateProv: DC
PostalCode: 20007
Country: US
This bot visited an unprotected site last July. It grabbed the robots.txt and then proceeded to download every link on the site, including javascript files.
It is now blocked by ip and user-agent (^Java).
It has since revisited the site twice. Today it tried to grab robots.txt as Java/1.6.0-beta2 and was sent a 403 code (access denied). Four seconds later it changed its user-agent string to Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.0) and tried to grab the main index page. Again denied, it seems to have moved on. It appears to be hosted by Cogent Communications.
Saturday, September 16, 2006
An Elegant Way to Trap Rude Robots
A very simple bad-bot trap that catches both robots that ignore robots.txt and site rippers that do not read robots.txt.
There are many versions of this trap. This one is not particularly sophisticated, but it works.
Use with care to make sure you don't shut out visitors that you do want, or worse, shut down your site.
If you don't understand the following code, don't use it.
What you need:
- PHP enabled site
- Ability to incorporate robots.txt
- Ability to incorporate .htaccess files on your site
- Ability to send email via PHP
- Stamina to monitor your logs and .htaccess file
Files involved:
- robots.txt
- .htaccess
- badbots.php
- bad-bot-script.php
- index.php (or index.html)
1. Add the following lines (or appropriate version) to your robots.txt:
User-agent: *
Disallow: /badbots.php
2. Create the following file: badbots.php
<?php
header("Content-type: text/html; charset=utf-8");
echo ' <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> ';
?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Bad-Bots and Rippers Denied</title>
<meta name="author" content="seven-3-five.blogspot.com 2006-09-04" />
</head>
<body>
<p>whatever message you would like the scum to see</p>
<?php
include 'bad-bot-script.php';
?>
</body>
</html>
3. Create the following file: bad-bot-script.php
<?php
/* author: seven-3-five, 2006-09-04, seven-3-five.blogspot.com
* Thanks to Spitfire for the French translation
* This script is the meat and potatoes of the bot-trap
* 1. It sends you an email when the page /badbots.php is visited.
* The email contains various info about the visitor.
* 2. It adds the directive
* 'deny from $ip' ($ip being the visitor's ip address)
* to the bottom of your .htaccess file. */
/* SERVER VARIABLES USED
* TO IDENTIFY THE OFFENDING BOT */
$ip = $_SERVER['REMOTE_ADDR'];
// the user-agent and referer headers are optional, so default them to ''
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$request = $_SERVER['REQUEST_URI'];
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
// CONSTRUCT THE EMAIL MESSAGE
$subject = 'bad-bots';
$email = 'your_email@your_site.com';
$to = $email;
$message ='ip: ' . $ip . "\r\n" .
'user-agent string: ' . $agent . "\r\n" .
'requested url: ' . $request . "\r\n" .
'referer: ' . $referer . "\r\n";
// referer is often blank
$message = wordwrap($message, 70);
$headers = 'From: ' . $email . "\r\n" .
'Reply-To: ' . $email . "\r\n" .
'X-Mailer: PHP/' . phpversion();
// SEND THE MESSAGE
mail($to, $subject, $message, $headers);
/* ADD 'deny from $ip'
* TO THE BOTTOM OF YOUR .htaccess FILE */
$text = 'deny from ' . $ip . "\n";
$file = '.htaccess';
add_badbot($text, $file);
/* Function
* add_badbot($text, $file_name): appends $text to $file_name
* Make sure PHP has permission to write to $file_name */
function add_badbot($text, $file_name)
{
$handle = fopen($file_name, 'a');
fwrite($handle, $text);
fclose($handle);
}
?>
4. Add the following code after the <body> tag of your index page, index.php or index.html:
<p style="color:white;background:white;height:0;visibility:collapse;">
<a href="badbots.php" >.</a>
</p>
5. Test thoroughly
What happens?
A bad-bot grabs your robots.txt file and either ignores the directives or uses that information. If the bot follows the link to /badbots.php, the bad-bot-script.php fires, writing the visitor's IP address to your .htaccess file and notifying you by email. The bad-bot can no longer crawl the site.
Alternatively, a site ripper visits your site and starts downloading everything it finds. It will quickly stumble upon the /badbots.php link on your index page. Once it has visited that link, it will be unable to download anything else, just as in the previous example.
Possible problems, depending on your server:
- You may have to create an empty .htaccess file if your site does not already have one
- You may have to adjust the permissions for .htaccess so that bad-bot-script.php can write to it. If so, try:
touch .htaccess
chgrp www .htaccess
chmod 664 .htaccess
- Your mail server may not accept PHP-generated mail
- It may need to be configured
- If you have locked everyone out, try adding the following lines at the top of your .htaccess file:
order allow,deny
allow from all
though they should appear in the httpd.conf file of any public Apache server and are probably not necessary here (at least I think so)
- You have locked yourself out: this will happen every time you test the system, so be prepared to remove your IP address from your .htaccess file
- Your tests add your IP address to the .htaccess file, but you are not locked out: your server probably does not allow the use of .htaccess files
Friday, September 15, 2006
Probable Spam-Bot 59.26.150.110
Agent: - (empty string)
visited this week. Note the lack of a user-agent string. Initial searches suggest that this is a spam-bot trolling for email addresses.
Friday, September 08, 2006
Another Bad-Bot Falls into Trap
Host: 63.100.163.70
This bot, disguised as MSIE, tried to rip through one of my sites and ran right into a bot trap. It started off looking like a regular browser: it loaded the site's root index and the page's .css. It didn't load the robots.txt.
Nevertheless, several things in combination gave it away as a disrespectful bot or ripper:
- It wasn't loading the page's associated binaries, i.e. images and so on
- It wasn't loading javascript (everyone knows that MSIE can't do much without it)
- It was crawling at three pages per second
- On closer inspection, it only loaded one of the two .css files on the index page
- It tried to follow links that were commented out in the page's mark-up
- It ran into a bot trap that a normal user wouldn't see.
Whois says 63.100.163.70 belongs to:
UUNET Technologies, Inc.
22001 Loudoun County Parkway
Ashburn, VA, 20147, US
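No single sign in the list above proves anything; it is the combination that gives the bot away. As a rough sketch of that idea, the warning signs can be counted up per visit. The function name and array keys below are invented for illustration, and the one-page-per-second threshold is an assumption to tune against your own logs.

```php
<?php
// Hypothetical sketch: count how many of the warning signs a visit shows.
// The array keys are made up for this example; fill them in from your own log analysis.
function bot_suspicion_score(array $visit)
{
    // Defaults describe an ordinary browser visit.
    $defaults = array(
        'loaded_images'           => true,  // fetched the page's images
        'loaded_javascript'       => true,  // fetched the page's scripts
        'loaded_all_css'          => true,  // fetched every stylesheet on the page
        'pages_per_second'        => 0.0,   // crawl speed
        'followed_commented_link' => false, // requested a link hidden in an html comment
        'hit_bot_trap'            => false, // requested the trap page
    );
    $visit = array_merge($defaults, $visit);

    $score = 0;
    if (!$visit['loaded_images'])          { $score++; }
    if (!$visit['loaded_javascript'])      { $score++; }
    if (!$visit['loaded_all_css'])         { $score++; }
    if ($visit['pages_per_second'] >= 1.0) { $score++; } // assumed threshold
    if ($visit['followed_commented_link']) { $score++; }
    if ($visit['hit_bot_trap'])            { $score++; }
    return $score;
}
```

The visitor described above would score six out of six. A text-browser user might well score one or two, which is why a single sign on its own should not be treated as proof.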
Spammer Jeremy Jaynes' Conviction Upheld
The N. Carolina man was originally convicted of illegally flooding A.O.L. customers with bulk email ads.
Jaynes is out on a million dollar bond, but the Virginia Attorney General's office would like the bond revoked so that Jaynes can start to serve his sentence.
Let's hope so.
Tuesday, September 05, 2006
My Favorite Emacs Feature
Most web hosting companies do not give their clients access to a command line interface, i.e. ssh access, without them buying one of the more expensive packages, something I cannot afford. The basic packages do, however, usually allow ftp access. And there it is. Emacs can use ftp! No more crappy control panels and crappier online text editors!
All one does is open a buffer in the usual way:
C-x C-f
The file path then needs to begin with the ftp account in the format:
/user_name@your_site.com:
followed by:
path/to/your/file
The complete minibuffer looks something like:
/user_name@your_site.com:path/to/your/file
After pressing the enter key, you will be prompted for a password if needed, and you're off to the races. Editing files on the remote location is now seamlessly integrated into your current Emacs session.
FYI, Emacs is using 'ange-ftp', which keeps most of the ftp stuff hidden in the background.
Way too easy!!
For some documentation, try in Emacs:
C-h i
m Emacs
m Files
m Remote Files
Monday, September 04, 2006
A Simple PHP based Bad-Bot Trap
Use with care to make sure you don't shut out visitors that you do want, or worse, shut down your site.
If you don't understand the following code, don't use it!
P.S. you can use it but you cannot post it elsewhere. Copyright 2006
What you need:
- PHP enabled site
- Ability to incorporate robots.txt
- Ability to incorporate .htaccess files on your site
- Ability to send email via PHP
- Stamina to monitor your logs and .htaccess file
Files involved:
- robots.txt
- .htaccess
- badbots.php
- bad-bot-script.php
- index.php (or index.html)
- Add the following lines (or appropriate version) to your robots.txt:
User-agent: *
Disallow: /badbots.php
- Create the following file: badbots.php
<?php
header("Content-type: text/html; charset=utf-8");
echo '
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
';
?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Bad-Bots and Rippers Denied</title>
<meta name="author" content="seven-3-five.blogspot.com 2006-09-04" />
</head>
<body>
<p>whatever message you would like the scum to see</p>
<?php
include 'bad-bot-script.php';
?>
</body>
</html>
- Create the following file: bad-bot-script.php
<?php
// author: seven-3-five, 2006-09-04, seven-3-five.blogspot.com
//this script is the meat and potatoes of the bot-trap
// 1. It sends you an email when the page /badbots.php is visited.
//The email contains various info about the visitor.
//2. It adds the directive
//'deny from $ip' ($ip being the visitor's ip address)
//to the bottom of your .htaccess file.
// SERVER VARIABLES USED TO IDENTIFY THE OFFENDING BOT
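$ip = $_SERVER['REMOTE_ADDR'];
// the user-agent and referer headers are optional, so default them to ''
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$request = $_SERVER['REQUEST_URI'];
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';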
// CONSTRUCT THE EMAIL MESSAGE
$subject = 'bad-bots';
$email = 'your_email@your_site.com'; //edit accordingly
$to = $email;
$message ='ip: ' . $ip . "\r\n" .
'user-agent string: ' . $agent . "\r\n" .
'requested url: ' . $request . "\r\n" .
'referer: ' . $referer . "\r\n"; // often is blank
$message = wordwrap($message, 70);
$headers = 'From: ' . $email . "\r\n" .
'Reply-To: ' . $email . "\r\n" .
'X-Mailer: PHP/' . phpversion();
// SEND THE MESSAGE
mail($to, $subject, $message, $headers);
// ADD 'deny from $ip' TO THE BOTTOM OF YOUR MAIN .htaccess FILE
$text = 'deny from ' . $ip . "\n";
$file = '.htaccess';
add_badbot($text, $file);
// Function add_badbot($text, $file_name): appends $text to $file_name
// make sure PHP has permission to write to $file_name
function add_badbot($text, $file_name) {
$handle = fopen($file_name, 'a');
fwrite($handle, $text);
fclose($handle);
}
?>
- Add the following html soon after the <body> tag of your main /index.php (or index.html) page:
<p style="color:white;background:white;height:0;visibility:collapse;">
<a href="badbots.php" >.</a>
</p>
- Test thoroughly
Possible problems, depending on your server:
- You may have to create an empty .htaccess file if your site does not already have one
- You may have to adjust the permissions for .htaccess so that bad-bot-script.php can write to it. If so, try:
touch .htaccess
chgrp www .htaccess
chmod 664 .htaccess
- Your mail server may not like PHP generated mail
- Your mail server may need to be configured
- If you are locking out everyone, try adding the following two lines near the top of your main .htaccess file:
order allow,deny
allow from all
though they should appear in the httpd.conf file on any public Apache Server and shouldn't be necessary here (I believe...)
- You have locked yourself out: this will happen every time you test the system, so be prepared to remove your ip from your .htaccess file.
- Your test adds your ip to the .htaccess file, but you still have access: your server may not be configured to allow use of .htaccess files.
What Happens:
A bad-bot grabs your robots.txt file and either ignores the file's directives or uses that information to find stuff. If the bot follows the link to /badbots.php, the bad-bot-script.php fires, writing the visitor's ip to your .htaccess file and sending you an email to that effect. The bad-bot can no longer traverse your site.
Alternatively, a ripper visits your site and starts to download everything it can find. It will quickly stumble upon the /badbots.php link on your /index.*. Once it has visited /badbots.php, it will be unable to download any more of your stuff, just like in the previous example.
Once the bot or ripper discovers it is locked out, it may thrash about a bit, trying to retrieve any largish file it has a URL for, but of course it will just be denied access, getting a 403 code and nothing else, and quickly move on.
Variations: endless
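One possible variation, offered only as a sketch: a more defensive take on add_badbot() that checks the address really is an IP, skips addresses already listed, and locks the file while writing. A companion function undoes a ban, which is handy after locking yourself out during testing. The names ban_ip() and unban_ip() are invented here, and the code assumes PHP 5.2 or later (for filter_var) and that PHP may write to the file.

```php
<?php
// Sketch of a more defensive add_badbot(); ban_ip() and unban_ip() are hypothetical names.

// Appends 'deny from $ip' to $file_name.
// Returns true if a new line was written, false otherwise.
function ban_ip($ip, $file_name)
{
    // Only accept something that really is an IP address.
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return false;
    }
    $line = 'deny from ' . $ip . "\n";

    // Skip the write if this address is already banned.
    $existing = is_file($file_name) ? file($file_name) : array();
    if ($existing !== false && in_array($line, $existing, true)) {
        return false;
    }
    // LOCK_EX keeps two simultaneous hits from interleaving their writes.
    return file_put_contents($file_name, $line, FILE_APPEND | LOCK_EX) !== false;
}

// Removes every 'deny from $ip' line from $file_name, leaving other lines untouched.
// Returns true if the file was rewritten, false if it could not be read or written.
function unban_ip($ip, $file_name)
{
    $lines = is_file($file_name) ? file($file_name) : false;
    if ($lines === false) {
        return false;
    }
    $kept = array();
    foreach ($lines as $current) {
        if (rtrim($current) !== 'deny from ' . $ip) {
            $kept[] = $current;
        }
    }
    return file_put_contents($file_name, implode('', $kept), LOCK_EX) !== false;
}
```

In bad-bot-script.php the call add_badbot($text, $file) could then become ban_ip($ip, '.htaccess'), and unban_ip('your.own.ip.here', '.htaccess') would clean up after a test run.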
Sunday, September 03, 2006
Matthew Garrett Moves to Ubuntu
And what's with the Mandela connection?
I tried out Ubuntu 6.06.1 desktop on my old ibook (G4). The live CD worked fine, actually really well. But for some reason after booting back into Debian, I no longer need to use the "fn" key to access 'f1-12', weird huh?
I then tried the Ubuntu CD on a mini G4, and Dapper didn't recognize the bluetooth keyboard or mouse, so no go.
Saturday, September 02, 2006
Java Site Ripper
Host: 64.105.113.204
Agent: Java/1.5.0_04
tried to rip through one of my sites last week and ran right into a bot-trap. Now the browser and ip are banned.
deny from 64.105.113.204
RewriteCond %{HTTP_USER_AGENT} ^Java
RewriteRule ^.* - [F,L]
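On a host without mod_rewrite, roughly the same user-agent test could be done in PHP instead. The following is only a sketch: is_banned_agent() is a made-up name, and treating an empty user-agent as banned is my own assumption, not something the rewrite rule above does.

```php
<?php
// Hypothetical helper mirroring the RewriteCond above in PHP.
// Returns true for user-agent strings that should be refused.
function is_banned_agent($agent)
{
    // An empty or missing user-agent is treated as suspicious
    // (an assumption of this sketch, not Apache behaviour).
    if ($agent === null || trim($agent) === '') {
        return true;
    }
    // Same test as RewriteCond %{HTTP_USER_AGENT} ^Java:
    // case-sensitive and anchored at the start of the string.
    return preg_match('/^Java/', $agent) === 1;
}
```

A page would call this before sending any output and, when it returns true, send header('HTTP/1.1 403 Forbidden') and exit.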
psycheclone as MSIE
Host: 208.66.195.3
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322).
Too bad you tried to grab the robots.txt: 403, then a video file, still 403!