Sunday, March 25, 2007

Bad-Bot Trap Revisited

There seems to have been an increase in bad-bot activity lately, with new IPs and new user-agents. So I thought I would add a couple of ideas to flesh out my original "A Simple PHP based Bad-Bot Trap", which seems to be rather popular.

I'm flattered that people are posting and collecting the links to this blog. But some scum are stealing the articles and posting them on their own sites: you who do so will receive instant and debilitating bad karma as a result. Furthermore, you do not have permission to do so. May you experience endless server and PHP errors.

Just link, it's nicer. If you have comments or suggestions, go for it.

I offer the following with the standard disclaimer: If you don't understand the code, don't use it!

Bad bots tend to follow the links on a page in one of three fairly predictable ways: top down, alphabetically ascending, or alphabetically descending. If we wish to trap a bad-bot early on in its travels through our site, we can easily set a trap for each possibility using the original bad-bot trap and a little .htaccess magic.

First, add the following rules to robots.txt under User-agent: *

Disallow: /afile.html
Disallow: /zfile.html
Disallow: /nofile.html
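
As a quick sanity check, here is a minimal Python sketch (not part of the original trap) that feeds these rules to the standard-library robots.txt parser to confirm that a well-behaved crawler would skip the trap files. The agent name "ExampleBot" and the path /index.html are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The three exclusions from above, under the wildcard user-agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /afile.html
Disallow: /zfile.html
Disallow: /nofile.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if a polite crawler may fetch this path ("ExampleBot" is hypothetical)."""
    return parser.can_fetch(agent, path)


for path in ("/afile.html", "/zfile.html", "/nofile.html", "/index.html"):
    print(path, "allowed" if allowed(path) else "disallowed")
```

Anything that requests a disallowed path anyway has either ignored robots.txt or mined it for links, which is exactly the behaviour the traps are meant to catch.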

Then add the following to .htaccess:

# set 'RewriteEngine On' if you haven't already
# redirect badbots
RewriteRule ^afile.* /badbots.php [L]
RewriteRule ^zfile.* /badbots.php [L]
RewriteRule ^nofile.* /badbots.php [L]
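
To see which requests these patterns would catch, here is a rough Python sketch that mimics the matching. It assumes the .htaccess file sits in the document root, where the leading slash is stripped before the pattern is applied. Python's re module is only a stand-in for Apache's regex engine, and the function name rewrite_target is invented for this example.

```python
import re

# Patterns copied from the RewriteRule lines above.
TRAP_PATTERNS = [re.compile(p) for p in (r"^afile.*", r"^zfile.*", r"^nofile.*")]


def rewrite_target(request_path):
    """Return the path that would be served under the rules above (a simulation only)."""
    # In a per-directory (.htaccess) context the leading slash is not part of the match.
    relative = request_path.lstrip("/")
    for pattern in TRAP_PATTERNS:
        if pattern.search(relative):
            return "/badbots.php"
    return request_path


for path in ("/afile.html", "/zfile.html", "/nofile.html", "/afile-old.txt", "/index.html"):
    print(path, "->", rewrite_target(path))
```

Note that the patterns match anything that starts with afile, zfile, or nofile, not just the .html files, so avoid giving real pages names with those prefixes.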


Now we have three different traps to embed in our pages:

<p style="color:white;background:white;height:0;visibility:collapse;"
onclick="return false" >
<a href="/badbots.php" >.</a>
</p>

which can go at the top of the page,

<p style="color:white;background:white;height:0;visibility:collapse;"
onclick="return false" >
<a href="/afile.html" >.</a>
</p>

and

<p style="color:white;background:white;height:0;visibility:collapse;"
onclick="return false" >
<a href="/zfile.html" >.</a>
</p>

which can go pretty much anywhere on a page.

How to use the traps should be self-evident. The Disallow: /nofile.html exclusion was added for that particular species of bad-bot that uses robots.txt to find links.
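
To illustrate why three traps cover the three crawl orders, here is a small Python sketch. The trap links are the ones above; the other links (/index.html, /contact.html, /products.html) are hypothetical stand-ins for a real page, and the function name is invented.

```python
# Links in the order they appear on a hypothetical page.
page_links = [
    "/badbots.php",    # trap placed at the top of the page
    "/index.html",
    "/contact.html",
    "/afile.html",     # trap that sorts first alphabetically
    "/products.html",
    "/zfile.html",     # trap that sorts last alphabetically
]
TRAPS = {"/badbots.php", "/afile.html", "/zfile.html"}


def fetches_before_trap(order):
    """Return (first trap hit, number of real pages fetched before it)."""
    for count, link in enumerate(order):
        if link in TRAPS:
            return link, count
    return None, len(order)


top_down = fetches_before_trap(page_links)
ascending = fetches_before_trap(sorted(page_links))
descending = fetches_before_trap(sorted(page_links, reverse=True))

print("top down:  ", top_down)
print("ascending: ", ascending)
print("descending:", descending)
```

With this layout, a bot using any of the three orders hits a trap on its first request, before it has fetched a single real page.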

Happy Trapping!

Tuesday, March 20, 2007

Internet Explorer 7.0 (MSIE 7.0)

So, after spending some time with Internet Explorer 7.0 (MSIE 7.0), I can't decide if it's a move sideways or backwards. From a user's perspective, it certainly is prettier than MSIE 6. Unfortunately, its 'security' features get annoying really fast. Try it and you'll see what I mean.

From a web developer's perspective, it's a pain in the butt. Though some of the problems of MSIE 6 have been addressed in 7, a whole new set of problems needs to be dealt with: web pages that have hacks to get 6 to behave now have to be re-hacked to get 7 to behave.

Hey Microsoft: can't you guys figure out how to handle 'float' properly? And JavaScript, sorry, JScript, don't get me started. What the hell's the problem? Mozilla figured this stuff out long ago. Were you not at the table when the standards were developed? My god, CSS 2 is almost 10 years old! It's as if you purposely undermine standards that you were involved in establishing, by releasing broken software like MSIE 7, in order to undermine your competition. Truly evil, Microsoft.

Do some bug comparisons between 6 and 7 if you like. A good place to start is http://www.gtalbot.org/BrowserBugsSection/

So now we, as developers, have to maintain 3 sets of pages: one for browsers that actually work the way they are supposed to (or at least try to address their bugs and shortcomings in an open and timely manner), one for MSIE 6, and one for MSIE 7.

Egads, I feel dirty every time I have to use something created by Microsoft.

Sunday, March 18, 2007

Trapped Bad-Bots

2007-10-22
Host: 74.53.249.34
Agent: Mozilla/5.0 (compatible; LiteFinder/1.0;
+http://www.litefinder.net/about.html)

2007-10-18
Host: 99.238.107.208
Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)

2007-10-08
Host: 213.189.25.182
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET

2007-10-06
Host: 82.99.30.27
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

2007-10-05
Host: 82.99.30.32
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

2007-10-05
Host: 131.107.0.95
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; WOW64; SV1)

2007-10-03
Host: 67.19.250.26
Agent: Mozilla/5.0 (compatible; Gigamega.bot/1.0; +http://www.gigamega.net/bot.html)

2007-10-02
Host: 82.99.30.10
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

2007-09-10
Host: 207.46.55.27/30
Agent: MSNPTC/1.0 (stupid ms bot can't parse robots.txt properly)

2007-07-13
Host: 218.231.136.5
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP; DigExt)

2007-07-10
Host: 38.100.41.112
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)

2007-07-06
Host: 209.85.94.164
Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
+http://process4.com) Gecko/20070508 Firefox/1.5.0.12
stupid bot grabbed the robots.txt, then the first link listed in its exclusion list


2007-07-03
Host: 74.208.71.84
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET
CLR 1.1.4322; .NET CLR 2.0.50727)

2007-06-27
Host: 63.251.174.4
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET
CLR 1.1.4322; .NET CLR 2.0.50727)


2007-06-26
Host: 24.87.89.186
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET
CLR 1.1.4322)

2007-06-22
Host: 64.92.199.41 and 64.92.199.42 (they have the whole block, actually; I'm just banning the agent for a while)
Agent: libwww-perl/5.805

Host: 64.92.199.42
Agent: libwww-perl/5.805

2007-06-18
Host: 81.223.254.34
Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)

2007-06-15
Host: 201.5.229.201
Agent: Mozilla/3.0 (compatible; WebCapture 1.0; Auto; Windows)

2007-06-14
Host: 208.99.195.54
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)

2007-06-05
Host: 202.179.180.42
Agent: Mozilla/4.0 (compatible; NaverBot/1.0;
http://help.naver.com/delete_main.asp)

2007-05-26
Host: 24.242.34.213
Agent: MJ12bot/v1.2.0 (http://majestic12.co.uk/bot.php?+)

2007-05-25
Host: 84.88.32.199
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) Opera 7.23[ca]

2007-05-10
Host: 220.181.34.177
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; QihooBot
1.0 qihoobot@qihoo.net)

2007-04-26
Host: 65.222.176.124
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)

2007-04-24
Host: 212.219.190.178
Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)

Host: 207.115.69.215
Agent: Mozilla/4.0/ (compatible- MSIE 6.0- Windows NT 5.1- SV1- .NET CLR 1.1.4322; ; )

Host: 65.222.176.125
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)

Host: 203.162.3.157
Agent: -

Host: 222.254.232.24
Agent: -

Host: 66.199.236.50
Agent: Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)

Host: 69.84.207.39
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50215)

Host: 208.223.208.181
Agent: Python-urllib/1.16

Host: 208.53.147.89
Agent: Mozilla/3.0 (compatible; NetPositive/2.2)

Host: 70.87.196.242
Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10)
Gecko/20050716 Firefox/1.0.6


Host: 65.222.176.122
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)


Host: 84.69.146.235
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)

Host: 84.70.209.45
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)


Host: 38.100.41.105
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)

Host: 38.100.41.102
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)

The related block of IPs hosting bad-bots that I've seen so far is in the range 38.100.41.100 - 38.100.41.107.
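
If you would rather block that range as a whole than list each address, here is a small Python sketch that converts it to CIDR notation with the standard ipaddress module. The helper name in_bad_range is invented for this example. How you then feed the blocks to your server or firewall is up to you.

```python
import ipaddress

# The range noted above.
first = ipaddress.IPv4Address("38.100.41.100")
last = ipaddress.IPv4Address("38.100.41.107")

# Smallest set of CIDR networks that covers exactly first..last.
networks = list(ipaddress.summarize_address_range(first, last))
blocks = [str(net) for net in networks]
print(blocks)


def in_bad_range(ip):
    """Return True if the address falls inside the range above."""
    address = ipaddress.ip_address(ip)
    return any(address in net for net in networks)


print(in_bad_range("38.100.41.105"))
print(in_bad_range("38.100.41.112"))
```

The range does not start on a /29 boundary, so it comes out as two /30 blocks rather than one /29.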

Host: 88.198.7.39
Agent: findfiles.org/0.9 (Robot;robot@findfiles.org)

Host: 72.21.50.202
Agent: Mozilla/4.0 (compatible; MSIE 5.01; MSNIA; Windows 98)

Host: 65.222.176.123
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)

Host: 88.151.114.39
Agent: Mozilla/5.0 (compatible; Webbot/0.1; http://www.webbot.ru/bot.html)