Showing posts with label BadBots. Show all posts

Monday, October 22, 2007

LiteFinder/1.0

Host: 74.53.249.34
Agent: Mozilla/5.0 (compatible; LiteFinder/1.0; +http://www.litefinder.net/about.html)

This is a horribly misconfigured spam bot. Best bet: forbid the damn thing from your site.
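If you'd rather refuse it in code than in the server config, the check is just a substring match on the User-Agent header followed by a 403 Forbidden. Here's a minimal sketch in Python. The deny list and the function names are hypothetical, and it assumes your setup hands you the raw User-Agent string for each request.

```python
# Hypothetical deny list: substrings taken from agents logged on this blog.
BANNED_AGENT_SUBSTRINGS = ("LiteFinder", "TMCrawler")


def is_banned_agent(user_agent: str) -> bool:
    """True if the User-Agent contains any banned substring (case-insensitive)."""
    ua = (user_agent or "").lower()
    return any(s.lower() in ua for s in BANNED_AGENT_SUBSTRINGS)


def status_for(user_agent: str) -> int:
    """HTTP status to send back: 403 Forbidden for banned agents, else 200."""
    return 403 if is_banned_agent(user_agent) else 200
```

Bear in mind this only catches bots honest enough to name themselves. Many of the bots in my trapped list present ordinary MSIE strings.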

Thursday, May 31, 2007

TMCrawler

Host: 128.241.20.206
Agent: TMCrawler

This bad-bot visited a site, grabbed robots.txt twice, and started following links at a very leisurely pace: only sub-directories, spread over a couple of hours.

Then it started to explore a directory in a systematic fashion. It picked a sub-directory and began a numerical search: /directory/0, then /directory/1, and so on. I caught this bad-bot quickly because I have all 404 errors emailed to me immediately. I suspect it would have tripped a trap soon enough, though.
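A numerical sweep like this is easy to spot in the access log once you group requests by client. Here's a rough sketch in Python. It assumes you've already pulled one client's request paths out of the log, in order. `find_numeric_sweeps` and its `min_run` threshold are my own hypothetical names, not part of any existing tool.

```python
import re
from collections import defaultdict

# Paths whose last segment is purely numeric, e.g. "/directory/17" or "/directory/17/".
NUMERIC_TAIL = re.compile(r"^(?P<prefix>.*/)(?P<num>\d+)/?$")


def find_numeric_sweeps(paths, min_run=3):
    """Return {prefix: longest run} for prefixes probed with consecutive integers.

    paths: request paths from a single client.
    A run is n, n+1, n+2, ... requested under the same prefix.
    """
    seen = defaultdict(set)
    for path in paths:
        m = NUMERIC_TAIL.match(path)
        if m:
            seen[m.group("prefix")].add(int(m.group("num")))

    sweeps = {}
    for prefix, nums in seen.items():
        best = 0
        for n in nums:
            if n - 1 not in nums:  # n starts a run of consecutive numbers
                length = 1
                while n + length in nums:
                    length += 1
                best = max(best, length)
        if best >= min_run:
            sweeps[prefix] = best
    return sweeps
```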

Others have reported this bot too, mostly in the "WTF is this thing doing?" category.

Based on its activities, I have simply decided to deny it access to my sites.

Sunday, March 25, 2007

Bad-Bot Trap Revisited

There seems to have been an increase lately in bad-bot activity, with new IPs and new user-agents. So I thought I would add a couple of ideas to flesh out my original post, "A Simple PHP based Bad-Bot Trap", which seems to be rather popular.

I'm flattered that people are posting and collecting links to this blog. But some scum are stealing the articles and posting them on their own sites: those of you who do so will receive instant and debilitating bad karma as a result. Furthermore, you do not have permission to do so. May you experience endless server and PHP errors.

Just link; it's nicer. If you have comments or suggestions, go for it.

I offer the following with the standard disclaimer: If you don't understand the code, don't use it!

Notice that bots tend to follow the links on a page in one of three fairly predictable orders: top down, alphabetically ascending, or alphabetically descending. If we want to trap a bad-bot early in its travels through our site, we can easily set a trap for each possibility using the original bad-bot trap and a little .htaccess magic.
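To see why one trap per ordering works, here is a toy model in Python of the first link each kind of bot would request. `PAGE_LINKS` is a made-up page. Sorting plain path strings is my assumption about how such bots order links.

```python
# A made-up page: links listed in document (top-down) order.
PAGE_LINKS = ["/badbots.php", "/index.html", "/contact.html",
              "/zfile.html", "/afile.html", "/links.html"]


def first_link(links, order):
    """First link a bot would request under the given traversal order."""
    if order == "top-down":
        return links[0]
    if order == "ascending":
        return sorted(links)[0]
    if order == "descending":
        return sorted(links, reverse=True)[0]
    raise ValueError(f"unknown order: {order!r}")
```

One thing the toy makes obvious: a real link like /about.html sorts ahead of /afile.html ("ab" < "af"). The ascending trap only fires first if nothing else on the page sorts before it. The same goes for /zfile.html at the other end.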

First, add the following rules to robots.txt, under User-agent: *

Disallow: /afile.html
Disallow: /zfile.html
Disallow: /nofile.html
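You can check that the exclusions parse the way you expect with Python's standard `urllib.robotparser`. This sketch assumes the three rules are the only ones under User-agent: *. Your real robots.txt will have more.

```python
from urllib.robotparser import RobotFileParser

# The three trap exclusions from above, under the catch-all record.
ROBOTS_TXT = """\
User-agent: *
Disallow: /afile.html
Disallow: /zfile.html
Disallow: /nofile.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks can_fetch() before each request. Every agent falls
# under "User-agent: *" here, so the trap files are off-limits to all of them.
for path in ("/afile.html", "/zfile.html", "/nofile.html", "/index.html"):
    print(path, "allowed" if parser.can_fetch("*", path) else "disallowed")
```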

Then add the following to .htaccess:

# set 'RewriteEngine On' if you haven't already
# redirect badbots
RewriteRule ^afile.* /badbots.php [L]
RewriteRule ^zfile.* /badbots.php [L]
RewriteRule ^nofile.* /badbots.php [L]
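Here's a rough emulation, in Python, of what those three rules match. It assumes the .htaccess sits in the document root, where Apache strips the leading slash from the path before testing the pattern. Python's `re` stands in for Apache's regex engine, which behaves the same for patterns this simple. Strictly speaking, without an [R] flag these rules rewrite internally rather than redirect, so the bot never sees the badbots.php URL. `rewrite_target` is a hypothetical helper name.

```python
import re

# The three RewriteRule patterns from the .htaccess above.
TRAP_PATTERNS = [re.compile(p) for p in (r"^afile.*", r"^zfile.*", r"^nofile.*")]


def rewrite_target(url_path):
    """Return "/badbots.php" if one of the rules would rewrite url_path, else None."""
    local = url_path.lstrip("/")  # per-directory context: leading slash removed
    for pattern in TRAP_PATTERNS:
        if pattern.match(local):
            return "/badbots.php"
    return None
```

The patterns are anchored only at the start, so anything beginning with afile, zfile or nofile gets caught. A hypothetical /afile-backup.zip would land in the trap too. Anchor both ends (e.g. ^afile\.html$) if that matters to you.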


Now we have three different traps to embed in our pages:

<p style="color:white;background:white;height:0;visibility:collapse;"
onclick="return false" >
<a href="/badbots.php" >.</a>
</p>

which can go at the top of the page

<p style="color:white;background:white;height:0;visibility:collapse;"
onclick="return false" >
<a href="/afile.html" >.</a>
</p>

and

<p style="color:white;background:white;height:0;visibility:collapse;"
onclick="return false" >
<a href="/zfile.html" >.</a>
</p>

which can go pretty much anywhere on a page.
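If your pages are generated by a script, the three snippets can come from one helper instead of being pasted by hand. Here's a sketch in Python. `trap_snippet` is a hypothetical name, and the markup is copied from the snippets above with a closing </p> to keep the page well-formed.

```python
from html import escape


def trap_snippet(href):
    """Build one hidden trap paragraph, with the href HTML-escaped."""
    return (
        '<p style="color:white;background:white;height:0;visibility:collapse;"\n'
        'onclick="return false" >\n'
        f'<a href="{escape(href)}" >.</a>\n'
        "</p>"
    )


TOP_OF_PAGE = trap_snippet("/badbots.php")                           # goes first in the page
ANYWHERE = [trap_snippet("/afile.html"), trap_snippet("/zfile.html")]  # place freely
```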

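The purpose of each trap should be self-evident: the /badbots.php link sits first in page order, /afile.html sorts first alphabetically, and /zfile.html sorts last. The Disallow: /nofile.html exclusion was added for that particular species of bad-bot that mines robots.txt for links. Since no page links to it, only such a bot should ever request it.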
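Once a trap fires, which one fired tells you something about the bot. Here's a sketch in Python that reads (host, path) pairs pulled from an access log and records the first trap each host hit. The `TRAP_MEANING` labels are my reading of the three link orders plus the robots.txt case. `classify_trapped` is a hypothetical helper, not part of the original badbots.php.

```python
# Which trap a host hit first hints at how it crawls (my interpretation).
TRAP_MEANING = {
    "/badbots.php": "followed links top down",
    "/afile.html": "followed links alphabetically ascending",
    "/zfile.html": "followed links alphabetically descending",
    "/nofile.html": "harvested links from robots.txt",
}


def classify_trapped(log_entries):
    """Return {host: meaning of the first trap it requested}.

    log_entries: iterable of (host, path) pairs in log order.
    """
    first_hit = {}
    for host, path in log_entries:
        trap = path.split("?", 1)[0]  # ignore any query string
        if trap in TRAP_MEANING and host not in first_hit:
            first_hit[host] = TRAP_MEANING[trap]
    return first_hit
```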

Happy Trapping!

Sunday, March 18, 2007

Trapped Bad-Bots

2007-10-22
Host: 74.53.249.34
Agent: Mozilla/5.0 (compatible; LiteFinder/1.0; +http://www.litefinder.net/about.html)

2007-10-18
Host: 99.238.107.208
Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)

2007-10-08
Host: 213.189.25.182
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET

2007-10-06
Host: 82.99.30.27
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

2007-10-05
Host: 82.99.30.32
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

2007-10-05
Host: 131.107.0.95
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; WOW64; SV1)

2007-10-03
Host: 67.19.250.26
Agent: Mozilla/5.0 (compatible; Gigamega.bot/1.0; +http://www.gigamega.net/bot.html)

2007-10-02
Host: 82.99.30.10
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

2007-09-10
Host: 207.46.55.27/30
Agent: MSNPTC/1.0 (stupid MS bot can't parse robots.txt properly)

2007-07-13
Host: 218.231.136.5
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP; DigExt)

2007-07-10
Host: 38.100.41.112

2007-07-06
Host: 209.85.94.164
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)

Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; +http://process4.com) Gecko/20070508 Firefox/1.5.0.12
Stupid bot grabbed robots.txt, then the first link listed in its exclusion list.


2007-07-03
Host: 74.208.71.84
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)

2007-06-27
Host: 63.251.174.4
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)


2007-06-26
Host: 24.87.89.186
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

2007-06-22
Host: 64.92.199.41 (they actually have the whole block; I'm just banning the agent for a while)
Agent: libwww-perl/5.805

Host: 64.92.199.42
Agent: libwww-perl/5.805

2007-06-18
Host: 81.223.254.34
Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)

2007-06-15
Host: 201.5.229.201
Agent: Mozilla/3.0 (compatible; WebCapture 1.0; Auto; Windows)

2007-06-14
Host: 208.99.195.54
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)

2007-06-05
Host: 202.179.180.42
Agent: Mozilla/4.0 (compatible; NaverBot/1.0; http://help.naver.com/delete_main.asp)

2007-05-26
Host: 24.242.34.213
Agent: MJ12bot/v1.2.0 (http://majestic12.co.uk/bot.php?+)

2007-05-25
Host: 84.88.32.199
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) Opera 7.23[ca]

2007-05-10
Host: 220.181.34.177
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; QihooBot 1.0 qihoobot@qihoo.net)

2007-04-26
Host: 65.222.176.124
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)

2007-04-24
Host: 212.219.190.178
Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)

Host: 207.115.69.215
Agent: Mozilla/4.0/ (compatible- MSIE 6.0- Windows NT 5.1- SV1- .NET CLR 1.1.4322; ; )

Host: 65.222.176.125
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)

Host: 203.162.3.157
Agent: -

Host: 222.254.232.24
Agent: -

Host: 66.199.236.50
Agent: Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)

Host: 69.84.207.39
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50215)

Host: 208.223.208.181
Agent: Python-urllib/1.16

Host: 208.53.147.89
Agent: Mozilla/3.0 (compatible; NetPositive/2.2)

Host: 70.87.196.242
Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6


Host: 65.222.176.122
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)


Host: 84.69.146.235
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

Host: 84.70.209.45
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)


Host: 38.100.41.105
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)

Host: 38.100.41.102
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)

The related IPs hosting bad-bots that I've seen so far fall in the range 38.100.41.100 - 38.100.41.107.
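If you'd rather ban that block outright, Python's standard `ipaddress` module will test membership and convert the range to CIDR notation. Here's a small sketch. `in_reported_block` is a hypothetical helper name.

```python
import ipaddress

# The reported block as an inclusive address range.
FIRST = ipaddress.IPv4Address("38.100.41.100")
LAST = ipaddress.IPv4Address("38.100.41.107")


def in_reported_block(ip):
    """True if ip lies within 38.100.41.100 - 38.100.41.107 inclusive."""
    return FIRST <= ipaddress.IPv4Address(ip) <= LAST


# The same range as the smallest list of CIDR networks that covers it exactly.
CIDR_BLOCKS = list(ipaddress.summarize_address_range(FIRST, LAST))
```

For what it's worth, 38.100.41.112 from the 2007-07-10 entry above falls just outside this range.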

Host: 88.198.7.39
Agent: findfiles.org/0.9 (Robot;robot@findfiles.org)

Host: 72.21.50.202
Agent: Mozilla/4.0 (compatible; MSIE 5.01; MSNIA; Windows 98)

Host: 65.222.176.123
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)

Host: 88.151.114.39
Agent: Mozilla/5.0 (compatible; Webbot/0.1; http://www.webbot.ru/bot.html)