Host: 74.53.249.34
Agent: Mozilla/5.0 (compatible; LiteFinder/1.0; +http://www.litefinder.net/about.html)
This is a horribly misconfigured spam bot. Best bet: forbid the damn thing from your site.
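For anyone whose site runs as a Python WSGI application rather than on a plain Apache setup, forbidding by user-agent could look roughly like the sketch below. The names `BLOCKED_AGENT_TOKENS`, `is_blocked` and `deny_bad_bots` are made up for illustration, and the denylist holds only agent names quoted in these posts.

```python
# Hypothetical denylist: lower-cased tokens taken from the agent strings
# quoted in these posts. Extend it to suit your own logs.
BLOCKED_AGENT_TOKENS = ("litefinder", "tmcrawler")


def is_blocked(user_agent):
    """Return True when the User-Agent header contains a denylisted token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def deny_bad_bots(app):
    """Wrap a WSGI app so that denylisted agents receive 403 Forbidden."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

Matching on the agent string only stops bots that announce themselves honestly; anything that fakes a browser agent still needs a trap or an IP ban.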
Monday, October 22, 2007
Thursday, May 31, 2007
TMCrawler
Host: 128.241.20.206
Agent: TMCrawler
This bad-bot visited a site, grabbed the robots.txt twice and started to follow links at a very leisurely pace: just sub-directories over a couple of hours.
Then it started to explore a directory in a systematic fashion. It picked a sub-directory and began a numerical search: /directory/0, then /directory/1, and so on. I caught this bad-bot quickly because I have any 404 errors emailed to me immediately. I suspect it would have tripped a trap soon enough though.
Others have reported this bot in the "WTF is this thing doing?" category.
Based on its activities, I have simply decided to deny it access to my sites.
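The numerical search described above (/directory/0, then /directory/1, and so on) is easy to spot mechanically. Here is a minimal sketch, assuming you already have one client's request paths in the order they arrived. The function names and the threshold of 3 are arbitrary choices for illustration, not anything taken from the trap itself.

```python
import re

# A path that ends in a purely numeric segment, e.g. "/directory/12".
NUMERIC_TAIL = re.compile(r"^(?P<base>.*/)(?P<num>\d+)$")


def longest_numeric_run(paths):
    """Return (base, length) for the longest run of back-to-back requests
    that count upward by one under the same directory."""
    best = (None, 0)
    prev_base, prev_num, run = None, None, 0
    for path in paths:
        match = NUMERIC_TAIL.match(path)
        if not match:
            # Any non-numeric request breaks the run.
            prev_base, prev_num, run = None, None, 0
            continue
        base, num = match.group("base"), int(match.group("num"))
        if base == prev_base and num == prev_num + 1:
            run += 1
        else:
            run = 1
        prev_base, prev_num = base, num
        if run > best[1]:
            best = (base, run)
    return best


def looks_like_numeric_probe(paths, threshold=3):
    """Flag a client once it has counted through `threshold` numbers in a row."""
    return longest_numeric_run(paths)[1] >= threshold
```

Feeding it `["/robots.txt", "/directory/0", "/directory/1", "/directory/2"]` flags the client, while ordinary browsing does not build up a run.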
Sunday, March 25, 2007
Bad-Bot Trap Revisited
There seems to have been an increase of late in bad-bot activity, with new IPs and new user-agents. So I thought I would add a couple of ideas to flesh out my original "A Simple PHP based Bad-Bot Trap", which seems to be rather popular.
I'm flattered that people are posting and collecting the links to this blog. But some scum are stealing the articles and posting them on their own sites: you who do so will receive instant and debilitating bad karma as a result. Furthermore, you do not have permission to do so. May you experience endless server and PHP errors.
Just link; it's nicer. If you have comments or suggestions, go for it.
I offer the following with the standard disclaimer: If you don't understand the code, don't use it!
We can notice that the bots tend to follow the links on a page in one of three fairly predictable ways: top down, alphabetically ascending, and alphabetically descending. If we wish to trap a bad-bot early on in its travels through our site, we can easily set traps for each possibility using the original bad-bot trap and a little .htaccess magic.
First add the following rules to the robots.txt under User-agent: *
Disallow: /afile.html
Disallow: /zfile.html
Disallow: /nofile.html
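Before relying on the traps, it is worth confirming that a well-behaved crawler would really stay away from them. The sketch below uses Python's standard `urllib.robotparser` against the three rules above. The agent name `ExampleBot` and the helper `allowed_paths` are placeholders invented for the example.

```python
from urllib.robotparser import RobotFileParser

# The rules from above, placed under the wildcard user-agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /afile.html
Disallow: /zfile.html
Disallow: /nofile.html
"""


def allowed_paths(robots_txt, paths, agent="ExampleBot"):
    """Map each path to True if a compliant crawler may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


result = allowed_paths(
    ROBOTS_TXT,
    ["/afile.html", "/zfile.html", "/nofile.html", "/index.html"],
)
```

The three trap paths come back as not fetchable and `/index.html` as fetchable, so only a bot that ignores or abuses robots.txt should ever reach a trap.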
Then add the following to .htaccess:
# set 'RewriteEngine On' if you haven't already
# redirect badbots
RewriteRule ^afile.* /badbots.php [L]
RewriteRule ^zfile.* /badbots.php [L]
RewriteRule ^nofile.* /badbots.php [L]
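To see which requests these patterns catch, here is a rough Python imitation of the three rules. It assumes the .htaccess file sits in the document root, where the pattern is matched against the path without its leading slash, and it uses Python's `re` as a stand-in for Apache's regex engine. The `rewrite` function is purely illustrative.

```python
import re

# (pattern, target) pairs copied from the rules above.
REWRITE_RULES = [
    (r"^afile.*", "/badbots.php"),
    (r"^zfile.*", "/badbots.php"),
    (r"^nofile.*", "/badbots.php"),
]


def rewrite(url_path):
    """Return the rewrite target for url_path, or url_path itself if no rule matches."""
    # Per-directory rules see the path relative to the directory: no leading slash.
    relative = url_path.lstrip("/")
    for pattern, target in REWRITE_RULES:
        if re.search(pattern, relative):
            return target  # like [L]: stop at the first rule that matches
    return url_path
```

Note that the patterns are prefix matches: `/afile.html` is caught, but so is anything else in that directory whose name starts with `afile`. A path such as `/docs/afile.html` is left alone because of the `^` anchor.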
Now we have three different traps to embed in our pages:
<p style="color:white;background:white;height:0;visibility:collapse;"
onclick="return false" >
<a href="/badbots.php" >.</a>
</p>
which can go at the top of the page
<p style="color:white;background:white;height:0;visibility:collapse;"
onclick="return false" >
<a href="/afile.html" >.</a>
</p>
and
<p style="color:white;background:white;height:0;visibility:collapse;"
onclick="return false" >
<a href="/zfile.html" >.</a>
</p>
which can go pretty much anywhere on a page.
The traps should be self-evident as to their use. The Disallow: /nofile.html exclusion was added for that particular species of bad-bot that uses the robots.txt to find links.
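As a sanity check on the reasoning, the three crawl orders can be simulated. The sketch below assumes a made-up page whose links appear in the order given in `PAGE_LINKS`; only the three trap URLs come from this post, the other file names are invented.

```python
TRAPS = {"/badbots.php", "/afile.html", "/zfile.html"}

# Hypothetical page: links in document order, with the /badbots.php trap at the top.
PAGE_LINKS = [
    "/badbots.php", "/index.html", "/contact.html", "/afile.html",
    "/news.html", "/zfile.html", "/gallery.html",
]


def first_trap_hit(links, order):
    """Return (trap, links_followed_before_it) for a bot crawling in the given order."""
    if order == "top-down":
        sequence = list(links)
    elif order == "ascending":
        sequence = sorted(links)
    elif order == "descending":
        sequence = sorted(links, reverse=True)
    else:
        raise ValueError("unknown order: " + order)
    for position, link in enumerate(sequence):
        if link in TRAPS:
            return link, position
    return None, len(sequence)


hits = {order: first_trap_hit(PAGE_LINKS, order)
        for order in ("top-down", "ascending", "descending")}
```

On this page each kind of bot trips a different trap on its very first request. If a real page has names that sort before `afile` or after `zfile`, the alphabetical bots get a few links in before being caught, so pick the trap names with your own file names in mind.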
Happy Trapping!
Sunday, March 18, 2007
Trapped Bad-Bots
2007-10-22
Host: 74.53.249.34
Agent: Mozilla/5.0 (compatible; LiteFinder/1.0; +http://www.litefinder.net/about.html)
2007-10-18
Host: 99.238.107.208
Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
2007-10-08
Host: 213.189.25.182
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET
2007-10-06
Host: 82.99.30.27
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
2007-10-05
Host: 82.99.30.32
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
2007-10-05
Host: 131.107.0.95
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; WOW64; SV1)
2007-10-03
Host: 67.19.250.26
Agent: Mozilla/5.0 (compatible; Gigamega.bot/1.0; +http://www.gigamega.net/bot.html)
2007-10-02
Host: 82.99.30.10
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
2007-09-10
Host: 207.46.55.27/30
Agent: MSNPTC/1.0 (stupid ms bot can't parse robots.txt properly)
2007-07-13
Host: 218.231.136.5
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP; DigExt)
2007-07-10
Host: 38.100.41.112
2007-07-06
Host: 209.85.94.164
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)
Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; +http://process4.com) Gecko/20070508 Firefox/1.5.0.12
stupid bot grabbed the robots.txt, then the first link listed in its exclusion list
2007-07-03
Host: 74.208.71.84
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
2007-06-27
Host: 63.251.174.4
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
2007-06-26
Host: 24.87.89.186
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
2007-06-22
Host: 64.92.199.41 (they have the whole block, actually; I'm just banning the agent for a while)
Agent: libwww-perl/5.805
Host: 64.92.199.42
Agent: libwww-perl/5.805
2007-06-18
Host: 81.223.254.34
Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
2007-06-15
Host: 201.5.229.201
Agent: Mozilla/3.0 (compatible; WebCapture 1.0; Auto; Windows)
2007-06-14
Host: 208.99.195.54
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)
2007-06-05
Host: 202.179.180.42
Agent: Mozilla/4.0 (compatible; NaverBot/1.0; http://help.naver.com/delete_main.asp)
2007-05-26
Host: 24.242.34.213
Agent: MJ12bot/v1.2.0 (http://majestic12.co.uk/bot.php?+)
2007-05-25
Host: 84.88.32.199
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) Opera 7.23[ca]
2007-05-10
Host: 220.181.34.177
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; QihooBot 1.0 qihoobot@qihoo.net)
2007-04-26
Host: 65.222.176.124
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)
2007-04-24
Host: 212.219.190.178
Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)
Host: 207.115.69.215
Agent: Mozilla/4.0/ (compatible- MSIE 6.0- Windows NT 5.1- SV1- .NET CLR 1.1.4322; ; )
Host: 65.222.176.125
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)
Host: 203.162.3.157
Agent: -
Host: 222.254.232.24
Agent: -
Host: 66.199.236.50
Agent: Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)
Host: 69.84.207.39
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50215)
Host: 208.223.208.181
Agent: Python-urllib/1.16
Host: 208.53.147.89
Agent: Mozilla/3.0 (compatible; NetPositive/2.2)
Host: 70.87.196.242
Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6
Host: 65.222.176.122
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)
Host: 84.69.146.235
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
Host: 84.70.209.45
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
Host: 38.100.41.105
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)
Host: 38.100.41.102
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)
The related IPs hosting bad-bots that I've seen so far are all in the range 38.100.41.100 - 38.100.41.107.
Host: 88.198.7.39
Agent: findfiles.org/0.9 (Robot;robot@findfiles.org)
Host: 72.21.50.202
Agent: Mozilla/4.0 (compatible; MSIE 5.01; MSNIA; Windows 98)
Host: 65.222.176.123
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)
Host: 88.151.114.39
Agent: Mozilla/5.0 (compatible; Webbot/0.1; http://www.webbot.ru/bot.html)
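If you would rather ban a range than chase individual hosts, Python's standard `ipaddress` module can turn a first/last pair such as the 38.100.41.100 - 38.100.41.107 range noted above into CIDR blocks. This is only a sketch; the helper names `covering_networks` and `normalize_network` are invented for the example.

```python
import ipaddress


def covering_networks(first, last):
    """Return the smallest list of CIDR networks that exactly covers first..last."""
    return list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)))


def normalize_network(cidr):
    """Turn a host-with-prefix entry like '207.46.55.27/30' into its network."""
    return ipaddress.ip_network(cidr, strict=False)


blocks = covering_networks("38.100.41.100", "38.100.41.107")
msn_block = normalize_network("207.46.55.27/30")
```

The range does not fall on a /29 boundary, so it comes out as two /30 blocks. Note that 38.100.41.112, which also appears in the list, lies outside the range and would need its own entry.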