Friday, September 08, 2006

Another Bad-Bot Falls into Trap

Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)
Host: 63.100.163.70
This bot, disguised as MSIE, tried to rip through one of my sites and ran right into a bot trap. It started off looking like a regular browser: it loaded the site's root index and the page's .css. It didn't load robots.txt.

Nevertheless, several things in combination gave it away as a disrespectful bot or ripper:
  1. It wasn't loading the page's associated binaries (images and so on).
  2. It wasn't loading JavaScript (everyone knows that MSIE can't do much without it).
  3. It was crawling at three pages per second.
  4. On closer inspection, it only loaded one of the two .css files on the index page.
  5. It tried to follow links that were commented out in the page's mark-up.
  6. It ran into a bot trap that a normal user wouldn't see.
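
For anyone who wants to automate this kind of check, here is a rough Python sketch of three of the signals above (no images or scripts, crawl rate, and the trap hit). It is not the trap used on this site. The Hit record, the /trap/ path, the asset suffixes, and the two-pages-per-second threshold are all made-up assumptions for illustration; you would feed it the hits of a single client, parsed from your own access log.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    timestamp: float  # seconds, e.g. parsed from an access log
    path: str         # requested URL path


# All of the values below are hypothetical placeholders.
TRAP_PATHS = {"/trap/"}                       # linked only invisibly
ASSET_SUFFIXES = (".png", ".gif", ".jpg", ".js")
MAX_PAGES_PER_SECOND = 2.0


def is_page(path: str) -> bool:
    """Treat anything that is not an image, script or stylesheet as a page."""
    return not path.lower().endswith(ASSET_SUFFIXES + (".css",))


def bot_signals(hits: list[Hit]) -> list[str]:
    """Return the suspicious signals found in one client's hits."""
    signals = []
    paths = [h.path for h in hits]
    pages = [h for h in hits if is_page(h.path)]

    # Signal: pages were fetched, but never an image or a script.
    if pages and not any(p.lower().endswith(ASSET_SUFFIXES) for p in paths):
        signals.append("no images or scripts loaded")

    # Signal: rough crawl rate, pages divided by the time they span.
    if len(pages) >= 2:
        times = [h.timestamp for h in pages]
        span = max(times) - min(times)
        rate = len(pages) / span if span > 0 else float("inf")
        if rate > MAX_PAGES_PER_SECOND:
            signals.append("crawling too fast")

    # Signal: a URL that no human visitor would ever see was requested.
    if any(p in TRAP_PATHS for p in paths):
        signals.append("hit bot trap")

    return signals
```

No single signal is proof on its own (plenty of people browse with images or scripts turned off), so, as with the bot above, it is the combination that counts.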

Whois says 63.100.163.70 belongs to:

UUNET Technologies, Inc.
22001 Loudoun County Parkway
Ashburn, VA, 20147, US
