Search engine robots that visit your web site by John A Fotheringham.

Published on Sep. 5 2007

List of robots and where they come from, thanks to research by John Fotheringham!

Home page/search engine Robot identifier IP address(es)

AbachoBOT abcdatos_botlink AESOP_com_SpiderMan crawler (crawler(at) ia_archiver Scooter Mercator Scooter2_Mercator_3-1.0 Tv#nn#_Merc_resh_26_1_D-1.0 AltaVista-Intranet jan.gelin(at) FAST-WebCrawler crawler(at) Wget Acoon Robot antibot Atomz AxmoRobot Buscaplus Robi CanSeek/ support(at) ChristCRAWLER Clushbot Crawler admin(at) ROBOT/ RaBot Agent-admin/ phortse(at) contact/jylee(at) RaBot Agent-admin/ webmaster(at) DeepIndex DittoSpyder Jack Speedy Spider ArchitextSpider Musical instruments are used in the name, such as (and the rest of the band). More recently first names are being used, like peter.excite.con

(excite) ArchitectSpider EuripBot Arachnoidea arachnoidea(at) EZResult Fast PartnerSite Crawler FAST Data Search Crawler FAST Data Search Document Retriever KIT-Fireball ? FyberSearch GalaxyBot geckobot GenCrawler ?

(Genealogical Search Engine) GeonaBot getRAX Googlebot googlebot(at) moget/2.0 moget(at) Aranha

(inktomi) slurp(at)

(inktomi) Slurp/2.0j slurp(at)

(inktomi) Slurp/2.0-KiteHourly slurp(at);

(inktomi) Slurp/2.0-OwlWeekly spider(at)

(inktomi) Slurp/3.0-AU slurp(at) Toutatis 2.5-2

(need V5 browsers to view) Hubater

(research centre) IlTrovatore-Setaccio IncyWincy UltraSeek InfoSeek Sidewinder Mole2/1.0 webmaster(at) MP3Bot #..# spider(at) kuloko-bot/0.2 LNSpiderguy Linknzbot lookbot MantraAgent NetResearchServer

(see also Lycos_Spider_(T-Rex) JoocerBot HenryTheMiragoRobot MojeekBot mozDex/ (within MSNBOT/0.1 Navadoo Crawler Gulliver ObjectsSearch/0.01 OnetSzukaj/ PicoSearch/ PJspider DIIbot

but it won't let us in nttdirectory_robot super-robot(at) griffon griffon(at) Spider/ admin(at)

various (fakes agent on each access)

gazz/1.0 gazz(at) NationalDirectory-SuperSpider dloader(NaverRobot)/ dumrobo(NaverRobot)/ noxtrumbot/ "Openfind piranha Shark"

(Chinese language) robot-response(at) Openbot/ psbot CrawlerBoy QweeryBot AlkalineBOT StackRambler/ SeznamBot Search-10 Fluffy the spider info(at) Scrubby/ asterias speedfind ramBot xtreme Kototoi/0.1 SearchByUsa Searchspider/ SightQuestBot/ Spider_Monkey/ Surfnomore Spider v1.1 Robot(at)SuperSnooper.Com teoma_agent1 teoma_admin(at) Teradex_Mapper mapper(at) ESISmartSpider Spider TraficDublu "81.196.*.*" Tutorial Crawler updated/0.1beta crawler(at) UK Searcher Spider - Vivante Link Checker

(coming soon) appie "uses an address at a Dutch ISP" Nazilla - marvin/infoseek marvin-team(at) MuscatFerret WhizBang! Lab ZyBorg - (info(at) WIRE WebRefiner: webrefiner(at) WSCbot Yandex Yellopet-Spider

pet-based search engine Findexa Crawler YBSbot search engine indexer

#client sites# libwww-perl Iron33


Most browsers identify themselves with a string that begins 'Mozilla...'. I've chosen not to document those (as yet). Here are a few of the rarer browser identifiers that I've seen.

Browser identifier Information

AmigaVoyager Voyager browser for the Amiga

xChaos_Arachne (DOS-compatible browser. Linux version under development)

IBrowse (search for IBrowse) Amiga-based browser

ICab (Macintosh-only)

JustView (I think this is a browser. Site is in Japanese)

KMeleon (Light browser based on the Mozilla code base)

Konqueror (Linux KDE browser)

Lynx (Cross-platform text based browser)

OmniWeb (Macintosh-only)

Opera (Cross-platform, small, efficient and standards-led browser)

Plucker (Palm handhelds. Written in Python)

pwWebSpeak Audio Browser

QWeb (Linux browser) (see also

retawq Text-based browser for text terminals. Runs under Linux

SlimBrowser Freeware tabbed browser

Sleipnir (Japanese) Japanese browser with apparently an English version available.

VMS_Mosaic (OpenVMS-only version of Mosaic, a pre-Netscape browser)

WannaBe (Macintosh text-only browser)

w3m (text-based browser)

Link checkers, link monitors and bookmark managers

Link checkers and bookmark managers are run by people wanting to keep their pages and bookmarks up to date. Being visited by a link checker is good news, as it means that someone has linked to you and cares that you're still alive. Link monitors regularly check your pages for changes, usually because someone has selected your page as 'one to watch'.

(pause for warm glow)

If you have access to the server log, check the referrer page to try and get the URL from which you are linked. Sometimes these URLs are inside password-protected parts of sites, so you won't be able to view the page.

If you build up a list of sites that link to you, these are the guys you should tell when you move (moral: never move).

It's also quite common for the link checker to give no indication of which URL it's coming from. Some link checkers always come from the same IP address; more usually they come from the client's site. It depends on whether the site owner has purchased a copy of the link checking software or signed up to some centralized link checking service. If they blank the referrer URL field and you get the client's IP address, you can always try visiting that and surfing their site.

Some of these tools appear to imply they're extracting email addresses (e.g. emailSiphon). As such they're probably unwelcome visitors since these addresses are probably being collected for spammers.
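The referrer check described above is easy to automate. What follows is only a rough sketch in Python: it assumes your server writes the Apache/NCSA 'combined' log format (yours may differ), and the sample log lines, the IP addresses and the version number on the agent string are all invented for illustration.

```python
import re
from collections import defaultdict

# Assumption: Apache/NCSA "combined" log format. Adjust the pattern if your
# server writes a different layout.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referrers_by_agent(lines):
    """Map each user agent to the set of non-empty referrer URLs it sent."""
    found = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not a combined-format line, so skip it
        referrer = match.group("referrer")
        if referrer and referrer != "-":  # "-" means the field was blank
            found[match.group("agent")].add(referrer)
    return dict(found)


# Hypothetical sample entries, made up for this example.
SAMPLE = [
    '192.0.2.10 - - [05/Sep/2007:10:00:00 +0000] "HEAD /robots.html HTTP/1.0" '
    '200 0 "http://example.org/links.html" "Xenu\'s Link Sleuth 1.2"',
    '192.0.2.11 - - [05/Sep/2007:10:05:00 +0000] "GET /robots.html HTTP/1.0" '
    '200 5120 "-" "LinkProver"',
]

if __name__ == "__main__":
    for agent, urls in referrers_by_agent(SAMPLE).items():
        print(agent, sorted(urls))
```

Agents that never send a referrer (the second sample line) simply don't appear in the result, which is the cue to fall back on the client's IP address.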

A page listing various link checkers (and other tools) can be found at

Robot identifier IP address(es) Link Checker home page

ActiveBookmark #client site#

ALink #client site# Reciprocal Link Checker, Manager and Page Generator.

AMeta #client site# Meta Tag Generator

ASPSearch URL Checker #client site# a site search engine/index maintenance tool

BlogBot #client site#

BMChecker #client site# (Japanese Bookmark Checker)

Bookmark Buddy #client site#

Check&Get #client site#

CheckWeb #client site#

CNET_Snoop (only if you have software listed at that site)

CSE HTML Validator #client site# HTML page validator that includes a link checker amongst its functions.

DRKSpider #client site# (An Open Source project)

DISCo Watchman #client site#


Email Extractor #client site# #email collector# We don't list links to email collectors on this site

EmailSiphon #client site# #email collector# We don't list links to email collectors on this site

EmailWolf #client site#

FavOrg #client site# A utility written by PC Magazine to fetch icon files (favicon.ico) for your IE favorites

Favorites Sweeper #client site# Another 'favorites' tidy-up utility

FreshLinks.exe #client site#

Funnel Web Profiler #client site# Profiles your site, including links to/from it

Html Link Validator #client site#

HTMLParser #client site# An open source HTML parser that is probably exercising its link-checking features.

The Informant

The Intraformant

InternetLinkAgent #client site# (in Japanese)

InternetPeriscope #client site#


jdwhatsnew.cgi #client site#

JRTS Check Favorites Utility #client site#

Lambda LinkCheck

LinkLint-checkonly --


Linkbot #client site#

Linkman (Mozilla...)

LinkProver #client site#

Links -- (Link management cgi script)

LinkScan Server #client site#

LinkSweeper #client site#

Link Valet Online

LinkVerify Spider


Morning Paper #client site#

MoveAnnouncer -- (notifies webmasters when your pages have moved)

mylinkcheck -- (German)

NetLookout --


NetMind-Minder (retired)

NetMonitor --

Netprospector JavaCrawler #client site#

online link validator (online link checker submit your URL)

Rational SiteCheck #client site#

Robozilla (checks links in the dmoz directory)

RPT-HTTPClient #client site# Java utility that uses the Java HTTPClient class library

SiteBar #client site#

SpurlBot Online bookmark agent

SurfMaster #client site#

SyncIT #client site#

Watchfire WebXM #client site#

WatzNew Agent #client site#

WebSite-Watcher #client site#

WebTrends Link Analyzer #client site#

Weblink Scanner #client site#

Xenu's Link Sleuth #client site#

Z-Add Link Checker #client site?#


Validators check your web pages for HTML correctness and standards compliance. Since other people are unlikely to send a validator to your site, you don't usually see much of this.

However, if you choose to validate your own site then the validation attempts will appear in your logs. The following list is thus limited to the on-line validator I use (and recommend) and a URL submission service that I use.

Robot Identifier IP address Validator home page



Tooter This is used as part of a link submission agent (trebor(at)

FTP clients and download managers

If you offer files for download then you'll start to be visited by various FTP clients. Clients like Go!Zilla and GetRight are smart in that they can resume downloads that have been interrupted. This relies on your web server supporting the necessary protocol, but that's fairly standard these days.

If your download files are over 1 MB in size (or if your server is slow) you'll often see the same IP address make multiple partial downloads of your file (look at the file size). In the case of clients like Go!Zilla and GetRight, if these add up to the right number of bytes then chances are the download succeeded.

Client Identifier FTP Client home page



ChinaClaw (Chinese) (Chinese download utility)


DLExpert (English and Chinese versions available)

Download Demon

Download Master (Russian)

Download Ninja (Japanese)

Download Wonder

Ez Auto Downloader Downloads all files of a given type from a site, so it's more like a site grabber







JetCar (or FlashGet)


Kontiki Client




Mass Downloader

MetaProducts Download Express

NetZip Downloader





Net Vampire

Nitro Downloader




SpeedDownload (for Macintosh)

WebDownloader for X 1.30 (Linux web downloader with X GUI)

WebLeacher (down last time I tried it) more details at

WebPictures Downloader Locates and downloads pictures

X-Uploader Can't find the home page, but it's described (in Russian) on
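The 'do the partial downloads add up?' check mentioned at the start of this section can be done with a few lines of Python. This is a sketch only: it assumes you have already pulled (IP address, file path, bytes sent) out of your log and that you know the real size of each file. The function name and the sample figures are made up, and a client that re-fetches overlapping ranges would overshoot the total and be missed.

```python
from collections import defaultdict


def completed_downloads(entries, file_sizes):
    """Return (ip, path) pairs whose partial transfers add up to the full file.

    entries    -- iterable of (ip, path, bytes_sent) taken from the server log
    file_sizes -- mapping of path -> size of the file on disk, in bytes
    """
    totals = defaultdict(int)
    for ip, path, bytes_sent in entries:
        totals[(ip, path)] += bytes_sent
    return sorted(
        key for key, total in totals.items()
        if key[1] in file_sizes and total == file_sizes[key[1]]
    )


# Hypothetical figures: one visitor fetches the file in two pieces,
# another gives up part way through.
ENTRIES = [
    ("192.0.2.20", "/files/tool.zip", 400000),
    ("192.0.2.20", "/files/tool.zip", 600000),
    ("192.0.2.21", "/files/tool.zip", 250000),
]
SIZES = {"/files/tool.zip": 1000000}

if __name__ == "__main__":
    print(completed_downloads(ENTRIES, SIZES))
```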

Research projects

These agents come from research projects. Of course that's how Google started...

citenikbot/ One-man project due for release in 2004.

CLIPS-index (French) French research robot from a linguistics project (?)

Computer_and_Automation_Research_Institute_Crawler Robot from the research centre at Hungarian Academy of Sciences at Crawls from IP

cosmos robot(at) Spider from which is a project to locate and index XML content on the web. The company is a spin-off from a project at INRIA in France, a frequent source of web robots. The word 'xyleme' apparently relates to the vascular system in plants but, cleverly, must be one of the very few words to contain the letters 'X', 'M' and 'L' (although not in that order).

D2KWebCrawler 'Data to Knowledge' data miner. Crawls from

DiaGem/ Experimental spider from Mitsubishi R&D division. Crawls from IP

Digimarc WebReader Digimarc searches images on the web looking for digital watermarks. More details at

EchO!/2.0 Spiders from which would seem to be part of a French-based search engine.

FinaleRobot robot-master(at) The site describes an Interactive Natural Language encyclopedia that will become a search engine at Good name, but at present it just maps back onto the ExpressUs site (not such a good name). Crawls from IP address

Ideare SignSite Spiders from Ideare are a research company producing search engine technology and are part owned by Tiscali in Italy, who seem to use their various tools for different search engines (mp3, images, etc.).

GentleSpider Some sort of spider that usually visits using an IP address from within or

Gulper Web Bot (Open research project to produce opinion-based search engine)

larbin sebastien.ailleret(at) ghi(at) And from the people that brought you xyro (see below) comes another, newer bot. This one seems to crawl from the IP address Update: more recently it's also been seen coming from

cosmos And then there was 'cosmos', crawling from Seems these people are a webbot factory. Cosmos doesn't offer an email address.

IRLbot Crawls from crawls randomly to determine the topology of the web.

KnowItAll A project that extracts massive amounts of information from the Web in 'an autonomous, scalable manner'. Don't they know that everyone hates a know-it-all?

MJ12bot A distributed search engine project

MultiText Research project to index the last weeks' news items

NEC Research Agent Research 'Inquirus' (meta?) search engine

OntoSpider Dutch robot for a research project. Crawls from

sherlock_spider A course project from Crawls from

S.T.A.L.K.E.R. 'My first robot' Crawls from

Steeler Japanese research robot.

ru-robot 0.1_hseo(at) Unable to find details on this, but I'm guessing it's a research spider from Crawls using the IP

USyd-NLP-Spider Research into Natural Language Processing at University of Sydney, Australia

WebGather Chinese search project

xyro xcrawler(at) Seems to be a spider associated with a French research institute. Usually crawls using the IP address

Zao/0.2 Another Japanese research robot. Crawls from

Zao-Crawler Same as above, but crawled from

Software packages

These agents are the default identifiers for various software packages. Software developers use these packages to add Internet functionality to their own applications. As such, it's impossible to say what these agents are being used for without looking at the pattern of access, as the same agent name may be used by different developers to achieve different results.

While many of these packages allow you to change the user agent, some do not, and many developers are too lazy to change the agent string.

GT::WWW Apparently some form of web-accessing perl module. Possibly included in the Links SQL product produced by

HTTPClient Default agent name used by the Java HTTPClient class. (See also RPT-HTTPClient below)

HTTP::Lite Default identifier for a set of light-weight perl modules for retrieving web documents. See

IP*Works! Set of TCP/IP components used in cross-platform development of internet tools

libwww-perl The PERL programming language comes with a number of routines for constructing web-aware scripts. This and related strings are the default user agent identifiers, although it's perfectly easy to change this to be whatever you want.

libghttp The GNOME http library. A Linux software library that offers connectivity to the web. Found in many places on the web. There is a description at

Macromedia Flash Player Flash movies can contain scripts that can fetch content from the web (such as other Flash movies or images)

MFC_Tear_Sample Agent name used in the sample code supplied with Visual C++ for accessing the web. This may therefore be someone running a program they've written based on that code.

PEAR HTTP_Request class PEAR is a framework and distribution system for reusable PHP components

Python-urllib Presumably the default identifier for the urllib module in the Python programming language

RPT-HTTPClient The Java HTTPClient class library

TeamSoft WinInet Component (menus require Java) Internet software component suite

wget Free Unix/Linux package for retrieving web pages

WinScripter iNet Tools COM/DLL object that supports the SMTP and HTTP protocols

W3CRobot/ A fast web-spidering robot included with the libwww package (?). See

W3C-WebCon/ a command-line toolkit that allows you to perform HTTP operations

wxWidgets Cross-platform open source C++ GUI builder which includes 'HTML viewing' and much, much more.

Zeus #nnnn# Webster Pro
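To show how little effort it takes to replace one of these default identifiers, here is a small sketch using Python's urllib (the package behind the Python-urllib entry above). The helper name `build_request` and the agent string 'ExampleLinkChecker/0.1' are made up for illustration, and nothing is actually fetched.

```python
import urllib.request


def build_request(url, agent=None):
    """Build a request, optionally replacing the library's default identifier."""
    headers = {"User-Agent": agent} if agent else {}
    return urllib.request.Request(url, headers=headers)


# With no explicit header, urllib fills in its default "Python-urllib/x.y"
# identifier only at the moment the request is sent.
default_request = build_request("http://example.org/")

# "ExampleLinkChecker/0.1" is a made-up name used only for illustration.
custom_request = build_request("http://example.org/", "ExampleLinkChecker/0.1")

if __name__ == "__main__":
    # Request objects store header names capitalised as "User-agent".
    print(default_request.get_header("User-agent"))  # None: nothing set explicitly
    print(custom_request.get_header("User-agent"))
```

A developer who skips that one extra argument is what produces the default strings listed in this section.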

Offline browsers and other agents

Agent Identifier Agent home page




EirGrabber (Japanese software from the 'Eir Project')

ExtractorPro (Bulk email marketing tool. URL deliberately omitted)

FairAd Client (German) A German pay-to-surf client

JoBo a site downloader

iSiloWeb (for palm pilot)

Kenjin Spider

MSIECrawler (Microsoft IE4.0)


NexTools WebAgent

Offline Explorer

NetAttache Offline browser and search engine agent

PageDown Details (in Japanese) at


Searchworks Spider


SiteSnagger


Teleport Pro


Web2Map Web site copier. English/German versions available

WebAuto I think this is an offline browser. Site is in Japanese


Webdup (Chinese software. Not 100% sure what it does)




Website eXtractor



WebTwin Convert websites into help files.




Xaldon WebSpider (German) Offline browser

Other miscellaneous agents

These agents are ones that we've seen but been unable to get information for, or which are slightly unusual in origin. If you have any additional information on any of these, feel free to send it to info(at)

User Agent Information

Ad Muncher Browser plug-in that monitors the pages as you view them and removes all adverts, popup windows, etc.


ADSARobot Distributed search engine project. Contact postmaster(at) Browses from (which doesn't make sense for a distributed search engine)

Albert Indexer Multi-lingual search technology

AnswerChase a personal search robot.

ASPSeek An open source search engine project

ATA-Translation-Service Looks to be an online translation tool, much like Babelfish. Possibly related to

AVSearch Seems to be the AltaVista personal search agent. The crawling site is sometimes referred to in the agent name

Avant Browser Browser add-on for Internet Explorer

Beamer (French). A browser accelerator that requires sites to create a 'pagebeamer.txt' file that is fetched by this agent to do predictive downloads.

beholder or


BravoBrian (may require IE). A content filtering service that offers protection from pornography and other unwanted content for children. Comes from IP

bumblebee(at) Software used to build 'Vortals' (vertical portals). Details (requires Flash) can be found at

Checkbot Seems to come from who offer B2B services

contype Possibly Adobe Acrobat or Reader or Adobe Acrobat Reader used with MSIE (I have been unable to confirm this)

Convera Internet Spider A 'RetrievalWare' product which claims to be a multimedia web crawler.

ConveraCrawler Probably related to the above

ccubee Crawler technology from

Custo Tool to map the structure of a web site

CyberNavi_WebGet UA points to but there's not much there. It crawls from which is (Japanese). Babelfish suggests this is a Japanese company offering search products


deepweb Also calls itself an 'Intelligent Deep-Web Robotic Agent'. A search engine indexer that will index dynamic content. Indexes from IP

EbiNess An Open Source project to display Internet information in a 3D format.

EmailWolf Email program, no longer available; that's the only reason I'm prepared to list it on this page.

Excalibur Internet Spider

Expired Domain Sleuth Hunts down popular yet expired domain names with a view to letting you purchase an already popular domain name.

Everest-Vulcan Inc./ Next-generation services technology (under development)




Giskard (Trivia note: Giskard is probably named after the Isaac Asimov robot)

grub-client Grub is a distributed open source web crawler. Users download the client, which then indexes the web as part of a distributed effort

heritrix Open-source, extensible web crawler project

htdig Search engine software for companies and universities

A browser accelerator. The idea is that you browse 'through' their site, taking advantage of their faster Internet connection, caching and, most importantly, compression (of the file sent to your browser) in return for their adverts added to the viewed pages. Such accesses give the webwarper URL as the User Agent, concealing the true agent of the original user. More details at*


InterGO This was a child-safe browser but it seems no associated page remains

InternetArchive Presumably but that's in 'stealth mode'

Internet Ninja (Japanese Macintosh browser?)

InternetSeer A web monitoring service. More details at

ipiumBot (French) A tool that searches for copies of your documents on the web. Crawls from

InternetAmi IOR robot gathering data for an English/Swedish translation service.

InsumaScout/ Searches data situated in open data sources.

Katriona Something to do with the European Regional Internet Registry (RIPE) Browses using IP address


LEIA Unable to find (Too many 'Star Wars' references get in the way)


LimeBot Robot searching for information on cruises. Browses using IP address


Mata Hari (Internet search agent)

metabot Geographical-based text search tool. Crawls from

Mister Pix II Picture finder

MOSES 2.0 Spider NOTE: Site crashes my version of Netscape 4.7

MonkeyCrawl 'Futuristic play'.

NetCruiser It's not clear to me which of these products this might be, but I'm assuming it's one of them.

NPBot crawls from ( A trademark protection service


NutchCVS Open source web-search project

NZBot Offers 'information management' tools

Opencola A search application combining data from multiple sources

ORA_checksite Identifier used in a sample perl program in the online book 'Web Client Programming with Perl'. The program is used to check links. Obviously people have tried it and it works

PAD File Get. PAD file poller. PAD files describe software applications to download sites.

Oxxbot1 (Data mining bot on IP

Pansophica A Web search agent with neural net intelligence which organizes and personalizes Web sites and searches.

Phoaks An index of web resources listed in UseNet. See also

phpMySearch-Crawler a search engine for individual sites.

PICgrabber A free picture and movie locator


erik(at) Seems to be a project to create a collage of images gathered from the Internet.

PicSpider (German). Site offers a 'picture crate' according to babelfish, which seems to be some form of repository. Not sure why it's spidering, but crawls from 217-20-118-26 which is part of

PintaSpider Unable to find. But the spider came from

Pita (Chub.Stanford.EDU) --

PitSpyder Thread#n#0 Unable to find

psbot A bot indexing pictures. Crawls from

PolyBot Crawls from

PureSight (child-safe content filtering)

Rumours-Agent Comes from IP which a lookup identifies as 'Cross Lingual Info Research' in Japan.

RepoMonkey Bait & Tackle A bit of detective work here. Recent entries in the log file link this to the site although the robot always appears to come from an IP address at (a bookmarking service). Visiting reveals a 'coming soon' site. Looking at the HTML source leads to another page at (appears identical). The META tags for this page all appear to be references to day trading, futures training and the like, although we did spot the word 'fibonacci' (our favourite). So... possibly a future search engine related to stock trading? Or maybe the Monkey and Hippo are just feeding me a red herring? There's more. The picture on the Kenjin site at is currently the same as that at HungryHippo. Kenjin is an Autonomy company.

Robot2.0(PingSoft) There are several 'PingSoft's around, but I suspect that this belongs to one of the products listed at (e.g. SmartHunter) since I was visited from a Chinese IP address.

SilentSurf A surf anonymizer service

SlySearch slysearch(at) A site that hunts down infringements of intellectual property rights.

SpaceBison A web filter that is 'ShonenWare', i.e. you should purchase a Shonen Knife CD if you use it. Shonen Knife are a great Japanese band, much loved by the late Kurt Cobain. Sometimes this sets the referrer page to the band's home page at (or maybe the users just happen to go there themselves).

CrawlWave (Greek and requires login) Crawls from which is part of the Athens University of Economics and Business (

SpotOn (IE add-on that organizes your browsing)

SQ Webscanner (on holiday last time I looked)

Squid An open-source web proxy cache for Unix systems

SquidClamAV_Redirector An open-source anti-virus program that I saw accessing icons on my site (!)

Sqworm Not 100% sure about this one. When it visited me it came from the WebSense site 63.212.171.* (and a Google search shows others seem to see the same). At the WebSense site you can find WebCatcher, a product used to monitor employees' web-surfing habits (as near as I can tell). But as I say, I'm not 100% sure...

Steganos Internet Anonym A surf anonymizer utility

SurfControl content tracking product

Tagword Tool that surveys the links in the Open Directory at checking their status etc. See

TaWWWantula Unable to find

Tcl http client package The default identifier for any software built using the Tcl HTTP package

TeraCrawl Unable to find

TurnitinBot Plagiarism prevention system. Crawls from

UCmore A browser plug-in (initially IE only) that searches for related pages and categories. In my experience this seems to entail accessing a favicon.ico file on a daily basis (presumably to refresh the 'favorites' list)

UdmSearch Search engine technology as used at sites such as Now called mnoGoSearch.

unchaos_crawler A search engine that offers a 'hybrid' of human and machine intelligence, but no search box that I could see. Crawls from

unlostBot unlostBot(at) is 'under construction'. The robot came from IP address which is in France.

URLBlaze File/web search utility

utopy crawler(at) Coming soon at (requires flash). This venture-capital funded site is 'running in stealth mode' before launching the 'new new thing' (is that a typo?). One of the Flash pages defines Utopia (geddit?) and some of the browsing is done by IP addresses at

UtilMind HTTPGet A component intended for downloading pages from the web using standard Microsoft Windows Internet library (winInet.dll). Listed on

UrlScope Unable to find

Vagabondo Appears to be a log analyzer for Russian BBS systems. (I may have got that wrong). I found reference to it being copyright John Gladkih 1998, but I've not found any URL that gives a description (not even a Russian one).

VCI WebViewer Web browser object that may be incorporated into software

vspider A commercial spidering product.

WAVETools A set of Delphi components offered to build Internet applications from

Webbandit Collates search engine results News-gathering agent

webcollage Forms a collage from randomly selected web images. Pet project of one of the authors of Netscape. Seems to come from differing IP nodes.

WebCompass (quarterdeck search engine software)

WebGenie presumably one of the CGI-based products available on this site. Possibly the 'Site Sleuth'

Web Hound Unable to find. Or rather I found several different 'web hounds' so can't tell which this was

Web Magnet this appears to be a tool used by this web consultancy.

WebMiner Either or A tool to track down and target visitors to your website

WebPix Tool to fetch all pictures from a web site


WebSymmetrix Originates in Korea and is possibly related to their National Computerization Agency. Uses IP address

webrank Search engine popularity meter.

webwasher (browser filter)

WhosTalking Software that tracks trademark usage. Last time I saw it, it was creating 404 errors by adding &dg.. to each URL. Hopefully they'll fix this

(German). Appears to be an interpreter designed to help automate regular tasks on a Windows PC.

XupiterToolbar A toolbar that sets up as the default search engine. There appears to be a lot of negative press regarding this toolbar

yacy An open source and distributed search engine project. The above URL seems to redirect to an IP-based one

YottaShopping_Bot www-yottashopping-com/. User agent claims this is a Shopping Search Engine, but the URL requires a login so I was unable to verify (so I deliberately made its URL non-clickable). Crawled from

Sites that regularly visit

Some IP addresses or sites may regularly visit you, although the user agent may be obscure, blank or even change.

Here are a few that I've been able to work out:

Site address(es) Description

This is a site that offers a speed-up to your surfing in return for being able to monitor people's surfing habits. The speed-ups are achieved through a variety of techniques, and the monitoring info is sold on, although your privacy is protected. Visit for more details.

Not known This site daily reads any xml files submitted to a shareware site in PAD format. PAD is a means for describing shareware devised by the Association of Shareware Professionals ( This site is performing daily checks, looking to automatically update its lists with any changes.
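Spotting regular visitors whose user agent is blank or keeps changing is easier if you group the log by IP address rather than by agent. A minimal sketch of that idea in Python follows. It assumes you have already extracted (IP address, user agent) pairs from your log; the function names, the threshold of two agents and the sample data are all invented for illustration.

```python
from collections import defaultdict


def agents_per_address(visits):
    """Group the distinct user agent strings seen from each IP address.

    visits -- iterable of (ip, user_agent) pairs taken from the server log
    """
    seen = defaultdict(set)
    for ip, agent in visits:
        seen[ip].add(agent or "(blank)")
    return seen


def changeable_visitors(visits, min_agents=2):
    """Return the addresses that presented at least min_agents different agents."""
    grouped = agents_per_address(visits)
    return sorted(ip for ip, agents in grouped.items() if len(agents) >= min_agents)


# Hypothetical visits: the first address changes its story, the second doesn't.
VISITS = [
    ("192.0.2.30", "Mozilla/4.0 (compatible)"),
    ("192.0.2.30", ""),
    ("192.0.2.30", "Go Away"),
    ("192.0.2.31", "Googlebot"),
    ("192.0.2.31", "Googlebot"),
]

if __name__ == "__main__":
    print(changeable_visitors(VISITS))
```

Bear in mind that a proxy or a shared ISP address will also show several agents, so treat the result as a list of addresses worth a closer look, not as proof of anything.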

Other useful sites

Here are links to other sites you might find useful when looking into web robots:

A Bot monitor site with regular updates and links to the bot's home pages.

A list of HTML validators

A site that lists IP addresses of search engine bots and others. More comprehensive (and probably more up to date) than the IP addresses shown on this page (which tends to record the first IP address seen)

An online syntax checker for robots.txt files. Enter the URL of your robots.txt file to get it checked and to see a summary of what effect it will have.

Mozilla web browser project. This page describes the conventions used for formatting the User Agent in the form 'Mozilla...'

A site dedicated to the robots.txt file. This page gives some background to how robots work, although their list of robots is quite small.

A page collecting together a number of resources to do with all aspects of web robots.

A site primarily about 'cloaking' sites, the art of making a site look different to different visitors. Contains articles on how to detect spiders.

A site listing WAP user agent strings. These will mostly be mobile phones

This site contains a number of forums for topics of interest to webmasters everywhere. This particular forum actively discusses robots and search engines that visit your site.
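If you'd rather check the effect of a robots.txt file locally than through an online checker, Python's standard urllib.robotparser module can do it. The sketch below is only an illustration: the robots.txt content, the example.org URLs and the helper name `allowed` are made up. Remember too that robots.txt is advisory, so it only tells you what a well-behaved robot should do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one agent completely and keep
# everybody else out of /private/.
ROBOTS_TXT = """\
User-agent: EmailSiphon
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, agent, url):
    """Report whether the given agent may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    checks = [
        ("EmailSiphon", "http://example.org/index.html"),
        ("Googlebot", "http://example.org/index.html"),
        ("Googlebot", "http://example.org/private/notes.html"),
    ]
    for agent, url in checks:
        print(agent, url, allowed(ROBOTS_TXT, agent, url))
```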

...And finally, some fakers

Increasingly, security and privacy concerns mean that users and companies are wary about giving away information to sites they visit through the user agent and other fields that appear in server logs.

Some browsers will allow you to select the user agent you present when visiting a site. The Opera browser does this, for example, to allow its users to pretend to be either IE or Netscape when visiting web sites coded in a way that forgets there are other browsers in use.

Also, as firewalls become more common, we will see more and more user agent fields being blocked by the firewall, which will prevent this information being transmitted to the outside world.

Just to prove that you can never rely on the user agent, here is a selection of user agent strings I've seen in my log files that tell us nothing about the software being used (although some of them speak volumes about the person driving the software). I'm omitting any IP addresses I may have, to protect the identities of those concerned.

'User agent' seen Comments

Bruciebot I'm assured this was created by a regular in alt.webmasters

Blocked by Norton, Geblokkeerd door Norton, Blockeriet von Norton The agent has been blocked by Norton Utilities. The referrer is also withheld. The second version is Dutch. No doubt other languages occur

Don't Like AOL Oh dear. This could start a trend!

Don't be so nosey Hey! You came to my site first, remember?

Don't you wish you knew. Obviously.

Go Away A bit rich from someone who came to my site!

Field blocked by AtGuard Surfer is behind the AtGuard firewall (now part of Norton Internet Security 2000) which prevents the true User Agent being transmitted.

Field blocked by Outpost Again, the field is withheld by the software

Isch habe gar kein Browser German for 'I have no browser'. Or so I thought until I received the following from Clemens Marschner: Actually it is German with Italian accent! The word refers to an advertisement of the Nescafe coffee where a smart Italian convinces a beautiful lady to stay and drink coffee with him after she knocks at his door to complain that his car is in the way of hers. And after she stayed and listened to him while he prepares the coffee with lots of gestures and Italian speak, she again asks him to move his car, and he goes 'Isch 'abe gar keine Auto, Signorina' (I don't even have a car, signorina). Since that commercial was shown for years, presumably all German web masters know it...

My Web browser is not of your business True, but no fun.

multiBlocker browser Although this seems to mainly offer protection against visitors to your site, they obviously also provide a user agent blocker for people browsing

Wabbit's don't use browsers Probably the proxy service at

Wot no browser? (Win67; X; SK) Win67 ?!? Ah... a dream come true!

Who gives a ? It's as least as good as Lynx Ah yes, but how do we know that?

Who wants to know? I do.

Awards for this page

I've been told this page is referenced in the book Spidering Hacks

All awards gratefully received

This page is © 2000-2005 John A Fotheringham. It may not be reproduced without permission, although you are welcome to save a copy for personal use to your hard disk.

The original page is located at