Search engine robots that visit your web site by John A Fotheringham.
List of robots and where they come from, thanks to research by John Fotheringham!
abacho.com AbachoBOT srv-ze-robot1.tricus.com
abcdatos.com abcdatos_botlink 184.108.40.206 abcdatos.com/botlink/
aesop.com AESOP_com_SpiderMan 220.127.116.11
ah-ha.com ah-ha.com crawler (crawler(at)ah-ha.com) c7pub-216-250-141-186.center7.com
alexa.com ia_archiver green.alexa.com sarah.alexa.com
altavista.com Scooter test-scooter.pa.alta-vista.net Mercator brillo.pa.alta-vista.net Scooter2_Mercator_3-1.0 av-dev4.pa.alta-vista.net roach.smo.av.com-1.0 scooter.aveurope.co.uk Tv#nn#_Merc_resh_26_1_D-1.0 bigip1-snat.sv.av.com mercator.pa-x.dec.com scooter.pa.alta-vista.net election2000crawl-complaints-to-admin.webresearch.pa-x.dec.com scooter.sv.av.com avfwclient.sv.av.com tv#nn#.sv.av.com
altavista.co.uk AltaVista-Intranet host-119.altavista.se jan.gelin(at)av.com
alltheweb.com FAST-WebCrawler 18.104.22.168 crawler(at)fast.no fast.no/faq/faqfastwebsearch/faqfastwebcrawler.html Wget ext-gw.trd.fast.no
acoon.de Acoon Robot 22.214.171.124
antisearch.net antibot 126.96.36.199
atomz.com Atomz router-sc.atomz.com index.atomz.com
axmo.com AxmoRobot 188.8.131.52
buscaplus.com Buscaplus Robi buscaplus.com/robi/
canseek.ca CanSeek/ 184.108.40.206 support(at)canseek.ca
christcrawler.com/search.cfm ChristCRAWLER 220.127.116.11 christcrawler.com/
clush.com Clushbot 18.104.22.168 clush.com/bot.html
crawler.de Crawler crawlit.crawler.de admin(at)crawler.de
daadle.com DaAdLe.com ROBOT/ 22.214.171.124
daum.net RaBot 126.96.36.199 Agent-admin/ phortse(at)hanmail.net 188.8.131.52 contact/jylee(at)kies.co.kr RaBot 184.108.40.206 Agent-admin/ webmaster(at)kisco.go.kr
en.deepindex.com DeepIndex deepindex.net1.nerim.net
ditto.com DittoSpyder 220.127.116.11
earthcom.info EARTHCOM.info 18.104.22.168
entireweb.com Speedy Spider 22.214.171.124
excite.com ArchitextSpider Musical instruments are used in the names, such as viola.excite.com cello.excite.com piano.excite.com kazoo.excite.com ride.excite.com sabian.excite.com sax.excite.com bugle.excite.com snare.excite.com ziljian.excite.com bongos.excite.com maturana.excite.com mandolin.excite.com piccolo.excite.com kettle.excite.com ichiban.excite.com (and the rest of the band). More recently, first names are being used, like philip.excite.com peter.excite.com perdita.excite.com macduff.excite.com agouti.excite.com
(excite) ArchitectSpider crimpshrine.atext.com ichiban.atext.com
eurip.com EuripBot 126.96.36.199
euroseek.net Arachnoidea 188.8.131.52 arachnoidea(at)euroseek.net
ezresults.com EZResult 184.108.40.206
fastsearch.net Fast PartnerSite Crawler psprdcrw001.sac2.fastsearch.net FAST Data Search Crawler 220.127.116.11 FAST Data Search Document Retriever 18.104.22.168
fireball.de KIT-Fireball ?
france.misesajour.com/ france.misesajour.com 22.214.171.124
fybersearch.com FyberSearch 126.96.36.199
galaxy.com GalaxyBot 188.8.131.52 galaxy.com/galaxybot.html
geckobot.com geckobot .rdc1.az.coxatwork.com
gendoor.com GenCrawler ?
(Genealogical Search Engine)
geona.com GeonaBot 184.108.40.206
getrax.com getRAX 220.127.116.11
google.com Googlebot c#nn#.googlebot.com googlebot(at)googlebot.com googlebot.com/
goo.ne.jp moget/2.0 18.104.22.168 moget(at)goo.ne.jp
girafa.com Aranha Aranha.girafa.com
(inktomi) Slurp.so/1.0 q2004.inktomisearch.com slurp(at)inktomi.com j5006.inktomisearch.com
(inktomi) Slurp/2.0j 22.214.171.124 slurp(at)inktomi.com goo313.goo.ne.jp inktomisearch.com
(inktomi) Slurp/2.0-KiteHourly y400.inktomi.com slurp(at)inktomi.com; inktomi.com/slurp.html
(inktomi) Slurp/2.0-OwlWeekly 126.96.36.199 spider(at)aeneid.com inktomi.com/slurp.html
(inktomi) Slurp/3.0-AU j6000.inktomi.com slurp(at)inktomi.com
hoppa.com/ Toutatis 2.5-2 tisnix.xs4all.nl
(need V5 browsers to view)
hubat.com Hubater 188.8.131.52
almaden.ibm.com almaden.ibm.com/cs/crawler wfp2.almaden.ibm.com
iltrovatore.it IlTrovatore-Setaccio 184.108.40.206
incywincy.com IncyWincy 220.127.116.11
infoseek.com UltraSeek cde2c923.infoseek.com InfoSeek Sidewinder cde2c91f.infoseek.com cca26215.infoseek.com
intags.de Mole2/1.0 18.104.22.168 webmaster(at)intags.de
mp3bot.de/ MP3Bot #..#
ip3000.com C-PBWF-ip3000.com-crawler ip3000.com ip3000.com-crawler
istarthere.com istarthere.com 22.214.171.124 spider(at)istarthere.com
knowledge.com Knowledge.com/ 126.96.36.199
kuloko.com kuloko-bot/0.2 188.8.131.52
lexis-nexis.com LNSpiderguy firewall5.lexis-nexis.com
linknz.co.nz Linknzbot 184.108.40.206
look.com lookbot magma.com
looksmart.com MantraAgent fjupiter.looksmart.com
loopimprovements.com NetResearchServer leg-64-133-109-250-STK.sprinthome.com
(see also incywincy.com) loopimprovements.com/robot.html
lycos.com Lycos_Spider_(T-Rex) bos-spider#n#.bos.lycos.com 220.127.116.11
joocer.com JoocerBot 18.104.22.168
mirago.co.uk HenryTheMiragoRobot 22.214.171.124
mozdex.com mozDex/ (within comcast.net)
search.msn.com/ MSNBOT/0.1 126.96.36.199 search.msn.com/msnbot.htm
navadoo.com Navadoo Crawler
northernlight.com Gulliver marvin.northernlight.com taz.northernlight.com
objectssearch.com ObjectsSearch/0.01 188.8.131.52
picosearch.com PicoSearch/ pipe.picosearch.com
portaljuice.com PJspider timber.nextopia.com
powerinter.net DIIbot node-d8e93393.powerinter.net
but it won't let us in
navi.ocn.ne.jp/ nttdirectory_robot lilis00.navi.ocn.ne.jp super-robot(at)super.navi.ocn.ne.jp lilis04.navi.ocn.ne.jp griffon griffon(at)super.navi.ocn.ne.jp
maxbot.com Spider/maxbot.com search.wport.com admin(at)maxbot.com
various (fakes agent on each access) pool0058.cvx2-bradley.dialup.earthlink.net
gazz/1.0 deleuze.infobee.ne.jp gazz(at)nttrd.com derrida.infobee.ne.jp
nationaldirectory.com NationalDirectory-SuperSpider spider.nationaldirectory.com 184.108.40.206
naver.com dloader(NaverRobot)/ 220.127.116.11 dumrobo(NaverRobot)/
noxtrum.com noxtrumbot/ 18.104.22.168
openfind.com Openfind piranha, Shark
(Chinese language) robot-response(at)openfind.com.tw Openbot/ abovenet4.openfind.com
picsearch.org psbot 22.214.171.124 picsearch.org/bot.html
pinpoint.com CrawlerBoy Pinpoint.com nitrogen.pinpoint.com
petersnews.com user#n#.ip3000.com news#n#.petersnews.com
qweery.nl QweeryBot 126.96.36.199 qweerybot.qweery.com
vestris.com/alkaline AlkalineBOT host130.uv-ray.com
rambler.ru StackRambler/ 188.8.131.52
seznam.cz SeznamBot 184.108.40.206
search-10.com Search-10 220.127.116.11
searchhippo.com Fluffy the spider 18.104.22.168 info(at)searchhippo.com
scrubtheweb.com Scrubby/ 22.214.171.124
singingfish.com asterias grouper.singingfish.com
speedfind.de speedfind ramBot xtreme BWEB.highway.telekom.at
s.u-tokyo.ac.jp Kototoi/0.1 crawler-red3.is.s.u-tokyo.ac.jp
searchspider.com Searchspider/ 126.96.36.199
sightquest.com SightQuestBot/ 188.8.131.52 sightquest.com/bot.htm
spidermonkey.ca Spider_Monkey/ 184.108.40.206
surfnomore.com Surfnomore Spider v1.1 220.127.116.11
supersnooper.com Robot(at)SuperSnooper.Com 18.104.22.168
teoma.com teoma_agent1 22.214.171.124 teoma_admin(at)hawkholdings.com
mapper.teradex.com Teradex_Mapper 126.96.36.199 mapper(at)teradex.com
travel-finder.com ESISmartSpider 188.8.131.52
traficdublu.ro Spider TraficDublu 81.196.*.*, 184.108.40.206
tutorgig.com Tutorial Crawler 220.127.116.11 tutorgig.com/crawler
updated.com updated/0.1beta 18.104.22.168 crawler(at)updated.com
uksearcher.co.uk UK Searcher Spider -
vivante.com Vivante Link Checker 22.214.171.124
walhello.com appie uses an address at planet.nl, a Dutch ISP
websmostlinked.com Nazilla -
webwombat.com.au WebWombat.com.au 126.96.36.199
webseek.de marvin/infoseek arthur4.sda.t-online.de marvin-team(at)webseek.de
webtop.com MuscatFerret ferret#nn#.webtop.com
whizbanglabs.com WhizBang! Lab 188.8.131.52
wisenut.com ZyBorg - (info(at)WISEnut.com)
wire.co.uk WIRE WebRefiner: brighton.wire.co.uk webrefiner(at)wire.co.uk
yandex.com Yandex ya.yandex.ru
yellowpet.com Yellopet-Spider 212-82-36-23.ip.zeitraum.com
pet-based search engine
yelo.no Findexa Crawler
yourbettersearch.com YBSbot search engine indexer 184.108.40.206
#client sites# libwww-perl linpro.no/lwp/
verno.ueda.info.waseda.ac.jp/ Iron33 220.127.116.11
Most browsers identify themselves with a string that begins 'Mozilla...'. I've chosen not to document those (as yet). Here are a few of the rarer browser identifiers that I've seen.
Browser identifier Information
AmigaVoyager v3.vapor.com/ Voyager browser for the Amiga
xChaos_Arachne browser.arachne.cz/ (DOS-compatible browser. Linux version under development)
IBrowse hisoft.co.uk (search for IBrowse) Amiga-based browser
ICab icab.de/index.html (Macintosh-only)
JustView www3.justsystem.co.jp/download/justview/3.01win1a.html (I think this is a browser. Site is in Japanese)
KMeleon kmeleon.sourceforge.net/ (Light browser based on the Mozilla code base)
Konqueror konqueror.org/konq-browser.html (Linux KDE browser)
Lynx lynx.browser.org/ (Cross-platform text based browser)
OmniWeb omnigroup.com/products/omniweb/ (Macintosh-only)
Opera opera.com (Cross-platform, small, efficient and standards-led browser)
Plucker plkr.org/index.pl/faq#1.1 (Palm handhelds. Written in Python)
pwWebSpeak prodworks.com/issound/catalog/catalog_pwwebspeak.html Audio Browser
QWeb sunsite.auc.dk/qweb/ (Linux browser) (see also browserwatch.internet.com/news/story/qweb8.html)
retawq retawq.sourceforge.net/ Text-based browser for text terminals. Runs under Linux
SlimBrowser flashpeak.com/sbrowser/sbrowser.htm Freeware tabbed browser
Sleipnir sleipnir.pos.to/software/sleipnir/index.html (Japanese) Japanese browser with apparently an English version available.
VMS_Mosaic vaxa.wvnet.edu/vmswww/vms_mosaic.html (OpenVMS-only version of Mosaic, a pre-Netscape browser)
WannaBe mindstory.com/wb2/ (Macintosh text-only browser)
w3m w3m.sourceforge.net/ (text-based browser)
Link checkers, link monitors and bookmark managers
Link checkers and bookmark managers are run by people wanting to keep their pages and bookmarks up to date. Being visited by a link checker is good news, as it means that someone has linked to you and cares that you're still alive. Link monitors regularly check your pages for changes, usually because someone has selected your page as 'one to watch'.
(pause for warm glow)
If you have access to the server log, check the referrer page to try and get the URL from which you are linked. Sometimes these URLs are inside password-protected parts of sites, so you won't be able to view the page.
If you build up a list of sites that link to you, these are the guys you should tell when you move (moral: never move).
It's also quite common for the link checker to give no indication of which URL it's coming from. Some link checkers always come from the same IP address; more usually they come from the client's site. It depends on whether the site owner has purchased a copy of the link checking software or signed up to some centralized link checking service. If they blank the referrer URL field but you get the client's IP address, you can always try visiting that address and surfing their site.
Some of these tools appear to imply they're extracting email addresses (e.g. emailSiphon). As such they're probably unwelcome visitors since these addresses are probably being collected for spammers.
A page listing various link checkers (and other tools) can be found at softwareqatest.com/qatweb1.html#LINK
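The referrer check described above can be sketched in a few lines of Python. This is only an illustration: it assumes the server writes the common Apache "combined" log format, and the function name, the sample log lines and the example.org URL are all made up for the example.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referrer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def referrers_for_agent(lines, agent_substring):
    """Map each referrer URL to the set of hosts that sent it,
    for requests whose user agent contains agent_substring."""
    found = {}
    needle = agent_substring.lower()
    for line in lines:
        m = COMBINED.match(line)
        if not m or needle not in m.group("agent").lower():
            continue
        ref = m.group("referrer")
        # A blank or "-" referrer means the checker did not say where it came from.
        key = ref if ref not in ("", "-") else "(no referrer)"
        found.setdefault(key, set()).add(m.group("host"))
    return found

# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2004:13:55:36 +0000] "HEAD /index.html HTTP/1.0" '
    '200 0 "http://example.org/links.html" "LinkWalker"',
    '192.0.2.11 - - [10/Oct/2004:14:01:02 +0000] "HEAD /index.html HTTP/1.0" '
    '200 0 "-" "LinkWalker"',
    '192.0.2.12 - - [10/Oct/2004:14:05:00 +0000] "GET /index.html HTTP/1.0" '
    '200 5120 "-" "Mozilla/4.0 (compatible)"',
]

for referrer, hosts in referrers_for_agent(SAMPLE_LOG, "linkwalker").items():
    print(referrer, sorted(hosts))
```

Where the referrer is blank, the host address is all you have to go on, which is the fallback suggested in the text.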
Robot identifier IP address(es) Link Checker home page
ActiveBookmark #client site# libmaster.com/software.php
ALink #client site# info-pack.com/alink/ Reciprocal Link Checker, Manager and Page Generator.
AMeta #client site# info-pack.com/ameta/ Meta Tag Generator
ASPSearch URL Checker #client site# search.santry.com/downloads/ a site search engine/index maintenance tool
BlogBot #client site# sourceforge.net/projects/blogbot/
BMChecker #client site# fureai.or.jp/~yoichi37/soft/bmchecker.html (Japanese Bookmark Checker)
Bookmark Buddy #client site# bookmarkbuddy.net/about.shtml
Check&Get #client site# checkget.com
CheckWeb #client site# checkweb.com
CNET_Snoop download.com (only if you have software listed at that site)
CSE HTML Validator #client site# htmlvalidator.com HTML page validator that includes a link checker amongst its functions.
DRKSpider #client site# drk.com.ar/spider/ (An Open Source project)
DISCo Watchman #client site# t-guild.com/gamesite/Software/Disco_w/Disco_w.htm
DoctorHTML draco.imagiware.com www2.imagiware.com/RxHTML/
Email Extractor #client site# #email collector# We don't list links to email collectors on this site
EmailSiphon #client site# #email collector# We don't list links to email collectors on this site
EmailWolf #client site# pixeltech.com.au/~msw/ewolf/index.html
FavOrg #client site# pcmag.com/article2/0,1759,1558477,00.asp A utility written by PC Magazine to fetch icon files (favicon.ico) for your IE favorites
Favorites Sweeper #client site# manitoolssoftware.cjb.net Another 'favorites' tidy-up utility
FreshLinks.exe #client site# resqpc.com/features.html
Funnel Web Profiler #client site# quest.com/funnel_web/profiler/ Profiles your site, including links to/from it
Html Link Validator #client site# lithopssoft.com/hlv/index.html
HTMLParser #client site# htmlparser.sourceforge.net/ an open source HTML parser that is probably exercising its link-checking features.
The Informant cosmo.dartmouth.edu informant.dartmouth.edu/
InternetLinkAgent #client site# www1.odn.ne.jp/freeware/rank/ineternet/internetlinkagent.html (in Japanese)
InternetPeriscope #client site# lokboxsoftware.com/internetperiscope.asp
javElink salix.ingetech.com dailydiffs.com
jdwhatsnew.cgi #client site# jdrowell.com/projects/jdwhatsnew/view
JRTS Check Favorites Utility #client site# jrtwine.com/Products/CheckFavs/
Lambda LinkCheck 18.104.22.168 stud.ifi.uio.no/~lmariusg/download/python/LinkCheck.html
LinkLint-checkonly -- goldwarp.com/bowlin/linklint/
LinkAlarm linkalarm.com linkalarm.com
Linkbot #client site# tetranetsoftware.com/products/linkbot.htm
Linkman (Mozilla...) 22.214.171.124 outertech.com/product.php?product=5
LinkProver #client site# tafweb.com/linkprover.html
Links -- gossamer-threads.com/scripts/links/ (Link management cgi script)
LinkScan Server #client site# elsop.com
LinkSweeper #client site# lss.com.au/lss/windows/ls/linksweeper.htm
Link Valet Online 126.96.36.199 htmlhelp.com/tools/valet/
LinkVerify Spider frances.yourwebhost.com enduser.co.uk/linkverify/
LinkWalker lw.seventwentyfour.com seventwentyfour.com 188.8.131.52
Morning Paper #client site# boutell.com/morning/
MoveAnnouncer -- moveannouncer.com (notifies webmasters when your pages have moved)
mylinkcheck -- mylinkcheck.de (German)
NetLookout -- frugalsoft.com
NetMechanic gamma.netmechanic2.com netmechanic.com
NetMind-Minder marvin.netmind.com (retired) netmind.com gary.netmind.com meg.netmind.com inyanga.netmind.com leo.netmind.com gemini.netmind.com
NetMonitor -- modemwizard.com/netmonitor.html
Netprospector JavaCrawler #client site# actaddons.com/products/netprospector.asp
online link validator 184.108.40.206 dead-links.com (online link checker; submit your URL)
Rational SiteCheck #client site# rational.com/products/teamtest/prodinfo/sitecheck.jtmpl
Robozilla h-206-#n#-#n#-#n#.netscape.com dmoz.org/ (checks links in the dmoz directory)
RPT-HTTPClient #client site# purplefrog.com/~thoth/jchecklinks/ Java utility that uses the Java HTTPClient class library
SiteBar #client site# sitebar.org
SpurlBot spurl.net Online bookmark agent
SurfMaster #client site# maskbit.com/surfmaster.htm
SyncIT #client site# bookmarksync.com
Watchfire WebXM #client site# watchfire.com/products/webxm.asp
WatzNew Agent #client site# watznew.com
WebSite-Watcher #client site# aignes.com
WebTrends Link Analyzer #client site# webtrends.com
Weblink Scanner #client site# iterix.com/products/WeblinkScanner/weblinkScanner.asp
Xenu's Link Sleuth #client site# snafu.de/~tilman/xenulink.html
Z-Add Link Checker #client site?# w3.z-add.co.uk/linkcheck/
Validators check your web pages for HTML correctness and standards compliance. Since other people are unlikely to send a validator to your site, you don't usually see much of this. Consequently the 'list' below is restricted to the on-line validators I've used myself.
However, if you choose to validate your own site then the validation attempts will appear in your logs. The following list is thus limited to the on-line validator I use (and recommend) and a URL submission service that I use.
Robot Identifier IP address Validator home page
W3C_Validator abyss.w3.org validator.w3.org/
WDG_Validator/ 220.127.116.11 htmlhelp.com/tools/validator/
Tooter selfpromotion.com selfpromotion.com. This is used as part of a link submission agent (trebor(at)animeigo.com)
FTP clients and download managers
If you offer files for download then you'll start to be visited by various FTP clients. Clients like Go!Zilla and GetRight are smart in that they can resume downloads that have been interrupted. This relies on your web server supporting the necessary protocol, but that's fairly standard these days.
If your download files are over 1 MB in size (or if your server is slow), you'll often see the same IP address make multiple partial downloads of your file (look at the file size). In the case of clients like Go!Zilla and GetRight, if these add up to the right number of bytes then chances are the download succeeded.
Client Identifier FTP Client home page
ChinaClaw download.pchome.net/internet/download/860.html (Chinese) (Chinese download utility)
DA lidan.com downloadaccelerator.com
DLExpert yanew.com (English and Chinese versions available)
Download Demon netzip.com
Download Master one.com.ua/dm/ (Russian)
Download Ninja h-fd.org/~mkro/mt/archives/000585.html (Japanese)
Download Wonder forty.com
Ez Auto Downloader anatari.com/ezad/index.html Downloads all files of a given type from a site, so it's more like a site grabber
JetCar (or FlashGet) amazesoft.com
Kontiki Client kontiki.com/products/index.html
Mass Downloader geocities.com/SiliconValley/Vista/2865/md.htm
MetaProducts Download Express metaproducts.com/DE.html
NetZip Downloader netzip.com
Net Vampire netvampire.com
Nitro Downloader klsofttools.com/nitro.html
SpeedDownload yazsoft.com (for Macintosh)
WebDownloader for X 1.30 krasu.ru/soft/chuchelo/features.php3 (Linux web downloader with X GUI)
WebLeacher webleacher.dk (down last time I tried it) more details at davecentral.com/projects/thewebleacher/
WebPictures Downloader fullstrong.com Locates and downloads pictures
X-Uploader Can't find the home page, but it's described (in Russian) on compulenta.ru/2002/1/17/24333/
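The partial-download check described at the start of this section (add up the bytes each IP address fetched and compare with the file size) might be sketched as below. The function name, the paths, the addresses and the byte counts are all hypothetical, and the entries are assumed to have already been pulled out of the server log as (ip, path, bytes sent) tuples. Treating "at least the file size" as success is also an assumption, made here because resumed downloads can overlap slightly.

```python
from collections import defaultdict

def likely_completed(entries, path, file_size):
    """Return {ip: total_bytes} for clients whose partial downloads of
    `path` add up to at least `file_size` bytes."""
    totals = defaultdict(int)
    for ip, req_path, sent in entries:
        if req_path == path:
            totals[ip] += sent
    return {ip: total for ip, total in totals.items() if total >= file_size}

# Hypothetical log extract: (client IP, requested path, bytes sent).
ENTRIES = [
    ("192.0.2.7", "/files/tool.zip", 400_000),
    ("192.0.2.7", "/files/tool.zip", 350_000),
    ("192.0.2.7", "/files/tool.zip", 250_000),
    ("192.0.2.9", "/files/tool.zip", 400_000),   # gave up part way
    ("192.0.2.9", "/index.html", 5_120),         # different file, ignored
]

print(likely_completed(ENTRIES, "/files/tool.zip", 1_000_000))
```

As the text says, a matching total only means that "chances are" the download succeeded; it is a hint, not proof.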
These agents come from research projects. Of course that's how Google started...
citenikbot/ citenik.co.uk/bot.html. One-man project due for release in 2004.
CLIPS-index clips-index.imag.fr/ (French) French research robot from a linguistics project (?)
Computer_and_Automation_Research_Institute_Crawler Robot from the research centre at Hungarian Academy of Sciences at sztaki.hu Crawls from IP 18.104.22.168
cosmos robot(at)xyleme.com Spider from xyleme.com, which is a project to locate and index XML content on the web. The company is a spin-off from a project at INRIA in France, a frequent source of web robots. The word 'xyleme' apparently relates to the vascular system in plants but, cleverly, must be one of the very few words to contain the letters 'X', 'M' and 'L' (although not in that order).
D2KWebCrawler archive.ncsa.uiuc.edu/TechFocus/Projects/NCSA/D2K-_Data_To_Knowledge.html 'Data to Knowledge' data miner. Crawls from 22.214.171.124
DiaGem/ Experimental spider from Mitsubishi R&D division skyrocket.gr.jp/diagem.html Crawls from IP 126.96.36.199
Digimarc WebReader Digimarc searches images on the web looking for digital watermarks. More details at digimarc.com
EchO!/2.0 Spiders from 188.8.131.52, which would seem to be part of voila.com, a French-based search engine.
FinaleRobot robot-master(at)expressus.com The expressus.com site describes an Interactive Natural Language encyclopedia that will become a search engine at final-e.com. Good name, but at present it just maps back onto the ExpressUs site (not such a good name). Crawls from IP address 184.108.40.206
Ideare SignSite ideare.com. Spiders from spider3.tiscalinet.it. Ideare are a research company producing search engine technology and are part owned by Tiscali in Italy, who seem to use their various tools for different search engines (mp3, images, etc).
GentleSpider Some sort of spider that usually visits using an IP address from within research.att.com or crawler.tivra.com
Gulper Web Bot ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot (Open research project to produce opinion-based search engine)
larbin sebastien.ailleret(at)inria.fr ghi(at)lcs.mit.edu And from the people that brought you xyro (see below) comes another, newer bot. This one seems to crawl from the IP address cremant.inria.fr. Update: more recently it's also been seen coming from barracutta.lcs.mit.edu
cosmos And then there was 'cosmos', crawling from pomelos.inria.fr. Seems these people are a webbot factory. Cosmos doesn't offer an email address.
IRLbot irl.cs.tamu.edu/crawler. Crawls from 220.127.116.11. Crawls randomly to determine the topology of the web.
KnowItAll cs.washington.edu/research/knowitall/ a project that extracts massive amounts of information from the Web in 'an autonomous, scalable manner'. Don't they know that everyone hates a know-it-all?
MJ12bot majestic12.co.uk/projects/dsearch/ A distributed search engine project
MultiText Research project to index the last weeks' news items canola1.uwaterloo.ca/
NEC Research Agent heavenly.nj.nec.com/ Research 'Inquirus' (meta?) search engine
OntoSpider ontospider.i-n.info Dutch robot for a research project. Crawls from 18.104.22.168
sherlock_spider sherlock.com.cn. A course project from burrocs.indiana.edu:15003/b659/ Crawls from 22.214.171.124
S.T.A.L.K.E.R. seo-tools.net/en/bot.aspx. 'My first robot' Crawls from 126.96.36.199
Steeler tkl.iis.u-tokyo.ac.jp/~crawler/crawler.html.en Japanese research robot.
ru-robot 0.1_hseo(at)cs.rutgers.edu Unable to find details on this, but I'm guessing it's a research spider from rutgers.edu. Crawls using the IP teal.rutgers.edu
USyd-NLP-Spider it.usyd.edu.au/~vinci/webcorpus.html research into Natural Language Processing at University of Sydney, Australia
WebGather pccms.pku.edu.cn:8000/ Chinese search project
xyro xcrawler(at)inria.fr Seems to be a spider associated with a French research institute. Usually crawls using the IP address vamos.inria.fr
Zao/0.2 kototoi.org/zao/ Another Japanese research robot. Crawls from 188.8.131.52.
Zao-Crawler Same as above, but crawled from 184.108.40.206
These agents are the default identifiers for various software packages. Software developers use these packages to add Internet functionality to their own applications. As such, it's impossible to say what these agents are being used for without looking at the pattern of access, as the same agent name may be used by different developers to achieve different results.
While many of these packages allow you to change the user agent, some do not, and many developers are too lazy to change the agent string.
GT::WWW Apparently some form of web-accessing perl module. Possibly included in the Links SQL product produced by gossamer-threads.com/scripts/index.htm.
HTTPClient Default agent name used by the Java HTTPClient class. innovation.ch/java/HTTPClient/ (See also RPT-HTTPClient below)
HTTP::Lite Default identifier for a set of light-weight perl modules for retrieving web documents. See toybox.ca/http-lite/
IP*Works! Set of TCP/IP components used in cross-platform development of internet tools nsoftware.com/products/ipworks.aspx
libwww-perl The Perl programming language comes with a number of routines for constructing web-aware scripts. This and related strings are the default user agent identifiers, although it's perfectly easy to change this to be whatever you want.
libghttp The GNOME http library. A Linux software library that offers connectivity to the web. Found in many places on the web. There is a description at fifi.org/doc/libghttp-dev/html/ghttp.html
Macromedia Flash Player Flash movies can contain scripts that can fetch content from the web (such as other Flash movies or images)
MFC_Tear_Sample Agent name used in the sample code supplied with Visual C++ for accessing the web. This may therefore be someone running a program they've written based on that code.
PEAR HTTP_Request class PEAR is a framework and distribution system for reusable PHP components pear.php.net/
Python-urllib Presumably the default identifier for the urllib module in the Python programming language lib.uchicago.edu/keith/courses/python/class/7/
RPT-HTTPClient The Java HTTPClient class library
TeamSoft WinInet Component winsoft.sk/wininet.htm (menus require Java) Internet software component suite
wget gnu.org/software/wget/wget.html Free Unix/Linux package for retrieving web pages
WinScripter iNet Tools winscripter.com/wsh/tools/wsInetTools.asp COM/DLL object that supports the SMTP and HTTP protocols
W3CRobot/ A fast web-spidering robot included with the libwww package (?). See w3.org/Robot/
W3C-WebCon/ w3.org/ComLine a command-line toolkit that allows you to perform HTTP operations
wxWidgets wxwidgets.org cross-platform open source C++ GUI builder which includes 'HTML viewing' and much, much more.
Zeus #nnnn# Webster Pro homepagesw.com/webster_overview.htm
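Since the identifiers in this table are library defaults rather than named robots, one rough way to flag them in a log is a simple prefix match. The sketch below is an assumption-laden illustration: the prefix list is a hand-picked subset of the table above, the function name is made up, the sample agent strings (including their version numbers) are hypothetical, and real agents do not always put the library name first.

```python
# Subset of the default identifiers listed in the table above.
LIBRARY_DEFAULTS = (
    "libwww-perl",
    "Python-urllib",
    "wget",
    "RPT-HTTPClient",
    "HTTPClient",
    "HTTP::Lite",
    "libghttp",
    "MFC_Tear_Sample",
    "W3CRobot/",
    "W3C-WebCon/",
)

def library_default(agent):
    """Return the matching default identifier if the user agent string
    starts with one (case-insensitive), otherwise None."""
    lowered = agent.lower()
    for name in LIBRARY_DEFAULTS:
        if lowered.startswith(name.lower()):
            return name
    return None

# Hypothetical agent strings, for illustration only.
for agent in ("libwww-perl/5.64", "Wget/1.8.2", "Mozilla/4.0 (compatible; MSIE 6.0)"):
    print(agent, "->", library_default(agent))
```

A match only tells you which package made the request. As noted above, working out what the program was actually doing still means looking at the pattern of access.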
Offline browsers and other agents
Agent Identifier Agent home page
EirGrabber www2p.biglobe.ne.jp/~eir/index.htm (Japanese software from the 'Eir Project')
ExtractorPro (Bulk email marketing tool. URL deliberately omitted)
FairAd Client hager.co.at/fordelka/fairad.htm (German) A German pay-to-surf client
JoBo matuschek.net/software/jobo/index.html a site downloader
iSiloWeb isilo.com (for palm pilot)
Kenjin Spider autonomy.com
MSIECrawler (Microsoft IE4.0)
NexTools WebAgent vector.co.jp/soft/win95/net/se053030.html
Offline Explorer metaproducts.com/OE.html
NetAttache Offline browser and search engine agent
PageDown Details (in Japanese) at www01.u-page.so-net.ne.jp/fa2/y_yutaka/share/pagedown.htm
Searchworks Spider nedesign.com/Phipps/products.html
SiteSnagger pcmag.com/article2/0,1759,1559896,00.asp
Teleport Pro tenmax.com/teleport/pro/home.htm
Web2Map web2map.com/us/index.htm Web site copier. English/German versions available
WebAuto yanasoft.co.jp/webauto.html I think this is an offline browser. Site is in Japanese
Webdup webdup.com (Chinese software. Not 100% sure what it does)
Website eXtractor asona.org
WebTwin WebTwin.com Convert websites into help files.
Xaldon WebSpider xaldon.de/produkte_webspider.html (German) Offline browser
Other miscellaneous agents
These agents are ones that we've seen but been unable to get information for, or which are slightly unusual in origin. If you have any additional information on any of these, feel free to send it to info(at)jafsoft.com
User Agent Information
Ad Muncher admuncher.com Browser plug-in that monitors the pages as you view them and removes all adverts, popup windows, etc.
ADSARobot distributed search engine project. Contact postmaster(at)cnds.ucd.ie. Browses from acropolis.ucd.ie (which doesn't make sense for a distributed search engine)
Albert Indexer albert.com Multi-lingual search technology
AnswerChase answerchase.com a personal search robot.
ASPSeek aspseek.org/about.html. An open source search engine project
ATA-Translation-Service Looks to be an online translation tool, much like Babelfish. Possibly related to atanet.org/
AVSearch Seems to be the AltaVista personal search agent. The crawling site is sometimes referred to in the agent name
Avant Browser avantbrowser.com Browser add-on for Internet Explorer
Beamer pagebeamer.org/fr/index.php (French). A browser accelerator that requires sites to create a 'pagebeamer.txt' file that is fetched by this agent to do predictive downloads.
beholder or vigiltech.com/esensedisclaim.html
BravoBrian bstop.bravobrian.it/ (may require IE). A content filtering service that offers protection from pornography and other unwanted content for children. Comes from IP 220.127.116.11
bumblebee(at)relevare.com Software used to build 'Vortals' (vertical portals). Details (requires Flash) can be found at relevare.com/site/
Checkbot Seems to come from oxxfordinfo.com who offer B2B services
contype Possibly Adobe Acrobat or Reader or Adobe Acrobat Reader used with MSIE (I have been unable to confirm this)
Convera Internet Spider A 'RetrievalWare' product which claims to be a multimedia web crawler. convera.com/Products/rw_ancillis.asp
ConveraCrawler Probably related to the above
ccubee Crawler technology from empyreum.com/technologies/platforms/ccubee/
Custo Tool to map the structure of a web site netwu.com/custo/
CyberNavi_WebGet UA points to cybertech-inc.co.jp but there's not much there. It crawls from 18.104.22.168, which is bsearchtech.com/ (Japanese). Babelfish suggests this is a Japanese company offering search products
deepweb Also calls itself an 'Intelligent Deep-Web Robotic Agent'. A search engine indexer that will index dynamic content. deepweb.com. Indexes from IP 22.214.171.124
EbiNess sourceforge.net/projects/ebiness An Open Source project to display Internet information in a 3D format.
EmailWolf pixeltech.com.au/~msw/ewolf/ Email program, no longer available; that's the only reason I'm prepared to list it on this page.
Excalibur Internet Spider excalib.com/products/ispi/index.shtml
Expired Domain Sleuth Hunts down popular yet expired domain names, with a view to letting you purchase an already popular domain name. expireddomainsleuth.com
Everest-Vulcan Inc./ everest.vulcan.com/crawlerhelp Next-generation services technology (under development)
Giskard oralco.com (Trivia note: Giskard is probably named after the Isaac Asimov robot)
grub-client Grub is a distributed open source web crawler. Users download the client, which then indexes the web as part of a distributed effort grub.org/html/documents.php
heritrix Open-source, extensible web crawler project crawler.archive.org/
htdig htdig.org search engine software for companies and universities
webwarper.net A browser accelerator. The idea is that you browse 'through' their site, taking advantage of their faster Internet connection, caching and most importantly compression (of the file sent to your browser) in return for their adverts added to the viewed pages. Such accesses give the webwarper URL as the User Agent, concealing the true agent of the original user. More details at webwarper.net/ww.pl/0/wwgz/about.htm?*
InterGO teachersoft.com browserwatch.internet.com/news/story/intergo1.html This was a child-safe browser, but it seems no associated page remains
InternetArchive Presumably internetarchive.com, but that's in 'stealth mode'
Internet Ninja ifour.co.jp (Japanese Macintosh browser?)
InternetSeer A web monitoring service. More details at internetseer.com/
ipiumBot laurion.com/ipium-analysis.html (French) A tool that searches for copies of your documents on the web. Crawls from petula.laurion.net
InternetAmi IOR internetami.se/ior.html robot gathering data for an English/Swedish translation service.
InsumaScout/ insuma.de/insuma/de/SEscout.html Searches data situated in open data sources.
Katriona Something to do with the European Regional Internet Registry (RIPE) Browses using IP address 126.96.36.199
LEIA Unable to find (Too many 'Star Wars' references get in the way)
LimeBot cruiselime.com/LimeBot.php Robot searching for information on cruises. Browses using IP address 188.8.131.52
Mata Hari thewebtools.com (Internet search agent)
metabot Geographical-based text search tool. Crawls from 184.108.40.206 metacarta.com/products.htm
Mister Pix II Picture finder mister-pix.com/en/home.htm
MOSES 2.0 Spider ideas2internet.com/products/moses2/ NOTE Site crashes my version of netscape 4.7
MonkeyCrawl monkeymethods.org. 'Futuristic play'.
NetCruiser netcruiser-software.com/products.html It's not clear to me which of these products this might be, but I'm assuming it's one of them.
NPBot nameprotect.com crawls from 220.127.116.11 (crawler1.crawler918.com) A trademark protection service
NutchCVS lucene.apache.org/nutch/bot.html. Open source web-search project
NZBot navigationzone.com Offers 'information management' tools
Opencola opencola.com "A search application combining data from multiple sources"
ORA_checksite oreilly.com/openbook/webclient/ch06.html Identifier used in a sample Perl program in the online book 'Web Client Programming with Perl'. The program is used to check links. Obviously people have tried it, and it works.
Onekit.com PAD File Get. PAD file poller. PAD files describe software applications to download sites.
Oxxbot1 oxxfordinfo.com (Data mining bot on IP 18.104.22.168)
Pansophica homepage.mac.com/zigkit/Pansophica/index.html A Web search agent with neural net intelligence which organizes and personalizes Web sites and searches.
Phoaks phoaks.com/index.html. An index of web resources listed in UseNet. See also public.iastate.edu/~CYBERSTACKS/Aristotle.htm
phpMySearch-Crawler phpMySearch.web4.hm a search engine for individual sites.
PICgrabber A free picture and movie locator movies-free.net
erik(at)malfunction.org Seems to be a project to create a collage of images gathered from the Internet.
PicSpider bildkiste.de.vu (German). Site offers a 'picture crate' according to babelfish, which seems to be some form of repository. Not sure why it's spidering, but crawls from 217-20-118-26 which is part of internetserviceteam.com
PintaSpider Unable to find, but the spider came from cnet.fr
Pita (Chub.Stanford.EDU) --
PitSpyder Thread#n#0 Unable to find
psbot picsearch.org/bot.html A bot indexing pictures. Crawls from ps.direct2internet.com
PolyBot cis.poly.edu/polybot/ crawls from weasel.poly.edu, grampus.poly.edu, bumblebee.poly.edu
PureSight puresight.com/Products/PureSightHomeDescription.htm (child-safe content filtering)
Rumours-Agent Comes from IP 22.214.171.124, which a lookup identifies as 'Cross Lingual Info Research' in Japan.
RepoMonkey Bait & Tackle A bit of detective work here. Recent entries in the log file link this to the site hungryhippo.com, although the robot always appears to come from an IP address at backflip.com (a bookmarking service). Visiting hungryhippo.com reveals a 'coming soon' site. Looking at the HTML source leads to another page at mezzaluna.net/hungryhippo.com/ (appears identical). The META tags for this page all appear to be references to day trading, futures training and the like, although we did spot the word 'fibonacci' (our favourite). So... possibly a future search engine related to stock trading? Or maybe the Monkey and Hippo are just feeding me a red herring? There's more. The picture on the Kenjin site at kenjin.com/kenjin/info.html is currently the same as that at HungryHippo. Kenjin is an Autonomy company.
Robot2.0(PingSoft) There are several 'PingSoft's around, but I suspect that this belongs to one of the products listed at pingsoft.net/ (e.g. SmartHunter) since I was visited from a Chinese IP address.
SilentSurf silentsurf.com. A surf anonymizer service
SlySearch slysearch(at)slysearch.com slysearch.com. A site that hunts down infringements of intellectual property rights.
SpaceBison proxomitron.org/ A web filter that is 'ShonenWare', i.e. you should purchase a Shonen Knife CD if you use it. Shonen Knife are a great Japanese band, much loved by the late Kurt Cobain. Sometimes this sets the referrer page to the band's home page at mmjp.or.jp/knife/ (or maybe the users just happen to go there themselves).
CrawlWave spiderwave.aueb.gr (Greek, and requires login). Crawls from 126.96.36.199, which is part of the Athens University of Economics and Business (aueb.gr)
SpotOn spoton.com (IE add-on that organizes your browsing)
SQ Webscanner macinsearch.com/users/webscanner/ (on holiday last time I looked)
Squid squid-cache.org An open-source web proxy cache for Unix systems
SquidClamAV_Redirector freshmeat.net/projects/scavr/?branch_id=54042&release_id=188491 An open-source anti-virus program that I saw accessing icons on my site (!)
Sqworm Not 100% sure about this one. When it visited me it came from the WebSense site 63.212.171.* (and a Google search shows others seem to see the same). At the WebSense site you can find WebCatcher, a product used to monitor employees' web-surfing habits (as near as I can tell). But, as I say, I'm not 100% sure... websense.com/products/about/webcatcher/index.cfm
Steganos Internet Anonym steganos.com/?layout=default&content=products_siapro&language=en A surf anonymizer utility
SurfControl surfcontrol.com/products/web/default.aspx content tracking product
Tagword Tool that surveys the links in the Open Directory at dmoz.org, checking their status etc. See tagword.com/dmoz_survey.php
TaWWWantula Unable to find
Tcl http client package The default identifier for any software built using the Tcl HTTP package tcl.activestate.com/software/tcltk/ tcl.activestate.com/man/tcl8.0/TclCmd/http.htm
TeraCrawl Unable to find
TurnitinBot turnitin.com Plagiarism prevention system. Crawls from 188.8.131.52
UCmore ucmore.com A browser plug-in (initially IE only) that searches for related pages and categories. In my experience this seems to entail accessing a favicon.ico file on a daily basis (presumably to refresh the 'favorites' list)
UdmSearch search.mnogo.ru/ Search engine technology, as used at sites such as maplesearch.com. Now called mnoGoSearch.
unchaos_crawler unchaos.com. A search engine that offers a 'hybrid' of human and machine intelligence, but no search box that I could see. Crawls from 184.108.40.206
unlostBot unlostBot(at)unlost.com unlost.com is 'under construction'. The robot came from IP address 220.127.116.11 which is in France.
URLBlaze File/web search utility urlblaze.net
utopy crawler(at)utopy.com Coming soon at utopy.com (requires flash). This venture-capital funded site is 'running in stealth mode' before launching the 'new new thing' (is that a typo?). One of the Flash pages defines Utopia (geddit?), and some of the browsing is done by IP addresses at ...myutopy.com.
UtilMind HTTPGet A component intended for downloading pages from the web using standard Microsoft Windows Internet library (winInet.dll) Listed on utilmind.com/delphi2.html
UrlScope Unable to find
Vagabondo Appears to be a log analyzer for Russian BBS systems. (I may have got that wrong). I found reference to it being copyright John Gladkih 1998, but I've not found any URL that gives a description (not even a Russian one).
VCI WebViewer Web browser object that may be incorporated into software homepagesw.com/webster_dl.htm
vspider verity.com/products/intspider/ A commercial spidering product.
WAVETools A set of Delphi components offered to build Internet applications from transerve.com
Webbandit softwaresolutions.net/webbandit/index.htm Collates search engine results
Webclipping.com Webclipping.com News-gathering agent
webcollage Forms a collage from randomly selected web images jwz.org/webcollage/ Pet project of one of the authors of Netscape. Seems to come from differing IP nodes.
WebCompass (quarterdeck search engine software)
WebGenie webgenie.com/products.html. Presumably one of the CGI-based products available on this site, possibly the 'Site Sleuth'.
Web Hound Unable to find. Or rather, I found several different 'web hounds' so can't tell which this was.
Web Magnet webmagnet.com this appears to be a tool used by this web consultancy.
WebMiner Either tribolic.com/webminer/ or webminer.com/webminer/index.cfm?section=overview A tool to track down and target visitors to your website
WebPix Tool to fetch all pictures from a web site netwu.com/webpix/
WebSymmetrix Originates in Korea and is possibly related to their National Computerization Agency. Uses IP address 18.104.22.168
webrank webrank.com/features.asp Search engine popularity meter.
webwasher webwasher.com/en/products/wwash/functions.htm (browser filter)
WhosTalking softwaresolutions.net/whostalking/ Software that tracks Trademark usage. Last time I saw it, it was creating 404 errors by adding &dg.. to each URL. Hopefully they'll fix this.
MacroX.de macrox.de (German). Appears to be an interpreter designed to help automate regular tasks on a Windows PC.
XupiterToolbar A toolbar that sets up xupiter.com as the default search engine. There appears to be a lot of negative press regarding this toolbar
yacy yacy.net/home.html. An open source and distributed search engine project. The above URL seems to redirect to an IP-based one
YottaShopping_Bot www-yottashopping-com/. User agent claims this is a Shopping Search Engine, but the URL requires a login so I was unable to verify (so I deliberately made its URL non-clickable). Crawled from 22.214.171.124
Sites that regularly visit
Some IP addresses or sites may regularly visit you, although the user agent may be obscure, blank or even change.
Here are a few that I've been able to work out
Site address(es) Description
proxy.netsetter.org This is a site that offers a speed-up to your surfing in return for being able to monitor people's surfing habits. The speed-ups are achieved through a variety of techniques, and the monitoring info is sold on, although your privacy is protected. Visit netsetter.org for more details.
pwoshoes.transport.com Not known
...lightrealm.com This site daily reads any xml files submitted to a shareware site in PAD format. PAD is a means for describing shareware devised by the Association of Shareware Professionals (asp-shareware.org). This site is performing daily checks, looking to automatically update its lists with any changes.
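One way to spot regular visitors like these is to group log entries by visiting host and note every user agent each host has presented. The following is only a minimal sketch in Python. It assumes the log has already been parsed into (host, user agent) pairs, and the function name, threshold and sample hosts are illustrative inventions, not taken from any real log.

```python
from collections import defaultdict


def regular_visitors(entries, min_visits=3):
    """Map each host seen at least min_visits times to
    (visit count, sorted list of distinct user agents)."""
    counts = defaultdict(int)
    agents = defaultdict(set)
    for host, user_agent in entries:
        counts[host] += 1
        # A missing or whitespace-only agent is recorded as blank.
        agents[host].add((user_agent or "").strip())
    return {
        host: (counts[host], sorted(agents[host]))
        for host in counts
        if counts[host] >= min_visits
    }


# Hypothetical sample data: one host visits three times, once with a blank agent.
sample = [
    ("proxy.example.org", "Mozilla/4.0"),
    ("proxy.example.org", ""),
    ("proxy.example.org", "Mozilla/4.0"),
    ("one-off.example.net", "SomeBot/1.0"),
]
report = regular_visitors(sample)
```

A host that keeps returning with a blank or changing agent stands out in such a report even when no single user agent string does.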
Other useful sites
Here are links to other sites you might find useful when looking into web robots
botspot.com A Bot monitor site with regular updates and links to the bots' home pages.
htmlhelp.com/links/validators.htm A list of HTML validators
iplists.com A site that lists IP addresses of search engine bots and others. More comprehensive (and probably more up to date) than the IP addresses shown on this page (which tends to record the first IP address seen)
tool.motoricerca.info/robots-checker.phtml An online syntax checker for robots.txt files. Enter the URL of your robots.txt file to get it checked and to see a summary of what effect it will have.
mozilla.org/build/revised-user-agent-strings.html Mozilla web browser project. This page describes the conventions used for formatting the User Agent in the form 'Mozilla...'
robotstxt.org/wc/robots.html A site dedicated to the robots.txt file. This page gives some background to how robots work, although their list of robots is quite small.
searchtools.com/robots/ A page collecting together a number of resources to do with all aspects of web robots.
spiderhunter.com A site primarily about 'cloaking' sites, the art of making a site look different to different visitors. Contains articles on how to detect spiders.
webcab.de/wapua.htm A site listing WAP user agent strings. These will mostly be mobile phones
webmasterworld.com/forum11/index.htm This site contains a number of forums for topics of interest to webmasters everywhere. This particular forum actively discusses robots and search engines that visit your site.
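For readers who want to check a robots.txt file offline, in the spirit of the online checker listed above, Python's standard library includes a parser. The sketch below is illustrative only: the rules are a made-up example, the helper name `allowed` is hypothetical, and the two robot names are simply borrowed from the list on this page.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules: shut out one picture-indexing robot entirely,
# and keep every other robot out of /private/.
ROBOTS_TXT = """\
User-agent: psbot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, path):
    """Return True if the given rules let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


psbot_images = allowed(ROBOTS_TXT, "psbot", "/images/logo.gif")
other_index = allowed(ROBOTS_TXT, "TurnitinBot", "/index.html")
other_private = allowed(ROBOTS_TXT, "TurnitinBot", "/private/notes.html")
```

Bear in mind that robots.txt is only a request: a well-behaved robot honours it, but nothing forces a robot to do so.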
"...And finally some fakers "
Increasingly, security and privacy concerns mean that users and companies are wary about giving away information to sites they visit through the user agent and other fields that appear in server logs.
Some browsers will allow you to select the user agent you present when visiting a site. The Opera browser does this, for example, to allow its users to pretend to be either IE or Netscape when visiting web sites coded in a way that forgets there are other browsers in use.
Also, as firewalls become more common, we will see more and more user agent fields being blocked by the firewall, which will prevent this information being transmitted to the outside world.
Just to prove that you can never rely on the user agent, here is a selection of user agent strings I've seen in my log files that tell us nothing about the software being used (although some of them speak volumes about the person driving the software). I'm omitting any IP addresses I may have, to protect the identities of those concerned.
'User agent' seen Comments
Bruciebot I'm assured this was created by a regular in alt.webmasters
Blocked by Norton / Geblokkeerd door Norton / Blockeriet von Norton The agent has been blocked by Norton Utilities. The referrer is also withheld. The second version is Dutch. No doubt other languages occur.
Don't Like AOL Oh dear. This could start a trend!
Don't be so nosey Hey! You came to my site first, remember?
Don't you wish you knew. Obviously.
Go Away A bit rich from someone who came to my site!
Field blocked by AtGuard Surfer is behind the AtGuard firewall (now part of Norton Internet Security 2000) which prevents the true User Agent being transmitted. home.pages.at/atguard/
Field blocked by Outpost agnitum.com Again, the field is withheld by the software
Isch habe gar kein Browser German for 'I have no browser'. Or so I thought, until I received the following from Clemens Marschner: Actually it is German with Italian accent! The word refers to an advertisement of the Nescafe coffee, where a smart Italian convinces a beautiful lady to stay and drink coffee with him after she knocks at his door to complain that his car is in the way of hers. And after she stayed and listened to him while he prepares the coffee with lots of gestures and Italian speak, she again asks him to move his car, and he goes 'Isch 'abe gar keine Auto, Signorina' (I don't even have a car, signorina). Since that commercial was shown for years, presumably all German web masters know it...
My Web browser is not of your business True, but no fun.
multiBlocker browser multiblocker.com/home.html Although this seems to mainly offer protection against visitors to your site, they obviously also provide a user agent blocker for people browsing
Wabbit's don't use browsers Probably the proxy service at rabbit-proxy.sourceforge.net/
Wot, no browser? (Win67; X; SK) Win67 ?!? Ah... a dream come true!
Who gives a ? It's as least as good as Lynx Ah yes, but how do we know that?
Who wants to know? I do.
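Since the user agent can be blank, blocked or simply made up, any log analysis should treat it as a hint rather than a fact. Here is a rough sketch of sorting strings into a few buckets on that basis. The marker lists are tiny illustrative samples drawn from entries on this page, the function name is hypothetical, and a match against a robot name only shows what the visitor claims to be.

```python
# Illustrative samples only, taken from strings mentioned on this page.
BLOCKED_MARKERS = ("blocked by", "geblokkeerd door")
KNOWN_ROBOTS = ("turnitinbot", "psbot", "yacy", "heritrix")


def classify_user_agent(user_agent):
    """Return a rough label for a user agent string from a server log."""
    text = (user_agent or "").strip().lower()
    if not text:
        return "blank"
    if any(marker in text for marker in BLOCKED_MARKERS):
        return "blocked"
    if any(robot in text for robot in KNOWN_ROBOTS):
        # The string can be typed in by anyone, so this is a claim, not proof.
        return "claims to be a known robot"
    return "unverified"


labels = [
    classify_user_agent(ua)
    for ua in ("Field blocked by AtGuard", "Geblokkeerd door Norton", "", "psbot/0.1", "Go Away")
]
```

Cross-checking the visiting IP address or host name against the label is the only way to gain any real confidence in what the string says.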
Awards for this page
I've been told this page is referenced in the book Spidering Hacks
All awards gratefully received
This page is © 2000-2005 John A Fotheringham. It may not be reproduced without permission, although you are welcome to save a copy for personal use to your hard disk.