

Search Engine Robots Index

robot-name: Fouineur
robot-cover-url: http://fouineur.9bit.qc.ca/
robot-details-url: http://fouineur.9bit.qc.ca/informations.html
robot-owner-name: Joel Vandal
robot-owner-url: http://www.9bit.qc.ca/~jvandal/
robot-owner-email: jvandal@9bit.qc.ca
robot-status: development
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform: unix, windows
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: fouineur
robot-noindex: no
robot-host: *
robot-from: yes
robot-useragent: Mozilla/2.0 (compatible fouineur v2.0; fouineur.9bit.qc.ca)
robot-language: perl5
robot-description: This robot automatically builds a database that is used by our own search engine. It auto-detects the language (French, English and Spanish) used in the HTML page. Each database record generated by this robot includes: date, url, title, total words, title, size and de-htmlized text. It also supports server-side and client-side IMAGEMAP.
robot-history: No existing robot does everything that we need for our usage.
robot-environment: service
modified-date: Thu, 9 Jan 1997 22:57:28 EST
modified-by: jvandal@9bit.qc.ca

robot-name: Robot Francoroute
robot-cover-url:
robot-details-url:
robot-owner-name: Marc-Antoine Parent
robot-owner-url: http://www.crim.ca/~maparent
robot-owner-email: maparent@crim.ca
robot-status:
robot-purpose: indexing, mirroring, statistics
robot-type: browser
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: zorro.crim.ca
robot-from: yes
robot-useragent: Robot du CRIM 1.0a
robot-language: perl5, sqlplus
robot-description: Part of the RISQ's Francoroute project for researching francophone. Uses the Accept-Language tag and reduces demand accordingly
robot-history:
robot-environment:
modified-date: Wed Jan 10 23:56:22 1996.
modified-by:

robot-name: Freecrawl
robot-cover-url: http://euroseek.net/
robot-owner-name: Jesper Ekhall
robot-owner-email: ekhall@freeside.net
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: Freecrawl
robot-noindex: no
robot-host: *.freeside.net
robot-from: yes
robot-useragent: Freecrawl
robot-language: c
robot-description: The Freecrawl robot is used to build a database for the EuroSeek service.
robot-environment: service

robot-name: FunnelWeb
robot-cover-url: http://funnelweb.net.au
robot-details-url:
robot-owner-name: David Eagles
robot-owner-url: http://www.pc.com.au
robot-owner-email: eaglesd@pc.com.au
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: earth.planets.com.au
robot-from: yes
robot-useragent: FunnelWeb-1.0
robot-language: c and c++
robot-description: Its purpose is to generate a Resource Discovery database, and generate statistics. Localised South Pacific Discovery and Search Engine, plus distributed operation under development.
robot-history:
robot-environment:
modified-date: Mon Nov 27 21:30:11 1995
modified-by:

robot-name: gammaSpider, FocusedCrawler
robot-details-url: http://www.gammasite.com, http://www.gammasite.com/gammaSpider.html
robot-cover-url: http://www.gammasite.com
robot-owner-name: gammasite
robot-owner-url: http://www.gammasite.com
robot-owner-email: support@gammasite.com
robot-status: active
robot-purpose: indexing, maintenance
robot-type: standalone
robot-platform: unix, windows, windows95, windowsNT, linux
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: gammaSpider
robot-noindex: no
robot-nofollow: no
robot-host: *
robot-from: no
robot-useragent: gammaSpider xxxxxxx ()/
robot-language: c++
robot-description: Information gathering.
Focused crawling on a specific topic. Uses gammaFetcherServer Product for selling. RobotUserAgent may be changed by the user. More features are being added. The product is constantly under development. AKA FocusedCrawler
robot-history: AKA FocusedCrawler
robot-environment: service, commercial, research
modified-date: Sun, 25 Mar 2001 18:49:52 GMT

robot-name: gazz
robot-cover-url: http://gazz.nttrd.com/
robot-details-url: http://gazz.nttrd.com/
robot-owner-name: NTT Cyberspace Laboratories
robot-owner-url: http://gazz.nttrd.com/
robot-owner-email: gazz@nttrd.com
robot-status: development
robot-purpose: statistics
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: gazz
robot-noindex: yes
robot-host: *.nttrd.com, *.infobee.ne.jp
robot-from: yes
robot-useragent: gazz/1.0
robot-language: c
robot-description: This robot is used for research purposes.
robot-history: Its root is TITAN project in NTT.
robot-environment: research
modified-date: Wed, 09 Jun 1999 10:43:18 GMT
modified-by: noto@isl.ntt.co.jp

robot-name: GCreep
robot-cover-url: http://www.instrumentpolen.se/gcreep/index.html
robot-details-url: http://www.instrumentpolen.se/gcreep/index.html
robot-owner-name: Instrumentpolen AB
robot-owner-url: http://www.instrumentpolen.se/ip-kontor/eng/index.html
robot-owner-email: anders@instrumentpolen.se
robot-status: development
robot-purpose: indexing
robot-type: browser+standalone
robot-platform: linux+mysql
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: gcreep
robot-noindex: yes
robot-host: mbx.instrumentpolen.se
robot-from: yes
robot-useragent: gcreep/1.0
robot-language: c
robot-description: Indexing robot to learn SQL
robot-history: Spare time project begun late '96, maybe early '97
robot-environment: hobby
modified-date: Fri, 23 Jan 1998 16:09:00 MET
modified-by: Anders Hedstrom

robot-name: GetBot
robot-cover-url: http://www.blacktop.com.zav/bots
robot-details-url:
robot-owner-name: Alex Zavatone
robot-owner-url: http://www.blacktop.com/zav
robot-owner-email: zav@macromedia.com
robot-status:
robot-purpose: maintenance
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: no.
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from: no
robot-useragent: ???
robot-language: Shockwave/Director.
robot-description: GetBot's purpose is to index all the sites it can find that contain Shockwave movies. It is the first bot or spider written in Shockwave. The bot was originally written at Macromedia on a hungover Sunday as a proof of concept. - Alex Zavatone 3/29/96
robot-history:
robot-environment:
modified-date: Fri Mar 29 20:06:12 1996.
modified-by:

robot-name: GetURL
robot-cover-url: http://Snark.apana.org.au/James/GetURL/
robot-details-url:
robot-owner-name: James Burton
robot-owner-url: http://Snark.apana.org.au/James/
robot-owner-email: James@Snark.apana.org.au
robot-status:
robot-purpose: maintenance, mirroring
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: GetURL.rexx v1.05
robot-language: ARexx (Amiga REXX)
robot-description: Its purpose is to validate links, perform mirroring, and copy document trees. Designed as a tool for retrieving web pages in batch mode without the encumbrance of a browser. Can be used to describe a set of pages to fetch, and to maintain an archive or mirror.
It is not run by a central site and accessed by clients; it is run by the end user or archive maintainer.
robot-history:
robot-environment:
modified-date: Tue May 9 15:13:12 1995
modified-by:

robot-name: Golem
robot-cover-url: http://www.quibble.com/golem/
robot-details-url: http://www.quibble.com/golem/
robot-owner-name: Geoff Duncan
robot-owner-url: http://www.quibble.com/geoff/
robot-owner-email: geoff@quibble.com
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: mac
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: golem
robot-noindex: no
robot-host: *.quibble.com
robot-from: yes
robot-useragent: Golem/1.1
robot-language: HyperTalk/AppleScript/C++
robot-description: Golem generates status reports on collections of URLs supplied by clients. Designed to assist with editorial updates of Web-related sites or products.
robot-history: Personal project turned into a contract service for private clients.
robot-environment: service,research
modified-date: Wed, 16 Apr 1997 20:50:00 GMT
modified-by: Geoff Duncan

robot-name: Googlebot
robot-cover-url: http://www.googlebot.com/
robot-details-url: http://www.googlebot.com/bot.html
robot-owner-name: Google Inc.
robot-owner-url: http://www.google.com/
robot-owner-email: googlebot@google.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Linux
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: googlebot
robot-noindex: yes
robot-host: googlebot.com
robot-from: yes
robot-useragent: Googlebot/2.X (+http://www.googlebot.com/bot.html)
robot-language: c++
robot-description: Google's crawler
robot-history: Developed by Google Inc
robot-environment: commercial
modified-date: Thu Mar 29 21:00:07 PST 2001
modified-by: googlebot@google.com

robot-name: Grapnel/0.01 Experiment
robot-cover-url: varies
robot-details-url: mailto:v93_kat@ce.kth.se
robot-owner-name: Philip Kallerman
robot-owner-url: v93_kat@ce.kth.se
robot-owner-email: v93_kat@ce.kth.se
robot-status: Experimental
robot-purpose: Indexing
robot-type:
robot-platform: WinNT
robot-availability: None, yet
robot-exclusion: Yes
robot-exclusion-useragent: No
robot-noindex: No
robot-host: varies
robot-from: Varies
robot-useragent:
robot-language: Perl
robot-description: Resource Discovery Experimentation
robot-history: None, hoping to make some
robot-environment:
modified-date:
modified-by: 7 Feb 1997

robot-name: Griffon
robot-cover-url: http://navi.ocn.ne.jp/
robot-details-url: http://navi.ocn.ne.jp/griffon/
robot-owner-name: NTT Communications Corporate Users Business Division
robot-owner-url: http://navi.ocn.ne.jp/
robot-owner-email: griffon@super.navi.ocn.ne.jp
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: griffon
robot-noindex: yes
robot-nofollow: yes
robot-host: *.navi.ocn.ne.jp
robot-from: yes
robot-useragent: griffon/1.0
robot-language: c
robot-description: The Griffon robot is used to build database for the OCN navi search service operated by NTT Communications Corporation. It mainly gathers pages written in Japanese.
robot-history: Its root is TITAN project in NTT.
robot-environment: service
modified-date: Mon, 25 Jan 2000 15:25:30 GMT
modified-by: toka@navi.ocn.ne.jp

robot-name: Gromit
robot-cover-url: http://www.austlii.edu.au/
robot-details-url: http://www2.austlii.edu.au/~dan/gromit/
robot-owner-name: Daniel Austin
robot-owner-url: http://www2.austlii.edu.au/~dan/
robot-owner-email: dan@austlii.edu.au
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: Gromit
robot-noindex: no
robot-host: *.austlii.edu.au
robot-from: yes
robot-useragent: Gromit/1.0
robot-language: perl5
robot-description: Gromit is a Targeted Web Spider that indexes legal sites contained in the AustLII legal links database.
robot-history: This robot is based on the Perl5 LWP::RobotUA module.
robot-environment: research
modified-date: Wed, 11 Jun 1997 03:58:40 GMT
modified-by: Daniel Austin

robot-name: Northern Light Gulliver
robot-cover-url:
robot-details-url:
robot-owner-name: Mike Mulligan
robot-owner-url:
robot-owner-email: crawler@northernlight.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: gulliver
robot-noindex: yes
robot-host: scooby.northernlight.com, taz.northernlight.com, gulliver.northernlight.com
robot-from: yes
robot-useragent: Gulliver/1.1
robot-language: c
robot-description: Gulliver is a robot to be used to collect web pages for indexing and subsequent searching of the index.
robot-history: Oct 1996: development; Dec 1996-Jan 1997: crawl & debug; Mar 1997: crawl again;
robot-environment: service
modified-date: Wed, 21 Apr 1999 16:00:00 GMT
modified-by: Mike Mulligan

robot-name: Gulper Bot
robot-cover-url: http://yuntis.ecsl.cs.sunysb.edu/
robot-details-url: http://yuntis.ecsl.cs.sunysb.edu/help/robot/
robot-owner-name: Maxim Lifantsev
robot-owner-url: http://www.cs.sunysb.edu/~maxim/
robot-owner-email: gulperbot@ecsl.cs.sunysb.edu
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Linux
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: gulper
robot-noindex: yes
robot-nofollow: yes
robot-host: yuntis*.ecsl.cs.sunysb.edu
robot-from: no
robot-useragent: Gulper Web Bot 0.2.4 (www.ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot)
robot-language: c++
robot-description: The Gulper Bot is used to collect data for the Yuntis research search engine project.
robot-history: Developed in a research project at SUNY Stony Brook.
robot-environment: research
modified-date: Tue, 28 Aug 2001 21:40:47 GMT
modified-by: maxim@cs.sunysb.edu

robot-name: HamBot
robot-cover-url: http://www.hamrad.com/search.html
robot-details-url: http://www.hamrad.com/
robot-owner-name: John Dykstra
robot-owner-url:
robot-owner-email: john@futureone.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix, Windows95
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: hambot
robot-noindex: yes
robot-host: *.hamrad.com
robot-from:
robot-useragent:
robot-language: perl5, C++
robot-description: Two HamBot robots are used (stand alone & browser based) to aid in building the database for HamRad Search - The Search Engine for Search Engines. The robots are run intermittently and perform nearly identical functions.
robot-history: A non commercial (hobby?) project to aid in building and maintaining the database for the HamRad search engine.
robot-environment: service
modified-date: Fri, 17 Apr 1998 21:44:00 GMT
modified-by: JD

robot-name: Harvest
robot-cover-url: http://harvest.cs.colorado.edu
robot-details-url:
robot-owner-name:
robot-owner-url:
robot-owner-email:
robot-status:
robot-purpose: indexing
robot-type:
robot-platform:
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex:
robot-host: bruno.cs.colorado.edu
robot-from: yes
robot-useragent: yes
robot-language:
robot-description: Harvest's motivation is to index community- or topic-specific collections, rather than to locate and index all HTML objects that can be found. Also, Harvest allows users to control the enumeration several ways, including stop lists and depth and count limits. Therefore, Harvest provides a much more controlled way of indexing the Web than is typical of robots. Pauses 1 second between requests (by default).
robot-history:
robot-environment:
modified-date:
modified-by:

robot-name: havIndex
robot-cover-url: http://www.hav.com/
robot-details-url: http://www.hav.com/
robot-owner-name: hav.Software and Horace A. (Kicker) Vallas
robot-owner-url: http://www.hav.com/
robot-owner-email: havIndex@hav.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Java VM 1.1
robot-availability: binary
robot-exclusion: yes
robot-exclusion-useragent: havIndex
robot-noindex: yes
robot-host: *
robot-from: no
robot-useragent: havIndex/X.xx[bxx]
robot-language: Java
robot-description: havIndex allows individuals to build searchable word index of (user specified) lists of URLs. havIndex does not crawl - rather it requires one or more user supplied lists of URLs to be indexed. havIndex does (optionally) save urls parsed from indexed pages.
robot-history: Developed to answer client requests for URL specific index capabilities.
robot-environment: commercial, service
modified-date: 6-27-98
modified-by: Horace A. (Kicker) Vallas

robot-name: HI (HTML Index) Search
robot-cover-url: http://cs6.cs.ait.ac.th:21870/pa.html
robot-details-url:
robot-owner-name: Razzakul Haider Chowdhury
robot-owner-url: http://cs6.cs.ait.ac.th:21870/index.html
robot-owner-email: a94385@cs.ait.ac.th
robot-status:
robot-purpose: indexing
robot-type:
robot-platform:
robot-availability:
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host:
robot-from: yes
robot-useragent: AITCSRobot/1.1
robot-language: perl 5
robot-description: Its purpose is to generate a Resource Discovery database. This Robot traverses the net and creates a searchable database of Web pages. It stores the title string of the HTML document and the absolute url. A search engine provides the boolean AND & OR query models with or without filtering the stop list of words. Feature is kept for the Web page owners to add the url to the searchable database.
robot-history:
robot-environment:
modified-date: Wed Oct 4 06:54:31 1995
modified-by:

robot-name: Hometown Spider Pro
robot-cover-url: http://www.hometownsingles.com
robot-details-url: http://www.hometownsingles.com
robot-owner-name: Bob Brown
robot-owner-url: http://www.hometownsingles.com
robot-owner-email: admin@hometownsingles.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: windowsNT
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: *
robot-noindex: yes
robot-host: 63.195.193.17
robot-from: no
robot-useragent: Hometown Spider Pro
robot-language: delphi
robot-description: The Hometown Spider Pro is used to maintain the indexes for Hometown Singles.
robot-history: Innerprise URL Spider Pro
robot-environment: commercial
modified-date: Tue, 28 Mar 2000 16:00:00 GMT
modified-by: Hometown Singles

robot-name: Wired Digital
robot-cover-url:
robot-details-url:
robot-owner-name: Bowen Dwelle
robot-owner-url:
robot-owner-email: bowen@hotwired.com
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: hotwired
robot-noindex: no
robot-host: gossip.hotwired.com
robot-from: yes
robot-useragent: wired-digital-newsbot/1.5
robot-language: perl-5.004
robot-description: this is a test
robot-history:
robot-environment: research
modified-date: Thu, 30 Oct 1997
modified-by: bowen@hotwired.com

robot-name: ht://Dig
robot-cover-url: http://www.htdig.org/
robot-details-url: http://www.htdig.org/howitworks.html
robot-owner-name: Andrew Scherpbier
robot-owner-url: http://www.htdig.org/author.html
robot-owner-email: andrew@contigo.com
robot-owner-name2: Geoff Hutchison
robot-owner-url2: http://wso.williams.edu/~ghutchis/
robot-owner-email2: ghutchis@wso.williams.edu
robot-status:
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: source
robot-exclusion: yes
robot-exclusion-useragent: htdig
robot-noindex: yes
robot-host: *
robot-from: no
robot-useragent: htdig/3.1.0b2
robot-language: C,C++.
robot-history: This robot was originally developed for use at San Diego State University.
robot-environment:
modified-date: Tue, 3 Nov 1998 10:09:02 EST
modified-by: Geoff Hutchison <Geoffrey.R.Hutchison@williams.edu>

robot-name: HTMLgobble
robot-cover-url:
robot-details-url:
robot-owner-name: Andreas Ley
robot-owner-url:
robot-owner-email: ley@rz.uni-karlsruhe.de
robot-status:
robot-purpose: mirror
robot-type:
robot-platform:
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: no
robot-host: tp70.rz.uni-karlsruhe.de
robot-from: yes
robot-useragent: HTMLgobble v2.2
robot-language:
robot-description: A mirroring robot. Configured to stay within a directory, sleeps between requests, and the next version will use HEAD to check if the entire document needs to be retrieved
robot-history:
robot-environment:
modified-date:
modified-by:

robot-name: Hyper-Decontextualizer
robot-cover-url: http://www.tricon.net/Comm/synapse/spider/
robot-details-url:
robot-owner-name: Cliff Hall
robot-owner-url: http://kpt1.tricon.net/cgi-bin/cliff.cgi
robot-owner-email: cliff@tricon.net
robot-status:
robot-purpose: indexing
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from: no
robot-useragent: no
robot-language: Perl 5
robot-description: Takes an input sentence and marks up each word with an appropriate hyper-text link.
robot-history:
robot-environment:
modified-date: Mon May 6 17:41:29 1996.
modified-by:
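
Every entry in this index is a run of `key: value` fields, and each entry begins with a `robot-name` field. As a minimal sketch (not an official tool of the database), the Python below shows one way such a listing could be split into one dictionary per robot. The function name `parse_records`, the `FIELD` pattern, and the sample string are illustrative assumptions. The pattern assumes that keys are either `robot-...`, `modified-date`, or `modified-by`, as seen in the entries above.

```python
import re

# Assumed key shapes, taken from the entries in this listing:
# "robot-<something>:", "modified-date:" or "modified-by:".
FIELD = re.compile(r"\b(robot-[a-z0-9-]+|modified-date|modified-by):")


def parse_records(text):
    """Split a run of 'key: value' fields into one dict per robot."""
    # With one capturing group, re.split returns
    # [preamble, key1, value1, key2, value2, ...].
    parts = FIELD.split(text)
    records = []
    current = None
    for key, value in zip(parts[1::2], parts[2::2]):
        if key == "robot-name":
            # A robot-name field opens a new record.
            current = {}
            records.append(current)
        if current is not None:
            # Collapse line breaks and repeated spaces inside the value.
            current[key] = " ".join(value.split())
    return records


if __name__ == "__main__":
    # Hypothetical excerpt in the same shape as the listing.
    sample = (
        "robot-name: Golem robot-status: active "
        "robot-exclusion: yes robot-exclusion-useragent: golem "
        "modified-by: Geoff Duncan "
        "robot-name:Griffon robot-status:active robot-exclusion:yes"
    )
    for record in parse_records(sample):
        print(record["robot-name"], record.get("robot-exclusion-useragent", ""))
```

Because the splitter keys on field names rather than on line breaks, it should accept both one-field-per-line and run-on text. A value that itself contained something shaped like `robot-foo:` would be split incorrectly, so treat this as a starting point only.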