

robot-name: NetCarta WebMap Engine
robot-cover-url: http://www.netcarta.com/
robot-details-url:
robot-owner-name: NetCarta WebMap Engine
robot-owner-url: http://www.netcarta.com/
robot-owner-email: info@netcarta.com
robot-status:
robot-purpose: indexing, maintenance, mirroring, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from: yes
robot-useragent: NetCarta CyberPilot Pro
robot-language: C++.
robot-description: The NetCarta WebMap Engine is a general purpose, commercial spider. Packaged with a full GUI in the CyberPilot Pro product, it acts as a personal spider to work with a browser to facilitate context-based navigation. The WebMapper product uses the robot to manage a site (site copy, site diff, and extensive link management facilities). All versions can create publishable NetCarta WebMaps, which capture the crawled information. If the robot sees a published map, it will return the published map rather than continuing its crawl. Since this is a personal spider, it will be launched from multiple domains. This robot tends to focus on a particular site. No instance of the robot should have more than one outstanding request out to any given site at a time. The User-agent field contains a coded ID identifying the instance of the spider; specific users can be blocked via robots.txt using this ID.
robot-history:
robot-environment:
modified-date: Sun Feb 18 02:02:49 1996.
modified-by:

robot-name: NetMechanic
robot-cover-url: http://www.netmechanic.com
robot-details-url: http://www.netmechanic.com/faq.html
robot-owner-name: Tom Dahm
robot-owner-url: http://iquest.com/~tdahm
robot-owner-email: tdahm@iquest.com
robot-status: development
robot-purpose: Link and HTML validation
robot-type: standalone with web gateway
robot-platform: UNIX
robot-availability: via web page
robot-exclusion: Yes
robot-exclusion-useragent: WebMechanic
robot-noindex: no
robot-host: 206.26.168.18
robot-from: no
robot-useragent: NetMechanic
robot-language: C
robot-description: NetMechanic is a link validation and HTML validation robot run using a web page interface.
robot-history:
robot-environment:
modified-date: Sat, 17 Aug 1996 12:00:00 GMT
modified-by:

robot-name: NetScoop
robot-cover-url: http://www-a2k.is.tokushima-u.ac.jp/search/index.html
robot-owner-name: Kenji Kita
robot-owner-url: http://www-a2k.is.tokushima-u.ac.jp/member/kita/index.html
robot-owner-email: kita@is.tokushima-u.ac.jp
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: UNIX
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: NetScoop
robot-host: alpha.is.tokushima-u.ac.jp, beta.is.tokushima-u.ac.jp
robot-useragent: NetScoop/1.0 libwww/5.0a
robot-language: C
robot-description: The NetScoop robot is used to build the database for the NetScoop search engine.
robot-history: The robot has been used in the research project at the Faculty of Engineering, Tokushima University, Japan, since Dec. 1996.
robot-environment: research
modified-date: Fri, 10 Jan 1997.
modified-by: Kenji Kita

robot-name: newscan-online
robot-cover-url: http://www.newscan-online.de/
robot-details-url: http://www.newscan-online.de/info.html
robot-owner-name: Axel Mueller
robot-owner-url:
robot-owner-email: mueller@newscan-online.de
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Linux
robot-availability: binary
robot-exclusion: yes
robot-exclusion-useragent: newscan-online
robot-noindex: no
robot-host: *newscan-online.de
robot-from: yes
robot-useragent: newscan-online/1.1
robot-language: perl
robot-description: The newscan-online robot is used to build a database for the newscan-online news search service operated by smart information services. The robot runs daily and visits predefined sites in a random order.
robot-history: This robot has its roots in prereleased news-filtering software for Lotus Notes from 1995.
robot-environment: service
modified-date: Fri, 9 Apr 1999 11:45:00 GMT
modified-by: Axel Mueller

robot-name: NHSE Web Forager
robot-cover-url: http://nhse.mcs.anl.gov/
robot-details-url:
robot-owner-name: Robert Olson
robot-owner-url: http://www.mcs.anl.gov/people/olson/
robot-owner-email: olson@mcs.anl.gov
robot-status:
robot-purpose: indexing
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: *.mcs.anl.gov
robot-from: yes
robot-useragent: NHSEWalker/3.0
robot-language: perl 5
robot-description: to generate a Resource Discovery database
robot-history:
robot-environment:
modified-date: Fri May 5 15:47:55 1995
modified-by:

robot-name: Nomad
robot-cover-url: http://www.cs.colostate.edu/~sonnen/projects/nomad.html
robot-details-url:
robot-owner-name: Richard Sonnen
robot-owner-url: http://www.cs.colostate.edu/~sonnen/
robot-owner-email: sonnen@cs.colostat.edu
robot-status:
robot-purpose: indexing
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex:
robot-host: *.cs.colostate.edu
robot-from: no
robot-useragent: Nomad-V2.x
robot-language: Perl 4
robot-description:
robot-history: Developed in 1995 at Colorado State University.
robot-environment:
modified-date: Sat Jan 27 21:02:20 1996.
modified-by:

robot-name: The NorthStar Robot
robot-cover-url: http://comics.scs.unr.edu:7000/top.html
robot-details-url:
robot-owner-name: Fred Barrie
robot-owner-url:
robot-owner-email: barrie@unr.edu
robot-status:
robot-purpose: indexing
robot-type:
robot-platform:
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex:
robot-host: frognot.utdallas.edu, utdallas.edu, cnidir.org
robot-from: yes
robot-useragent: NorthStar
robot-language:
robot-description: Recent runs (26 April 94) will concentrate on textual analysis of the Web versus GopherSpace (from the Veronica data) as well as indexing.
robot-history:
robot-environment:
modified-date:
modified-by:

robot-name: Occam
robot-cover-url: http://www.cs.washington.edu/research/projects/ai/www/occam/
robot-details-url:
robot-owner-name: Marc Friedman
robot-owner-url: http://www.cs.washington.edu/homes/friedman/
robot-owner-email: friedman@cs.washington.edu
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: Occam
robot-noindex: no
robot-host: gentian.cs.washington.edu, sekiu.cs.washington.edu, saxifrage.cs.washington.edu
robot-from: yes
robot-useragent: Occam/1.0
robot-language: CommonLisp, perl4
robot-description: The robot takes high-level queries, breaks them down into multiple web requests, and answers them by combining disparate data gathered in one minute from numerous web sites, or from the robot's cache. Currently the only user is me.
robot-history: The robot is a descendant of Rodney, an earlier project at the University of Washington.
robot-environment: research
modified-date: Thu, 21 Nov 1996 20:30 GMT
modified-by: friedman@cs.washington.edu (Marc Friedman)

robot-name: HKU WWW Octopus
robot-cover-url: http://phoenix.cs.hku.hk:1234/~jax/w3rui.shtml
robot-details-url:
robot-owner-name: Law Kwok Tung, Lee Tak Yeung, Lo Chun Wing
robot-owner-url: http://phoenix.cs.hku.hk:1234/~jax
robot-owner-email: jax@cs.hku.hk
robot-status:
robot-purpose: indexing
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: no.
robot-exclusion-useragent:
robot-noindex:
robot-host: phoenix.cs.hku.hk
robot-from: yes
robot-useragent: HKU WWW Robot,
robot-language: Perl 5, C, Java.
robot-description: HKU Octopus is an ongoing project for resource discovery in the Hong Kong and China WWW domain. It is a research project conducted by three undergraduates at the University of Hong Kong.
robot-history:
robot-environment:
modified-date: Thu Mar 7 14:21:55 1996.
modified-by:

robot-name: Openfind data gatherer
robot-cover-url: http://www.openfind.com.tw/
robot-details-url: http://www.openfind.com.tw/robot.html
robot-owner-name:
robot-owner-url:
robot-owner-email: robot-response@openfind.com.tw
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: 66.7.131.132
robot-from:
robot-useragent: Openfind data gatherer, Openbot/3.0+(robot-response@openfind.com.tw;+http://www.openfind.com.tw/robot.html)
robot-language:
robot-description:
robot-history:
robot-environment:
modified-date: Thu, 26 Apr 2001 02:55:21 GMT
modified-by: stanislav shalunov <shalunov@internet2.edu>

robot-name: Orb Search
robot-cover-url: http://orbsearch.home.ml.org
robot-details-url: http://orbsearch.home.ml.org
robot-owner-name: Matt Weber
robot-owner-url: http://www.weberworld.com
robot-owner-email: webernet@geocities.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: data
robot-exclusion: yes
robot-exclusion-useragent: Orbsearch/1.0
robot-noindex: yes
robot-host: cow.dyn.ml.org, *.dyn.ml.org
robot-from: yes
robot-useragent: Orbsearch/1.0
robot-language: Perl5
robot-description: Orbsearch builds the database for Orb Search Engine. It runs when requested.
robot-history: This robot was started as a hobby.
robot-environment: hobby
modified-date: Sun, 31 Aug 1997 02:28:52 GMT
modified-by: Matt Weber

robot-name: Pack Rat
robot-cover-url: http://web.cps.msu.edu/~dexterte/isl/packrat.html
robot-details-url:
robot-owner-name: Terry Dexter
robot-owner-url: http://web.cps.msu.edu/~dexterte
robot-owner-email: dexterte@cps.msu.edu
robot-status: development
robot-purpose: both maintenance and mirroring
robot-type: standalone
robot-platform: unix
robot-availability: at the moment, none...source when developed.
robot-exclusion: yes
robot-exclusion-useragent: packrat or *
robot-noindex: no, not yet
robot-host: cps.msu.edu
robot-from:
robot-useragent: PackRat/1.0
robot-language: perl with libwww-5.0
robot-description: Used for local maintenance and for gathering web pages so that local statistical info can be used in artificial intelligence programs. Funded by NEMOnline.
robot-history: In the making...
robot-environment: research
modified-date: Tue, 20 Aug 1996 15:45:11
modified-by: Terry Dexter

robot-name: PageBoy
robot-cover-url: http://www.webdocs.org/
robot-details-url: http://www.webdocs.org/
robot-owner-name: Chihiro Kuroda
robot-owner-url: http://www.webdocs.org/
robot-owner-email: pageboy@webdocs.org
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: pageboy
robot-noindex: yes
robot-nofollow: yes
robot-host: *.webdocs.org
robot-from: yes
robot-useragent: PageBoy/1.0
robot-language: c
robot-description: The robot visits at regular intervals.
robot-history: none
robot-environment: service
modified-date: Fri, 21 Oct 1999 17:28:52 GMT
modified-by: webdocs

robot-name: ParaSite
robot-cover-url: http://www.ianett.com/parasite/
robot-details-url: http://www.ianett.com/parasite/
robot-owner-name: iaNett.com
robot-owner-url: http://www.ianett.com/
robot-owner-email: parasite@ianett.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: windowsNT
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: ParaSite
robot-noindex: yes
robot-nofollow: yes
robot-host: *.ianett.com
robot-from: yes
robot-useragent: ParaSite/0.21 (http://www.ianett.com/parasite/)
robot-language: c++
robot-description: Builds index for ianett.com search database. Runs continuously.
robot-history: Second generation of ianett.com spidering technology, originally called Sven.
robot-environment: service
modified-date: July 28, 2000
modified-by: Marty Anstey

robot-name: Patric
robot-cover-url: http://www.nwnet.net/technical/ITR/index.html
robot-details-url: http://www.nwnet.net/technical/ITR/index.html
robot-owner-name: toney@nwnet.net
robot-owner-url: http://www.nwnet.net/company/staff/toney
robot-owner-email: webmaster@nwnet.net
robot-status: development
robot-purpose: statistics
robot-type: standalone
robot-platform: unix
robot-availability: data
robot-exclusion: yes
robot-exclusion-useragent: patric
robot-noindex: yes
robot-host: *.nwnet.net
robot-from: no
robot-useragent: Patric/0.01a
robot-language: perl
robot-description: (contained at http://www.nwnet.net/technical/ITR/index.html )
robot-history: (contained at http://www.nwnet.net/technical/ITR/index.html )
robot-environment: service
modified-date: Thurs, 15 Aug 1996
modified-by: toney@nwnet.net

robot-name: pegasus
robot-cover-url: http://opensource.or.id/projects.html
robot-details-url: http://pegasus.opensource.or.id
robot-owner-name: A.Y.Kiky Shannon
robot-owner-url: http://go.to/ayks
robot-owner-email: shannon@opensource.or.id
robot-status: inactive - open source
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: source, binary
robot-exclusion: yes
robot-exclusion-useragent: pegasus
robot-noindex: yes
robot-host: *
robot-from: yes
robot-useragent: web robot PEGASUS
robot-language: perl5
robot-description: pegasus gathers information from HTML pages (7 important tags). The indexing process can be started from one or more starting URLs or from a range of IP addresses.
robot-history: This robot was created as the implementation of a final project at the Informatics Engineering Department, Institute of Technology Bandung, Indonesia.
robot-environment: research
modified-date: Fri, 20 Oct 2000 14:58:40 GMT
modified-by: A.Y.Kiky Shannon

robot-name: The Peregrinator
robot-cover-url: http://www.maths.usyd.edu.au:8000/jimr/pe/Peregrinator.html
robot-details-url:
robot-owner-name: Jim Richardson
robot-owner-url: http://www.maths.usyd.edu.au:8000/jimr.html
robot-owner-email: jimr@maths.su.oz.au
robot-status:
robot-purpose:
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host:
robot-from: yes
robot-useragent: Peregrinator-Mathematics/0.7
robot-language: perl 4
robot-description: This robot is being used to generate an index of documents on Web sites connected with mathematics and statistics. It ignores off-site links, so does not stray from a list of servers specified initially.
robot-history: commenced operation in August 1994
robot-environment:
modified-date:
modified-by:

robot-name: PerlCrawler 1.0
robot-cover-url: http://perlsearch.hypermart.net/
robot-details-url: http://www.xav.com/scripts/xavatoria/index.html
robot-owner-name: Matt McKenzie
robot-owner-url: http://perlsearch.hypermart.net/
robot-owner-email: webmaster@perlsearch.hypermart.net
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: source
robot-exclusion: yes
robot-exclusion-useragent: perlcrawler
robot-noindex: yes
robot-host: server5.hypermart.net
robot-from: yes
robot-useragent: PerlCrawler/1.0 Xavatoria/2.0
robot-language: perl5
robot-description: The PerlCrawler robot is designed to index and build a database of pages relating to the Perl programming language.
robot-history: Originated in modified form on 25 June 1998
robot-environment: hobby
modified-date: Fri, 18 Dec 1998 23:37:40 GMT
modified-by: Matt McKenzie

robot-name: Phantom
robot-cover-url: http://www.maxum.com/phantom/
robot-details-url:
robot-owner-name: Larry Burke
robot-owner-url: http://www.aktiv.com/
robot-owner-email: lburke@aktiv.com
robot-status:
robot-purpose: indexing
robot-type: standalone
robot-platform: Macintosh
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from: yes
robot-useragent: Duppies
robot-language:
robot-description: Designed to allow webmasters to provide a searchable index of their own site as well as to other sites, perhaps with similar content.
robot-history:
robot-environment:
modified-date: Fri Jan 19 05:08:15 1996.
modified-by:

robot-name: PhpDig
robot-cover-url: http://phpdig.toiletoine.net/
robot-details-url: http://phpdig.toiletoine.net/
robot-owner-name: Antoine Bajolet
robot-owner-url: http://phpdig.toiletoine.net/
robot-owner-email: phpdig@toiletoine.net
robot-status: *
robot-purpose: indexing
robot-type: standalone
robot-platform: all supported by Apache/php/mysql
robot-availability: source
robot-exclusion: yes
robot-exclusion-useragent: phpdig
robot-noindex: yes
robot-host: yes
robot-from: no
robot-useragent: phpdig/x.x.x
robot-language: php 4.x
robot-description: Small robot and search engine written in php.
robot-history: first written 2001-03-30
robot-environment: hobby
modified-date: Sun, 21 Nov 2001 20:01:19 GMT
modified-by: Antoine Bajolet

robot-name: PiltdownMan
robot-cover-url: http://profitnet.bizland.com/
robot-details-url: http://profitnet.bizland.com/piltdownman.html
robot-owner-name: Daniel Vilà
robot-owner-url: http://profitnet.bizland.com/aboutus.html
robot-owner-email: profitnet@myezmail.com
robot-status: active
robot-purpose: statistics
robot-type: standalone
robot-platform: windows95, windows98, windowsNT
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: piltdownman
robot-noindex: no
robot-nofollow: no
robot-host: 62.36.128.*, 194.133.59.*, 212.106.215.*
robot-from: no
robot-useragent: PiltdownMan/1.0 profitnet@myezmail.com
robot-language: c++
robot-description: The PiltdownMan robot is used to get a list of links from the search engines in our database. These links are followed, and the pages they refer to are downloaded to gather statistics from them. The robot runs once a month, more or less, and visits the first 10 pages listed in every search engine, for a group of keywords.
robot-history: To maintain a database of search engines, we needed an automated tool. That's why we began the creation of this robot.
robot-environment: service
modified-date: Mon, 13 Dec 1999 21:50:32 GMT
modified-by: Daniel Vilà

robot-name: Pimptrain.com's robot
robot-cover-url: http://www.pimptrain.com/search.cgi
robot-details-url: http://www.pimptrain.com/search.cgi
robot-owner-name: Bryan Ankielewicz
robot-owner-url: http://www.pimptrain.com
robot-owner-email: webmaster@pimptrain.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: source;data
robot-exclusion: yes
robot-exclusion-useragent: Pimptrain
robot-noindex: yes
robot-host: pimtprain.com
robot-from: *
robot-useragent: Mozilla/4.0 (compatible: Pimptrain's robot)
robot-language: perl5
robot-description: Crawls remote sites as part of a search engine program
robot-history: Implemented in 2001
robot-environment: commercial
modified-date: May 11, 2001
modified-by: Bryan Ankielewicz

robot-name: Pioneer
robot-cover-url: http://sequent.uncfsu.edu/~micah/pioneer.html
robot-details-url:
robot-owner-name: Micah A. Williams
robot-owner-url: http://sequent.uncfsu.edu/~micah/
robot-owner-email: micah@sequent.uncfsu.edu
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: *.uncfsu.edu or flyer.ncsc.org
robot-from: yes
robot-useragent: Pioneer
robot-language: C.
robot-description: Pioneer is part of an undergraduate research project.
robot-history:
robot-environment:
modified-date: Mon Feb 5 02:49:32 1996.
modified-by:

robot-name: html_analyzer
robot-cover-url:
robot-details-url:
robot-owner-name: James E. Pitkow
robot-owner-url:
robot-owner-email: pitkow@aries.colorado.edu
robot-status:
robot-purpose: maintenance
robot-type:
robot-platform:
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: no
robot-host:
robot-from:
robot-useragent:
robot-language:
robot-description: to check validity of Web servers. I'm not sure if it has ever been run remotely.
robot-history:
robot-environment:
modified-date:
modified-by:

robot-name: Portal Juice Spider
robot-cover-url: http://www.portaljuice.com
robot-details-url: http://www.portaljuice.com/pjspider.html
robot-owner-name: Nextopia Software Corporation
robot-owner-url: http://www.portaljuice.com
robot-owner-email: pjspider@portaljuice.com
robot-status: active
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: pjspider
robot-noindex: yes
robot-host: *.portaljuice.com, *.nextopia.com
robot-from: yes
robot-useragent: PortalJuice.com/4.0
robot-language: C/C++
robot-description: Indexing web documents for Portal Juice vertical portal search engine
robot-history: Indexing the web since 1998 for the purposes of offering our commercial Portal Juice search engine services.
robot-environment: service
modified-date: Wed Jun 23 17:00:00 EST 1999
modified-by: pjspider@portaljuice.com

robot-name: PGP Key Agent
robot-cover-url: http://www.starnet.it/pgp
robot-details-url:
robot-owner-name: Massimiliano Pucciarelli
robot-owner-url: http://www.starnet.it/puma
robot-owner-email: puma@comm2000.it
robot-status: Active
robot-purpose: indexing
robot-type: standalone
robot-platform: UNIX, Windows NT
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: salerno.starnet.it
robot-from: yes
robot-useragent: PGP-KA/1.2
robot-language: Perl 5
robot-description: This program searches for the PGP public key of the specified user.
robot-history: Originated as a research project at Salerno University in 1995.
robot-environment: Research
modified-date: June 27 1996.
modified-by: Massimiliano Pucciarelli

robot-name: PlumtreeWebAccessor
robot-cover-url:
robot-details-url: http://www.plumtree.com/
robot-owner-name: Joseph A. Stanko
robot-owner-url:
robot-owner-email: josephs@plumtree.com
robot-status: development
robot-purpose: indexing for the Plumtree Server
robot-type: standalone
robot-platform: windowsNT
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: PlumtreeWebAccessor
robot-noindex: yes
robot-host:
robot-from: yes
robot-useragent: PlumtreeWebAccessor/0.9
robot-language: c++
robot-description: The Plumtree Web Accessor is a component that customers can add to the Plumtree Server to index documents on the World Wide Web.
robot-history:
robot-environment: commercial
modified-date: Thu, 17 Dec 1998
modified-by: Joseph A. Stanko <josephs@plumtree.com>

robot-name: Poppi
robot-cover-url: http://members.tripod.com/poppisearch
robot-details-url: http://members.tripod.com/poppisearch
robot-owner-name: Antonio Provenzano
robot-owner-url: Antonio Provenzano
robot-owner-email:
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix/linux
robot-availability: none
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: yes
robot-host:
robot-from:
robot-useragent: Poppi/1.0
robot-language: C
robot-description: Poppi is a crawler to index the web that runs weekly, gathering and indexing hypertextual, multimedia and executable file formats
robot-history: Created by Antonio Provenzano in April 2000; it has been acquired from Tomi Officine Multimediali srl and is next to be released as a service and commercial product
robot-environment: service
modified-date: Mon, 22 May 2000 15:47:30 GMT
modified-by: Antonio Provenzano

robot-name: PortalB Spider
robot-cover-url: http://www.portalb.com/
robot-details-url:
robot-owner-name: PortalB Spider Bug List
robot-owner-url:
robot-owner-email: spider@portalb.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: PortalBSpider
robot-noindex: yes
robot-nofollow: yes
robot-host: spider1.portalb.com, spider2.portalb.com, etc.
robot-from: no
robot-useragent: PortalBSpider/1.0 (spider@portalb.com)
robot-language: C++
robot-description: The PortalB Spider indexes selected sites for high-quality business information.
robot-history:
robot-environment: service

robot-name: psbot
robot-cover-url: http://www.picsearch.com/
robot-details-url: http://www.picsearch.com/bot.html
robot-owner-name: picsearch AB
robot-owner-url: http://www.picsearch.com/
robot-owner-email: psbot@picsearch.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Linux
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: psbot
robot-noindex: yes
robot-nofollow: yes
robot-host: *.picsearch.com
robot-from: yes
robot-useragent: psbot/0.X (+http://www.picsearch.com/bot.html)
robot-language: c, c++
robot-description: Spider for www.picsearch.com
robot-history: Developed and tested in 2000/2001
robot-environment: commercial
modified-date: Tue, 21 Aug 2001 10:55:38 CEST 2001
modified-by: psbot@picsearch.com

robot-name: GetterroboPlus Puu
robot-details-url: http://marunaka.homing.net/straight/getter/
robot-cover-url: http://marunaka.homing.net/straight/
robot-owner-name: marunaka
robot-owner-url: http://marunaka.homing.net
robot-owner-email: marunaka@homing.net
robot-status: active: robot actively in use
robot-purpose: Purpose of the robot. One or more of:
  - gathering: gather data of the original standard TAG for Puu, which contains the information of the sites registered in my Search Engine.
  - maintenance: link validation
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes (Puu patrols only registered url in my Search Engine)
robot-exclusion-useragent: Getterrobo-Plus
robot-noindex: no
robot-host: straight FLASH!! Getterrobo-Plus, *.homing.net
robot-from: yes
robot-useragent: straight FLASH!! GetterroboPlus 1.5
robot-language: perl5
robot-description: The Puu robot is used to gather data from sites registered in the search engine "straight FLASH!!", in order to build an announcement page on the update status of registered sites in "straight FLASH!!". The robot runs every day.
robot-history: This robot patrols sites registered in the search engine "straight FLASH!!"
robot-environment: hobby
modified-date: Fri, 26 Jun 1998

robot-name: The Python Robot
robot-cover-url: http://www.python.org/
robot-details-url:
robot-owner-name: Guido van Rossum
robot-owner-url: http://www.python.org/~guido/
robot-owner-email: guido@python.org
robot-status: retired
robot-purpose:
robot-type:
robot-platform:
robot-availability: none
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: no
robot-host:
robot-from:
robot-useragent:
robot-language:
robot-description:
robot-history:
robot-environment:
modified-date:
modified-by:

robot-name: Raven Search
robot-cover-url: http://ravensearch.tripod.com
robot-details-url: http://ravensearch.tripod.com
robot-owner-name: Raven Group
robot-owner-url: http://ravensearch.tripod.com
robot-owner-email: ravensearch@hotmail.com
robot-status: Development: robot under development
robot-purpose: Indexing: gather content for commercial query engine.
robot-type: Standalone: a separate program
robot-platform: Unix, Windows98, WindowsNT, Windows2000
robot-availability: None
robot-exclusion: Yes
robot-exclusion-useragent: Raven
robot-noindex: Yes
robot-nofollow: Yes
robot-host: 192.168.1.*
robot-from: Yes
robot-useragent: Raven-v2
robot-language: Perl-5
robot-description: Raven was written for the express purpose of indexing the web. It can process hundreds of URLs in parallel. It runs on a sporadic basis as testing continues. It is really several programs running concurrently. It takes four computers to run Raven Search. Scalable in sets of four.
robot-history: This robot is new. First active on March 25, 2000.
robot-environment: Commercial: is a commercial product. Possibly GNU later ;-)
modified-date: Fri, 25 Mar 2000 17:28:52 GMT
modified-by: Raven Group

robot-name: RBSE Spider
robot-cover-url: http://rbse.jsc.nasa.gov/eichmann/urlsearch.html
robot-details-url:
robot-owner-name: David Eichmann
robot-owner-url: http://rbse.jsc.nasa.gov/eichmann/home.html
robot-owner-email: eichmann@rbse.jsc.nasa.gov
robot-status: active
robot-purpose: indexing, statistics
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: rbse.jsc.nasa.gov (192.88.42.10)
robot-from:
robot-useragent:
robot-language: C, oracle, wais
robot-description: Developed and operated as part of the NASA-funded Repository Based Software Engineering Program at the Research Institute for Computing and Information Systems, University of Houston - Clear Lake.
robot-history:
robot-environment:
modified-date: Thu May 18 04:47:02 1995
modified-by:

robot-name: Resume Robot
robot-cover-url: http://www.onramp.net/proquest/resume/robot/robot.html
robot-details-url:
robot-owner-name: James Stakelum
robot-owner-url: http://www.onramp.net/proquest/resume/java/resume.html
robot-owner-email: proquest@onramp.net
robot-status:
robot-purpose: indexing.
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from: yes
robot-useragent: Resume Robot
robot-language: C++.
robot-description:
robot-history:
robot-environment:
modified-date: Tue Mar 12 15:52:25 1996.
modified-by:

robot-name: RoadHouse Crawling System
robot-cover-url: http://stage.perceval.be (under development)
robot-details-url:
robot-owner-name: Gregoire Welraeds, Emmanuel Bergmans
robot-owner-url: http://www.perceval.be
robot-owner-email: helpdesk@perceval.be
robot-status: development
robot-purpose1: indexing
robot-purpose2: maintenance
robot-purpose3: statistics
robot-type: standalone
robot-platform1: unix (FreeBSD & Linux)
robot-availability: none
robot-exclusion: no (under development)
robot-exclusion-useragent: RHCS
robot-noindex: no (under development)
robot-host: stage.perceval.be
robot-from: no
robot-useragent: RHCS/1.0a
robot-language: c
robot-description: Robot used to build the database for the RoadHouse search service project operated by Perceval
robot-history: The need for this robot has its roots in the current RoadHouse directory, which has not been maintained since 1997
robot-environment: service
modified-date: Fri, 26 Feb 1999 12:00:00 GMT
modified-by: Gregoire Welraeds

robot-name: Road Runner: The ImageScape Robot
robot-owner-name: LIM Group
robot-owner-email: lim@cs.leidenuniv.nl
robot-status: development/active
robot-purpose: indexing
robot-type: standalone
robot-platform: UNIX
robot-exclusion: yes
robot-exclusion-useragent: roadrunner
robot-useragent: Road Runner: ImageScape Robot (lim@cs.leidenuniv.nl)
robot-language: C, perl5
robot-description: Create Image/Text index for WWW
robot-history: ImageScape Project
robot-environment: commercial service
modified-date: Dec. 1st, 1996

robot-name: Robbie the Robot
robot-cover-url:
robot-details-url:
robot-owner-name: Robert H. Pollack
robot-owner-url:
robot-owner-email: robert.h.pollack@lmco.com
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix, windows95, windowsNT
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: Robbie
robot-noindex: no
robot-host: *.lmco.com
robot-from: yes
robot-useragent: Robbie/0.1
robot-language: java
robot-description: Used to define document collections for the DISCO system. Robbie is still under development and runs several times a day, but usually only for ten minutes or so. Sites are visited in the order in which references are found, but no host is visited more than once in any two-minute period.
robot-history: The DISCO system is a resource-discovery component in the OLLA system, which is a prototype system, developed under DARPA funding, to support computer-based education and training.
robot-environment: research
modified-date: Wed, 5 Feb 1997 19:00:00 GMT
modified-by:

robot-name: ComputingSite Robi/1.0
robot-cover-url: http://www.computingsite.com/robi/
robot-details-url: http://www.computingsite.com/robi/
robot-owner-name: Tecor Communications S.L.
robot-owner-url: http://www.tecor.com/
robot-owner-email: robi@computingsite.com
robot-status: Active
robot-purpose: indexing,maintenance
robot-type: standalone
robot-platform: UNIX
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent: robi
robot-noindex: no
robot-host: robi.computingsite.com
robot-from:
robot-useragent: ComputingSite Robi/1.0 (robi@computingsite.com)
robot-language: python
robot-description: Intelligent agent used to build the ComputingSite Search Directory.
robot-history: It was born in August 1997.
robot-environment: service modified-date: Wed, 13 May 1998 17:28:52 GMT modified-by: Jorge Alegre robot-name: RoboCrawl Spider robot-cover-url: http://www.canadiancontent.net/ robot-details-url: http://www.canadiancontent.net/corp/spider.html robot-owner-name: Canadian Content Interactive Media robot-owner-url: http://www.canadiancontent.net/ robot-owner-email: staff@canadiancontent.net robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: RoboCrawl robot-noindex: yes robot-host: ncc.canadiancontent.net, ncc.air-net.no, canadiancontent.net, spider.canadiancontent.net robot-from: no robot-useragent: RoboCrawl (http://www.canadiancontent.net) robot-language: C and C++ robot-description: The Canadian Content robot indexes pages for its search database. robot-history: Our robot is a newer project at Canadian Content. robot-environment: service modified-date: July 30th, 2001 modified-by: Christopher Walsh and Adam Rutter robot-name: RoboFox robot-cover-url: robot-details-url: robot-owner-name: Ian Hicks robot-owner-url: robot-owner-email: robo_fox@hotmail.com robot-status: development robot-purpose: site download robot-type: standalone robot-platform: windows9x, windowsme, windowsNT4, windows2000 robot-availability: none robot-exclusion: no robot-exclusion-useragent: robofox robot-noindex: no robot-host: * robot-from: no robot-useragent: Robofox v2.0 robot-language: Visual FoxPro robot-description: scheduled utility to download and database a domain robot-history: robot-environment: service modified-date: Tue, 6 Mar 2001 02:15:00 GMT modified-by: Ian Hicks robot-name: Robozilla robot-cover-url: http://dmoz.org/ robot-details-url: http://www.dmoz.org/newsletter/2000Aug/robo.html robot-owner-name: "Rob O'Zilla" robot-owner-url: http://dmoz.org/profiles/robozilla.html robot-owner-email: robozilla@dmozed.org robot-status: active robot-purpose: maintenance robot-type: 
standalone robot-availability: none robot-exclusion: no robot-noindex: no robot-host: directory.mozilla.org robot-useragent: Robozilla/1.0 robot-description: Robozilla visits all the links within the Open Directory periodically, marking the ones that return errors for review. robot-environment: service robot-name: Roverbot robot-cover-url: http://www.roverbot.com/ robot-details-url: robot-owner-name: GlobalMedia Design (Andrew Cowan & Brian Clark) robot-owner-url: http://www.radzone.org/gmd/ robot-owner-email: gmd@spyder.net robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: roverbot.com robot-from: yes robot-useragent: Roverbot robot-language: perl5 robot-description: Targeted email gatherer utilizing user-defined seed points and interacting with both the webserver and MX servers of remote sites. robot-history: robot-environment: modified-date: Tue Jun 18 19:16:31 1996. modified-by: robot-name: RuLeS robot-cover-url: http://www.rules.be robot-details-url: http://www.rules.be robot-owner-name: Marc Wils robot-owner-url: http://www.rules.be robot-owner-email: marc@rules.be robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: yes robot-noindex: yes robot-host: www.rules.be robot-from: yes robot-useragent: RuLeS/1.0 libwww/4.0 robot-language: Dutch (Nederlands) robot-description: robot-history: none robot-environment: hobby modified-date: Sun, 8 Apr 2001 13:06:54 CET modified-by: Marc Wils robot-name: SafetyNet Robot robot-cover-url: http://www.urlabs.com/ robot-details-url: robot-owner-name: Michael L. Nelson robot-owner-url: http://www.urlabs.com/ robot-owner-email: m.l.nelson@urlabs.com robot-status: robot-purpose: indexing. robot-type: standalone robot-platform: robot-availability: robot-exclusion: no. 
robot-exclusion-useragent: robot-noindex: robot-host: *.urlabs.com robot-from: yes robot-useragent: SafetyNet Robot 0.1, robot-language: Perl 5 robot-description: Finds URLs for K-12 content management. robot-history: robot-environment: modified-date: Sat Mar 23 20:12:39 1996. modified-by: robot-name: Scooter robot-cover-url: http://www.altavista.com/ robot-details-url: http://www.altavista.com/av/content/addurl.htm robot-owner-name: AltaVista robot-owner-url: http://www.altavista.com/ robot-owner-email: scooter@pa.dec.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Scooter robot-noindex: yes robot-host: *.av.pa-x.dec.com robot-from: yes robot-useragent: Scooter/2.0 G.R.A.B. V1.1.0 robot-language: c robot-description: Scooter is AltaVista's prime index agent. robot-history: Version 2 of Scooter/1.0 developed by Louis Monier of WRL. robot-environment: service modified-date: Wed, 13 Jan 1999 17:18:59 GMT modified-by: steves@avs.dec.com robot-name: Search.Aus-AU.COM robot-details-url: http://Search.Aus-AU.COM/ robot-cover-url: http://Search.Aus-AU.COM/ robot-owner-name: Dez Blanchfield robot-owner-url: not currently available robot-owner-email: dez@geko.com robot-status: - development: robot under development robot-purpose: - indexing: gather content for an indexing service robot-type: - standalone: a separate program robot-platform: - mac - unix - windows95 - windowsNT robot-availability: - none robot-exclusion: yes robot-exclusion-useragent: Search-AU robot-noindex: yes robot-host: Search.Aus-AU.COM, 203.55.124.29, 203.2.239.29 robot-from: no robot-useragent: not available robot-language: c, perl, sql robot-description: Search-AU is a development tool I have built to investigate the power of a search engine and web crawler to give me access to a database of web content ( html / url's ) and address's etc from which I hope to build more accurate stats 
about the .au zone's web content. The robot started crawling from http://www.geko.net.au/ on March 1st, 1998 and after nine days had 70 MB of compressed ASCII in a database to work with. I hope to run a refresh of the crawl every month initially, and soon every week, bandwidth and CPU allowing. If the project warrants further development, I will turn it into an Australian (.au) zone search engine and make it commercially available for advertising to cover the costs, which are starting to mount up. --dez (980313 - black friday!) robot-environment: - hobby: written as a hobby modified-date: Fri Mar 13 10:03:32 EST 1998 robot-name: Sleek robot-cover-url: http://search-info.com/ robot-details-url: robot-owner-name: Lawrence R. Hughes, Sr. robot-owner-url: http://hughesnet.net/ robot-owner-email: lawrence.hughes@search-info.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Unix, Linux, Windows robot-availability: source;data robot-exclusion: yes robot-exclusion-useragent: robots.txt robot-noindex: yes robot-host: yes robot-from: yes robot-useragent: Mozilla/4.0 (Sleek Spider/1.2) robot-language: perl5 robot-description: Crawls remote sites and performs link popularity checks before inclusion. robot-history: HyBrid of the FDSE Crawler by: Zoltan Milosevic Current Mods: started 1/10/2002 robot-environment: hobby modified-date: Mon, 14 Jan 2002 08:02:23 GMT modified-by: Lawrence R. Hughes, Sr. 
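Because the records in this listing run together on long lines, it can help to split them back into one mapping per robot. The following is a minimal sketch, not part of the original database: the function name `parse_records`, the `FIELD` pattern, and the abbreviated `SAMPLE` text are illustrative assumptions inferred from the field names that appear above (`robot-*`, `modified-date`, `modified-by`).

```python
import re

# Field names in this listing start with "robot-" or are one of the two
# "modified-*" keys. This pattern is an assumption inferred from the listing.
FIELD = re.compile(r"(robot-[a-z0-9-]+|modified-date|modified-by):")


def parse_records(text):
    """Split flattened `key: value` text into one dict per robot-name."""
    records = []
    current = None
    # With one capturing group, re.split returns
    # [prefix, key1, value1, key2, value2, ...].
    parts = FIELD.split(text)
    for key, value in zip(parts[1::2], parts[2::2]):
        if key == "robot-name":
            # A robot-name field starts a new record.
            current = {}
            records.append(current)
        if current is not None:
            # Fields seen before the first robot-name are dropped.
            current[key] = value.strip()
    return records


# Abbreviated excerpt of two records from the listing above.
SAMPLE = (
    "robot-name: Scooter robot-status: active robot-exclusion: yes "
    "robot-exclusion-useragent: Scooter "
    "robot-useragent: Scooter/2.0 G.R.A.B. V1.1.0 "
    "modified-by: steves@avs.dec.com "
    "robot-name: Sleek robot-status: active "
    "robot-exclusion-useragent: robots.txt "
    "robot-useragent: Mozilla/4.0 (Sleek Spider/1.2) "
    "modified-by: Lawrence R. Hughes, Sr."
)

for record in parse_records(SAMPLE):
    print(record["robot-name"], "|", record["robot-useragent"])
```

The sketch treats every `robot-name:` as the start of a new record and keeps empty fields as empty strings. It would misparse a value that itself contains text shaped like a field name followed by a colon, so treat its output as a starting point for manual review.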
robot-name: SearchProcess robot-cover-url: http://www.searchprocess.com robot-details-url: http://www.intelligence-process.com robot-owner-name: Mannina Bruno robot-owner-url: http://www.intelligence-process.com robot-owner-email: bruno@intelligence-process.com robot-status: active robot-purpose: Statistic robot-type: browser robot-platform: linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: searchprocess robot-noindex: yes robot-host: searchprocess.com robot-from: yes robot-useragent: searchprocess/0.9 robot-language: perl robot-description: An intelligent Agent Online. SearchProcess is used to provide structured information to user. robot-history: This is the son of Auresys robot-environment: Service freeware modified-date: Thus, 22 Dec 1999 modified-by: Mannina Bruno robot-name: Senrigan robot-cover-url: http://www.info.waseda.ac.jp/search-e.html robot-details-url: robot-owner-name: TAMURA Kent robot-owner-url: http://www.info.waseda.ac.jp/muraoka/members/kent/ robot-owner-email: kent@muraoka.info.waseda.ac.jp robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Java robot-availability: none robot-exclusion: yes robot-exclusion-useragent:Senrigan robot-noindex: yes robot-host: aniki.olu.info.waseda.ac.jp robot-from: yes robot-useragent: Senrigan/xxxxxx robot-language: Java robot-description: This robot now gets HTMLs from only jp domain. 
robot-history: It has been running since Dec 1994 robot-environment: research modified-date: Mon Jul 1 07:30:00 GMT 1996 modified-by: TAMURA Kent robot-name: SG-Scout robot-cover-url: http://www-swiss.ai.mit.edu/~ptbb/SG-Scout/SG-Scout.html robot-details-url: robot-owner-name: Peter Beebee robot-owner-url: http://www-swiss.ai.mit.edu/~ptbb/personal/index.html robot-owner-email: ptbb@ai.mit.edu, beebee@parc.xerox.com robot-status: active robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: beta.xerox.com robot-from: yes robot-useragent: SG-Scout robot-language: robot-description: Does a "server-oriented" breadth-first search in a round-robin fashion, with multiple processes. robot-history: Run since 27 June 1994, for an internal XEROX research project robot-environment: modified-date: modified-by: robot-name:ShagSeeker robot-cover-url:http://www.shagseek.com robot-details-url: robot-owner-name:Joseph Reynolds robot-owner-url:http://www.shagseek.com robot-owner-email:joe.reynolds@shagseek.com robot-status:active robot-purpose:indexing robot-type:standalone robot-platform:unix robot-availability:data robot-exclusion:yes robot-exclusion-useragent:Shagseeker robot-noindex:yes robot-host:shagseek.com robot-from: robot-useragent:Shagseeker at http://www.shagseek.com /1.0 robot-language:perl5 robot-description:Shagseeker is the gatherer for the Shagseek.com search engine and goes out weekly. 
robot-history:none yet robot-environment:service modified-date:Mon 17 Jan 2000 10:00:00 EST modified-by:Joseph Reynolds robot-name: Shai'Hulud robot-cover-url: robot-details-url: robot-owner-name: Dimitri Khaoustov robot-owner-url: robot-owner-email: shawdow@usa.net robot-status: active robot-purpose: mirroring robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: *.rdtex.ru robot-from: robot-useragent: Shai'Hulud robot-language: C robot-description: Used to build mirrors for internal use robot-history: This robot finds its roots in a research project at RDTeX Perspective Projects Group in 1996 robot-environment: research modified-date: Mon, 5 Aug 1996 14:35:08 GMT modified-by: Dimitri Khaoustov robot-name: Sift robot-cover-url: http://www.worthy.com/ robot-details-url: http://www.worthy.com/ robot-owner-name: Bob Worthy robot-owner-url: http://www.worthy.com/~bworthy robot-owner-email: bworthy@worthy.com robot-status: development, active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: data robot-exclusion: yes robot-exclusion-useragent: sift robot-noindex: yes robot-host: www.worthy.com robot-from: robot-useragent: libwww-perl-5.41 robot-language: perl robot-description: Subject directed (via key phrase list) indexing. robot-history: Libwww of course, implementation using MySQL August, 1999. Indexing Search and Rescue sites. 
robot-environment: research, service modified-date: Sat, 16 Oct 1999 19:40:00 GMT modified-by: Bob Worthy robot-name: Simmany Robot Ver1.0 robot-cover-url: http://simmany.hnc.net/ robot-details-url: http://simmany.hnc.net/irman1.html robot-owner-name: Youngsik, Lee robot-owner-url: robot-owner-email: ailove@hnc.co.kr robot-status: development & active robot-purpose: indexing, maintenance, statistics robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: SimBot robot-noindex: no robot-host: sansam.hnc.net robot-from: no robot-useragent: SimBot/1.0 robot-language: C robot-description: The Simmany Robot is used to build the Map(DB) for the simmany service operated by HNC (Hangul & Computer Co., Ltd.). The robot runs weekly, and visits sites that have useful Korean information in a defined order. robot-history: This robot is a part of the simmany service and simmini products. The simmini products are Web products that make use of the indexing and retrieving modules of simmany. 
robot-environment: service, commercial modified-date: Thu, 19 Sep 1996 07:02:26 GMT modified-by: Youngsik, Lee robot-name: Site Valet robot-cover-url: http://valet.webthing.com/ robot-details-url: http://valet.webthing.com/ robot-owner-name: Nick Kew robot-owner-url: robot-owner-email: nick@webthing.com robot-status: active robot-purpose: maintenance robot-type: standalone robot-platform: unix robot-availability: data robot-exclusion: yes robot-exclusion-useragent: Site Valet robot-noindex: no robot-host: valet.webthing.com,valet.* robot-from: yes robot-useragent: Site Valet robot-language: perl robot-description: a deluxe site monitoring and analysis service robot-history: builds on cg-eye, the WDG Validator, and the Link Valet robot-environment: service modified-date: Tue, 27 June 2000 modified-by: nick@webthing.com robot-name: SiteTech-Rover robot-cover-url: http://www.sitetech.com/ robot-details-url: robot-owner-name: Anil Peres-da-Silva robot-owner-url: http://www.sitetech.com robot-owner-email: adasilva@sitetech.com robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: robot-from: yes robot-useragent: SiteTech-Rover robot-language: C++. robot-description: Originated as part of a suite of Internet Products to organize, search & navigate Intranet sites and to validate links in HTML documents. robot-history: This robot originally went by the name of LiberTech-Rover robot-environment: modified-date: Fri Aug 9 17:06:56 1996. modified-by: Anil Peres-da-Silva robot-name: Skymob.com robot-cover-url: http://www.skymob.com/ robot-details-url: http://www.skymob.com/about.html robot-owner-name: Have IT Now Limited. 
robot-owner-url: http://www.skymob.com/ robot-owner-email: searchmaster@skymob.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: skymob robot-noindex: no robot-host: www.skymob.com robot-from: searchmaster@skymob.com robot-useragent: aWapClient robot-language: c++ robot-description: WAP content Crawler. robot-history: new robot-environment: service modified-date: Thu Sep 6 17:50:32 BST 2001 modified-by: Owen Lydiard robot-name:SLCrawler robot-cover-url: robot-details-url: robot-owner-name:Inxight Software robot-owner-url:http://www.inxight.com robot-owner-email:kng@inxight.com robot-status:active robot-purpose:To build the site map. robot-type:standalone robot-platform:windows, windows95, windowsNT robot-availability:none robot-exclusion:yes robot-exclusion-useragent:SLCrawler/2.0 robot-noindex:no robot-host:n/a robot-from: robot-useragent:SLCrawler robot-language:Java robot-description:To build the site map. robot-history:SLCrawler crawls HTML pages on the Internet. robot-environment: commercial: is a commercial product modified-date:Nov. 
15, 2000 modified-by:Karen Ng robot-name: Inktomi Slurp robot-cover-url: http://www.inktomi.com/ robot-details-url: http://www.inktomi.com/slurp.html robot-owner-name: Inktomi Corporation robot-owner-url: http://www.inktomi.com/ robot-owner-email: slurp@inktomi.com robot-status: active robot-purpose: indexing, statistics robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: slurp robot-noindex: yes robot-host: *.inktomi.com robot-from: yes robot-useragent: Slurp/2.0 robot-language: C/C++ robot-description: Indexing documents for the HotBot search engine (www.hotbot.com), collecting Web statistics robot-history: Switch from Slurp/1.0 to Slurp/2.0 November 1996 robot-environment: service modified-date: Fri Feb 28 13:57:43 PST 1997 modified-by: slurp@inktomi.com robot-name: Smart Spider robot-cover-url: http://www.travel-finder.com robot-details-url: http://www.engsoftware.com/robots.htm robot-owner-name: Ken Wadland robot-owner-url: http://www.engsoftware.com robot-owner-email: ken@engsoftware.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: windows95, windowsNT robot-availability: data, binary, source robot-exclusion: Yes robot-exclusion-useragent: ESI robot-noindex: Yes robot-host: 207.16.241.* robot-from: Yes robot-useragent: ESISmartSpider/2.0 robot-language: C++ robot-description: Classifies sites using a Knowledge Base. The robot collects web pages, which are then parsed and fed to the Knowledge Base. The Knowledge Base classifies the sites into any of hundreds of categories based on the vocabulary used. Currently used by: //www.travel-finder.com (Travel and Tourist Info) and //www.golightway.com (Christian Sites). Several options exist to control whether sites are discovered and/or classified fully automatically, fully manually or somewhere in between. robot-history: Feb '96 -- Product design begun. May '96 -- First data results published by Travel-Finder. 
Oct '96 -- Generalized and announced as a product for other sites. Jan '97 -- First data results published by GoLightWay. robot-environment: service, commercial modified-date: Mon, 13 Jan 1997 10:41:00 EST modified-by: Ken Wadland robot-name: Snooper robot-cover-url: http://darsun.sit.qc.ca robot-details-url: robot-owner-name: Isabelle A. Melnick robot-owner-url: robot-owner-email: melnicki@sit.ca robot-status: part under development and part active robot-purpose: robot-type: robot-platform: robot-availability: none robot-exclusion: yes robot-exclusion-useragent: snooper robot-noindex: robot-host: robot-from: robot-useragent: Snooper/b97_01 robot-language: robot-description: robot-history: robot-environment: modified-date: modified-by: robot-name: Solbot robot-cover-url: http://kvasir.sol.no/ robot-details-url: robot-owner-name: Frank Tore Johansen robot-owner-url: robot-owner-email: ftj@sys.sol.no robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: solbot robot-noindex: yes robot-host: robot*.sol.no robot-from: robot-useragent: Solbot/1.0 LWP/5.07 robot-language: perl, c robot-description: Builds data for the Kvasir search service. Only searches sites that end with one of the following domains: "no", "se", "dk", "is", "fi" robot-history: This robot is the result of a three-year-old late-night hack, written when the Verity robot (of that time) was unable to index sites with iso8859 characters (in URL and other places), and we just _had_ to have something up and going the next day... 
robot-environment: service modified-date: Tue Apr 7 16:25:05 MET DST 1998 modified-by: Frank Tore Johansen <ftj@sys.sol.no> robot-name:Speedy Spider robot-cover-url:http://www.entireweb.com/ robot-details-url:http://www.entireweb.com/speedy.html robot-owner-name:WorldLight.com AB robot-owner-url:http://www.worldlight.com robot-owner-email:speedy@worldlight.com robot-status:active robot-purpose:indexing robot-type:standalone robot-platform:Windows robot-availability:none robot-exclusion:yes robot-exclusion-useragent:speedy robot-noindex:yes robot-host:router-00.sverige.net, 193.15.210.29, *.entireweb.com, *.worldlight.com robot-from:yes robot-useragent:Speedy Spider ( http://www.entireweb.com/speedy.html ) robot-language:C, C++ robot-description:Speedy Spider is used to build the database for the Entireweb.com search service operated by WorldLight.com (part of WorldLight Network). The robot runs constantly, and visits sites in a random order. robot-history:This robot is a part of the highly advanced search engine Entireweb.com, that was developed in Halmstad, Sweden during 1998-2000. 
robot-environment:service, commercial modified-date:Mon, 17 July 2000 11:05:03 GMT modified-by:Marcus Andersson robot-name: spider_monkey robot-cover-url: http://www.mobrien.com/add_site.html robot-details-url: http://www.mobrien.com/add_site.html robot-owner-name: MPRM Group Limited robot-owner-url: http://www.mobrien.com robot-owner-email: mprm@ionsys.com robot-status: robot actively in use robot-purpose: gather content for a free indexing service robot-type: FDSE robot robot-platform: unix robot-availability: bulk data gathered by robot available robot-exclusion: yes robot-exclusion-useragent: spider_monkey robot-noindex: yes robot-host: snowball.ionsys.com robot-from: yes robot-useragent: mouse.house/7.1 robot-language: perl5 robot-description: Robot runs every 30 days for a full index and weekly on a list of accumulated visitor requests robot-history: This robot is under development and currently active robot-environment: written as an employee / guest service modified-date: Mon, 22 May 2000 12:28:52 GMT modified-by: MPRM Group Limited robot-name: SpiderBot robot-cover-url: http://pisuerga.inf.ubu.es/lsi/Docencia/TFC/ITIG/icruzadn/cover.htm robot-details-url: http://pisuerga.inf.ubu.es/lsi/Docencia/TFC/ITIG/icruzadn/details.htm robot-owner-name: Ignacio Cruzado Nuño robot-owner-url: http://pisuerga.inf.ubu.es/lsi/Docencia/TFC/ITIG/icruzadn/icruzadn.htm robot-owner-email: spidrboticruzado@solaria.emp.ubu.es robot-status: active robot-purpose: indexing, mirroring robot-type: standalone, browser robot-platform: unix, windows, windows95, windowsNT robot-availability: source, binary, data robot-exclusion: yes robot-exclusion-useragent: SpiderBot/1.0 robot-noindex: yes robot-host: * robot-from: yes robot-useragent: SpiderBot/1.0 robot-language: C++, Tcl robot-description: Recovers Web Pages and saves them on your hard disk. Then it reindexes them. 
robot-history: This robot belongs to Ignacio Cruzado Nuño's End of Studies Thesis "Recuperador páginas Web", written to obtain the degree of "Management Technical Informatics Engineer" at Burgos University in Spain. robot-environment: research modified-date: Sun, 27 Jun 1999 09:00:00 GMT modified-by: Ignacio Cruzado Nuño robot-name: Spiderline Crawler robot-cover-url: http://www.spiderline.com/ robot-details-url: http://www.spiderline.com/ robot-owner-name: Benjamin Benson robot-owner-url: http://www.spiderline.com/ robot-owner-email: ben@spiderline.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: free and commercial services robot-exclusion: yes robot-exclusion-useragent: spiderline robot-noindex: yes robot-host: *.spiderline.com, *.spiderline.org robot-from: no robot-useragent: spiderline/3.1.3 robot-language: c, c++ robot-description: robot-history: Developed for Spiderline.com, launched in 2001. robot-environment: service modified-date: Wed, 21 Feb 2001 03:36:39 GMT modified-by: Benjamin Benson robot-name:SpiderMan robot-cover-url:http://www.comp.nus.edu.sg/~leunghok robot-details-url:http://www.comp.nus.edu.sg/~leunghok/honproj.html robot-owner-name:Leung Hok Peng , The School Of Computing Nus , Singapore robot-owner-url:http://www.comp.nus.edu.sg/~leunghok robot-owner-email:leunghok@comp.nus.edu.sg robot-status:development & active robot-purpose:user searching using IR technique robot-type:stand alone robot-platform:Java 1.2 robot-availability:binary&source robot-exclusion:no robot-exclusion-useragent:nil robot-noindex:no robot-host:NA robot-from:NA robot-useragent:SpiderMan 1.0 robot-language:java robot-description:It is used for any user to search the web given a query string robot-history:Originated from The Center for Natural Product Research and The School of computing National University Of Singapore robot-environment:research modified-date:08/08/1999 modified-by:Leung Hok Peng and Dr Hsu 
Wynne robot-name: SpiderView(tm) robot-cover-url: http://www.northernwebs.com/set/spider_view.html robot-details-url: http://www.northernwebs.com/set/spider_sales.html robot-owner-name: Northern Webs robot-owner-url: http://www.northernwebs.com robot-owner-email: webmaster@northernwebs.com robot-status: active robot-purpose: maintenance robot-type: standalone robot-platform: unix, nt robot-availability: source robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: bobmin.quad2.iuinc.com, * robot-from: No robot-useragent: Mozilla/4.0 (compatible; SpiderView 1.0;unix) robot-language: perl robot-description: SpiderView is a server based program which can spider a webpage, testing the links found on the page, evaluating your server and its performance. robot-history: This is an offshoot http retrieval program based on our Medibot software. robot-environment: commercial modified-date: modified-by: robot-name: Spry Wizard Robot robot-cover-url: http://www.spry.com/wizard/index.html robot-details-url: robot-owner-name: spry robot-owner-url: http://www.spry.com/index.html robot-owner-email: info@spry.com robot-status: robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: robot-host: wizard.spry.com or tiger.spry.com robot-from: no robot-useragent: no robot-language: robot-description: Its purpose is to generate a Resource Discovery database. Spry is refusing to give any comments about this robot. robot-history: robot-environment: modified-date: Tue Jul 11 09:29:45 GMT 1995 modified-by: robot-name: Site Searcher robot-cover-url: www.satacoy.com robot-details-url: www.satacoy.com robot-owner-name: Zackware robot-owner-url: www.satacoy.com robot-owner-email: zackware@hotmail.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: windows95, windows98, windowsNT robot-availability: binary robot-exclusion: no robot-exclusion-useragent: robot-noindex: no 
robot-host: * robot-from: no robot-useragent: ssearcher100 robot-language: C++ robot-description: Site Searcher scans web sites for specific file types. (JPG, MP3, MPG, etc) robot-history: Released 4/4/1999 robot-environment: hobby modified-date: 04/26/1999 robot-name: Suke robot-cover-url: http://www.kensaku.org/ robot-details-url: http://www.kensaku.org/ robot-owner-name: Yosuke Kuroda robot-owner-url: http://www.kensaku.org/yk/ robot-owner-email: robot@kensaku.org robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: FreeBSD3.* robot-availability: source robot-exclusion: yes robot-exclusion-useragent: suke robot-noindex: no robot-host: * robot-from: yes robot-useragent: suke/*.* robot-language: c robot-description: This robot visits mainly sites in japan. robot-history: since 1999 robot-environment: service robot-name: suntek search engine robot-cover-url: http://www.portal.com.hk/ robot-details-url: http://www.suntek.com.hk/ robot-owner-name: Suntek Computer Systems robot-owner-url: http://www.suntek.com.hk/ robot-owner-email: karen@suntek.com.hk robot-status: operational robot-purpose: to create a search portal on Asian web sites robot-type: robot-platform: NT, Linux, UNIX robot-availability: available now robot-exclusion: robot-exclusion-useragent: robot-noindex: yes robot-host: search.suntek.com.hk robot-from: yes robot-useragent: suntek/1.0 robot-language: Java robot-description: A multilingual search engine with emphasis on Asia contents robot-history: robot-environment: modified-date: modified-by: robot-name: Sven robot-cover-url: robot-details-url: http://marty.weathercity.com/sven/ robot-owner-name: Marty Anstey robot-owner-url: http://marty.weathercity.com/ robot-owner-email: rhondle@home.com robot-status: Active robot-purpose: indexing robot-type: standalone robot-platform: Windows robot-availability: none robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: 24.113.12.29 robot-from: no 
robot-useragent: robot-language: VB5 robot-description: Used to gather sites for netbreach.com. Runs constantly. robot-history: Developed as an experiment in web indexing. robot-environment: hobby, service modified-date: Tue, 3 Mar 1999 08:15:00 PST modified-by: Marty Anstey robot-name: TACH Black Widow robot-cover-url: http://theautochannel.com/~mjenn/bw.html robot-details-url: http://theautochannel.com/~mjenn/bw-syntax.html robot-owner-name: Michael Jennings robot-owner-url: http://www.spd.louisville.edu/~mejenn01/ robot-owner-email: mjenn@theautochannel.com robot-status: development robot-purpose: maintenance: link validation robot-type: standalone robot-platform: UNIX, Linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: tach_bw robot-noindex: no robot-host: *.theautochannel.com robot-from: yes robot-useragent: Mozilla/3.0 (Black Widow v1.1.0; Linux 2.0.27; Dec 31 1997 12:25:00 robot-language: C/C++ robot-description: Exhaustively recurses a single site to check for broken links robot-history: Corporate application begun in 1996 for The Auto Channel robot-environment: commercial modified-date: Thu, Jan 23 1997 23:09:00 GMT modified-by: Michael Jennings robot-name: Tarantula robot-cover-url: http://www.nathan.de/nathan/software.html#TARANTULA robot-details-url: http://www.nathan.de/ robot-owner-name: Markus Hoevener robot-owner-url: robot-owner-email: Markus.Hoevener@evision.de robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: yes robot-noindex: yes robot-host: yes robot-from: no robot-useragent: Tarantula/1.0 robot-language: C robot-description: Tarantula gathers information for the German search engine Nathan robot-history: Started February 1997 robot-environment: service modified-date: Mon, 29 Dec 1997 15:30:00 GMT modified-by: Markus Hoevener robot-name: tarspider robot-cover-url: robot-details-url: robot-owner-name: Olaf 
Schreck robot-owner-url: http://www.chemie.fu-berlin.de/user/chakl/ChaklHome.html robot-owner-email: chakl@fu-berlin.de robot-status: robot-purpose: mirroring robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: chakl@fu-berlin.de robot-useragent: tarspider robot-language: robot-description: robot-history: robot-environment: modified-date: modified-by: robot-name: Tcl W3 Robot robot-cover-url: http://hplyot.obspm.fr/~dl/robo.html robot-details-url: robot-owner-name: Laurent Demailly robot-owner-url: http://hplyot.obspm.fr/~dl/ robot-owner-email: dl@hplyot.obspm.fr robot-status: robot-purpose: maintenance, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: hplyot.obspm.fr robot-from: yes robot-useragent: dlw3robot/x.y (in TclX by http://hplyot.obspm.fr/~dl/) robot-language: tcl robot-description: Its purpose is to validate links, and generate statistics. robot-history: robot-environment: modified-date: Tue May 23 17:51:39 1995 modified-by: robot-name: TechBOT robot-cover-url: http://www.techaid.net/ robot-details-url: http://www.echaid.net/TechBOT/ robot-owner-name: TechAID Internet Services robot-owner-url: http://www.techaid.net/ robot-owner-email: techbot@techaid.net robot-status: active robot-purpose:statistics, maintenance robot-type: standalone robot-platform: Unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: TechBOT robot-noindex: yes robot-host: techaid.net robot-from: yes robot-useragent: TechBOT robot-language: perl5 robot-description: TechBOT is constantly upgraded. Currently he is used for Link Validation, Load Time, HTML Validation and much much more. robot-history: TechBOT started his life as a Page Change Detection robot, but has taken on many new and exciting roles. 
robot-environment: service modified-date: Sat, 18 Dec 1998 14:26:00 EST modified-by: techbot@techaid.net robot-name: Templeton robot-cover-url: http://www.bmtmicro.com/catalog/tton/ robot-details-url: http://www.bmtmicro.com/catalog/tton/ robot-owner-name: Neal Krawetz robot-owner-url: http://www.cs.tamu.edu/people/nealk/ robot-owner-email: nealk@net66.com robot-status: active robot-purpose: mirroring, mapping, automating web applications robot-type: standalone robot-platform: OS/2, Linux, SunOS, Solaris robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: templeton robot-noindex: no robot-host: * robot-from: yes robot-useragent: Templeton/{version} for {platform} robot-language: C robot-description: Templeton is a highly configurable robot for mirroring, mapping, and automating applications on retrieved documents. robot-history: This robot was originally created as a test-of-concept. robot-environment: service, commercial, research, hobby modified-date: Sun, 6 Apr 1997 10:00:00 GMT modified-by: Neal Krawetz robot-name: TitIn robot-cover-url: http://www.foi.hr/~dpavlin/titin/ robot-details-url: http://www.foi.hr/~dpavlin/titin/tehnical.htm robot-owner-name: Dobrica Pavlinusic robot-owner-url: http://www.foi.hr/~dpavlin/ robot-owner-email: dpavlin@foi.hr robot-status: development robot-purpose: indexing, statistics robot-type: standalone robot-platform: unix robot-availability: data, source on request robot-exclusion: yes robot-exclusion-useragent: titin robot-noindex: no robot-host: barok.foi.hr robot-from: no robot-useragent: TitIn/0.2 robot-language: perl5, c robot-description: TitIn is used to index the titles of all Web servers in the .hr domain. robot-history: It was built as a result of the desperate need for a central index of Croatian web servers in December 1996.
robot-environment: research modified-date: Thu, 12 Dec 1996 16:06:42 MET modified-by: Dobrica Pavlinusic robot-name: TITAN robot-cover-url: http://isserv.tas.ntt.jp/chisho/titan-e.html robot-details-url: http://isserv.tas.ntt.jp/chisho/titan-help/eng/titan-help-e.html robot-owner-name: Yoshihiko HAYASHI robot-owner-url: robot-owner-email: hayashi@nttnly.isl.ntt.jp robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: SunOS 4.1.4 robot-availability: no robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: nlptitan.isl.ntt.jp robot-from: yes robot-useragent: TITAN/0.1 robot-language: perl 4 robot-description: Its purpose is to generate a Resource Discovery database, and copy document trees. Our primary goal is to develop an advanced method for indexing the WWW documents. Uses libwww-perl robot-history: robot-environment: modified-date: Mon Jun 24 17:20:44 PDT 1996 modified-by: Yoshihiko HAYASHI robot-name: The TkWWW Robot robot-cover-url: http://fang.cs.sunyit.edu/Robots/tkwww.html robot-details-url: robot-owner-name: Scott Spetka robot-owner-url: http://fang.cs.sunyit.edu/scott/scott.html robot-owner-email: scott@cs.sunyit.edu robot-status: robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: robot-useragent: robot-language: robot-description: It is designed to search Web neighborhoods to find pages that may be logically related. The Robot returns a list of links that looks like a hot list. The search can be by key word or all links at a distance of one or two hops may be returned. The TkWWW Robot is described in a paper presented at the WWW94 Conference in Chicago. 
robot-history: robot-environment: modified-date: modified-by: robot-name: TLSpider robot-cover-url: n/a robot-details-url: n/a robot-owner-name: topiclink.com robot-owner-url: topiclink.com robot-owner-email: tlspider@outtel.com robot-status: not activated robot-purpose: to get web sites and add them to the topiclink future directory robot-type: development: robot under development robot-platform: linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: topiclink robot-noindex: no robot-host: tlspider.topiclink.com (not available yet) robot-from: no robot-useragent: TLSpider/1.1 robot-language: perl5 robot-description: This robot runs 2 days a week getting information for TopicLink.com robot-history: This robot was created to serve the Internet search engine TopicLink.com robot-environment: service modified-date: September,10,1999 17:28 GMT modified-by: TopicLink Spider Team robot-name: UCSD Crawl robot-cover-url: http://www.mib.org/~ucsdcrawl robot-details-url: robot-owner-name: Adam Tilghman robot-owner-url: http://www.mib.org/~atilghma robot-owner-email: atilghma@mib.org robot-status: robot-purpose: indexing, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: nuthaus.mib.org scilib.ucsd.edu robot-from: yes robot-useragent: UCSD-Crawler robot-language: Perl 4 robot-description: Should hit ONLY within UC San Diego - trying to count servers here. robot-history: robot-environment: modified-date: Sat Jan 27 09:21:40 1996.
modified-by: robot-name: UdmSearch robot-details-url: http://mysearch.udm.net/ robot-cover-url: http://mysearch.udm.net/ robot-owner-name: Alexander Barkov robot-owner-url: http://mysearch.udm.net/ robot-owner-email: bar@izhcom.ru robot-status: active robot-purpose: indexing, validation robot-type: standalone robot-platform: unix robot-availability: source, binary robot-exclusion: yes robot-exclusion-useragent: UdmSearch robot-noindex: yes robot-host: * robot-from: no robot-useragent: UdmSearch/2.1.1 robot-language: c robot-description: UdmSearch is free web search engine software for intranet/small domain internet servers robot-history: Developed since 1998; the original purpose was a search engine for the Republic of Udmurtia, http://search.udm.net robot-environment: hobby modified-date: Mon, 6 Sep 1999 10:28:52 GMT robot-name: URL Check robot-cover-url: http://www.cutternet.com/products/webcheck.html robot-details-url: http://www.cutternet.com/products/urlck.html robot-owner-name: Dave Finnegan robot-owner-url: http://www.cutternet.com robot-owner-email: dave@cutternet.com robot-status: active robot-purpose: maintenance robot-type: standalone robot-platform: unix robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: urlck robot-noindex: no robot-host: * robot-from: yes robot-useragent: urlck/1.2.3 robot-language: c robot-description: The robot is used to manage, maintain, and modify web sites. It builds a database detailing the site, builds HTML reports describing the site, and can be used to upload pages to the site or to modify existing pages and URLs within the site. It can also be used to mirror whole or partial sites. It supports HTTP, File, FTP, and Mailto schemes. robot-history: Originally designed to validate URLs.
robot-environment: commercial modified-date: July 9, 1997 modified-by: Dave Finnegan robot-name: URL Spider Pro robot-cover-url: http://www.innerprise.net robot-details-url: http://www.innerprise.net/us.htm robot-owner-name: Innerprise robot-owner-url: http://www.innerprise.net robot-owner-email: greg@innerprise.net robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Windows9x/NT robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: * robot-noindex: yes robot-host: * robot-from: no robot-useragent: URL Spider Pro robot-language: delphi robot-description: Used for building a database of web pages. robot-history: Project started July 1998. robot-environment: commercial modified-date: Mon, 12 Jul 1999 17:50:30 GMT modified-by: Innerprise robot-name: Valkyrie robot-cover-url: http://kichijiro.c.u-tokyo.ac.jp/odin/ robot-details-url: http://kichijiro.c.u-tokyo.ac.jp/odin/robot.html robot-owner-name: Masanori Harada robot-owner-url: http://www.graco.c.u-tokyo.ac.jp/~harada/ robot-owner-email: harada@graco.c.u-tokyo.ac.jp robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Valkyrie libwww-perl robot-noindex: no robot-host: *.c.u-tokyo.ac.jp robot-from: yes robot-useragent: Valkyrie/1.0 libwww-perl/0.40 robot-language: perl4 robot-description: used to collect resources from Japanese Web sites for ODIN search engine. robot-history: This robot has been used since Oct. 1995 for author's research. 
robot-environment: service research modified-date: Thu Mar 20 19:09:56 JST 1997 modified-by: harada@graco.c.u-tokyo.ac.jp robot-name: Verticrawl robot-cover-url: http://www.verticrawl.com/ robot-details-url: http://www.verticrawl.com/ robot-owner-name: Velic, Epromat, Malinge, Troutot, Lhuisset robot-owner-url: http://www.verticrawl.com/ robot-owner-email: webmaster@velic.com, webmaster@epromat.com robot-status: active robot-purpose: indexing, maintenance, statistics, and classifying urls in a global ASP solution robot-type: standalone robot-platform: Unix, Linux and windowsNT robot-availability: none robot-exclusion: verticrawl robot-exclusion-useragent: verticrawl robot-noindex: yes robot-host: http://193.251.26.45:15555/ robot-from: Yes robot-useragent: Verticrawl robot-language: c, perl robot-description: Verticrawl is a global search engine dedicated to application service providing in specialized directory projects robot-history: Verticrawl is based on web solutions for knowledge management and Web portals back office services robot-environment: commercial modified-date: Mon, 10 Dec 2001 17:28:52 GMT modified-by: webmaster@velic.com robot-name: Victoria robot-cover-url: robot-details-url: robot-owner-name: Adrian Howard robot-owner-url: robot-owner-email: adrianh@oneworld.co.uk robot-status: development robot-purpose: maintenance robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Victoria robot-noindex: yes robot-host: robot-from: robot-useragent: Victoria/1.0 robot-language: perl,c robot-description: Victoria is part of a groupware produced by Victoria Real Ltd. (voice: +44 [0]1273 774469, fax: +44 [0]1273 779960 email: victoria@pavilion.co.uk). Victoria is used to monitor changes in W3 documents, both intranet and internet based. Contact Victoria Real for more information.
robot-history: robot-environment: commercial modified-date: Fri, 22 Nov 1996 16:45 GMT modified-by: victoria@pavilion.co.uk robot-name: vision-search robot-cover-url: http://www.ius.cs.cmu.edu/cgi-bin/vision-search robot-details-url: robot-owner-name: Henry A. Rowley robot-owner-url: http://www.cs.cmu.edu/~har robot-owner-email: har@cs.cmu.edu robot-status: robot-purpose: indexing. robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: dylan.ius.cs.cmu.edu robot-from: no robot-useragent: vision-search/3.0' robot-language: Perl 5 robot-description: Intended to be an index of computer vision pages, containing all pages within n links (for some small n) of the Vision Home Page robot-history: robot-environment: modified-date: Fri Mar 8 16:03:04 1996 modified-by: robot-name: Voyager robot-cover-url: http://www.lisa.co.jp/voyager/ robot-details-url: robot-owner-name: Voyager Staff robot-owner-url: http://www.lisa.co.jp/voyager/ robot-owner-email: voyager@lisa.co.jp robot-status: development robot-purpose: indexing, maintenance robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Voyager robot-noindex: no robot-host: *.lisa.co.jp robot-from: yes robot-useragent: Voyager/0.0 robot-language: perl5 robot-description: This robot is used to build the database for the Lisa Search service. The robot is launched manually and visits sites in a random order.
robot-history: robot-environment: service modified-date: Mon, 30 Nov 1998 08:00:00 GMT modified-by: Hideyuki Ezaki robot-name: VWbot robot-cover-url: http://vancouver-webpages.com/VWbot/ robot-details-url: http://vancouver-webpages.com/VWbot/aboutK.shtml robot-owner-name: Andrew Daviel robot-owner-url: http://vancouver-webpages.com/~admin/ robot-owner-email: andrew@vancouver-webpages.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: VWbot_K robot-noindex: yes robot-host: vancouver-webpages.com robot-from: yes robot-useragent: VWbot_K/4.2 robot-language: perl4 robot-description: Used to index BC sites for the searchBC database. Runs daily. robot-history: Originally written fall 1995. Actively maintained. robot-environment: service commercial research modified-date: Tue, 4 Mar 1997 20:00:00 GMT modified-by: Andrew Daviel robot-name: The NWI Robot robot-cover-url: http://www.ub2.lu.se/NNC/projects/NWI/the_nwi_robot.html robot-owner-name: Sigfrid Lundberg, Lund university, Sweden robot-owner-url: http://nwi.ub2.lu.se/~siglun robot-owner-email: siglun@munin.ub2.lu.se robot-status: active robot-purpose: discovery,statistics robot-type: standalone robot-platform: UNIX robot-availability: none (at the moment) robot-exclusion: yes robot-noindex: No robot-host: nwi.ub2.lu.se, mars.dtv.dk and a few others robot-from: yes robot-useragent: w3index robot-language: perl5 robot-description: A resource discovery robot, used primarily for the indexing of the Scandinavian Web robot-history: It is about a year or so old. Written by Anders Ardö, Mattias Borrell, Håkan Ardö and myself.
robot-environment: service,research modified-date: Wed Jun 26 13:58:04 MET DST 1996 modified-by: Sigfrid Lundberg robot-name: W3M2 robot-cover-url: http://tronche.com/W3M2 robot-details-url: robot-owner-name: Christophe Tronche robot-owner-url: http://tronche.com/ robot-owner-email: tronche@lri.fr robot-status: robot-purpose: indexing, maintenance, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: yes robot-useragent: W3M2/x.xxx robot-language: Perl 4, Perl 5, and C++ robot-description: to generate a Resource Discovery database, validate links, validate HTML, and generate statistics robot-history: robot-environment: modified-date: Fri May 5 17:48:48 1995 modified-by: robot-name: WallPaper (alias crawlpaper) robot-cover-url: http://www.crawlpaper.com/ robot-details-url: http://sourceforge.net/projects/crawlpaper/ robot-owner-name: Luca Piergentili robot-owner-url: http://www.geocities.com/lpiergentili/ robot-owner-email: lpiergentili@yahoo.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: windows robot-availability: source, binary robot-exclusion: yes robot-exclusion-useragent: crawlpaper robot-noindex: no robot-host: robot-from: robot-useragent: CrawlPaper/n.n.n (Windows n) robot-language: C++ robot-description: a crawler for pictures download and offline browsing robot-history: started as screensaver the program has evolved to a crawler including an audio player, etc. 
robot-environment: hobby modified-date: Mon, 25 Aug 2003 09:00:00 GMT modified-by: robot-name: the World Wide Web Wanderer robot-cover-url: http://www.mit.edu/people/mkgray/net/ robot-details-url: robot-owner-name: Matthew Gray robot-owner-url: http://www.mit.edu:8001/people/mkgray/mkgray.html robot-owner-email: mkgray@mit.edu robot-status: active robot-purpose: statistics robot-type: standalone robot-platform: unix robot-availability: data robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: *.mit.edu robot-from: robot-useragent: WWWWanderer v3.0 robot-language: perl4 robot-description: Run initially in June 1993, its aim is to measure the growth in the web. robot-history: robot-environment: research modified-date: modified-by: robot-name: w@pSpider by wap4.com robot-cover-url: http://mopilot.com/ robot-details-url: http://wap4.com/portfolio.htm robot-owner-name: Dieter Kneffel robot-owner-url: http://wap4.com/ (corporate) robot-owner-email: info@wap4.com robot-status: active robot-purpose: indexing, maintenance (special: dedicated to wap/wml pages) robot-type: standalone robot-platform: unix robot-availability: data robot-exclusion: yes robot-exclusion-useragent: wapspider robot-noindex: [does not apply for wap/wml pages!] robot-host: *.wap4.com, *.mopilot.com robot-from: yes robot-useragent: w@pSpider/xxx (unix) by wap4.com robot-language: c, php, sql robot-description: wapspider is used to build the database for mopilot.com, a search engine for mobile contents; it is specially designed to crawl wml-pages. 
html is indexed, but html-links are (currently) not followed robot-history: this robot was developed by wap4.com in 1999 for the world's first wap-search engine robot-environment: service, commercial, research modified-date: Fri, 23 Jun 2000 14:33:52 MESZ modified-by: Dieter Kneffel, data@wap4.com robot-name: WebBandit Web Spider robot-cover-url: http://pw2.netcom.com/~wooger/ robot-details-url: http://pw2.netcom.com/~wooger/ robot-owner-name: Jerry Walsh robot-owner-url: http://pw2.netcom.com/~wooger/ robot-owner-email: wooger@ix.netcom.com robot-status: active robot-purpose: Resource Gathering / Server Benchmarking robot-type: standalone application robot-platform: Intel - windows95 robot-availability: source, binary robot-exclusion: no robot-exclusion-useragent: WebBandit/1.0 robot-noindex: no robot-host: ix.netcom.com robot-from: no robot-useragent: WebBandit/1.0 robot-language: C++ robot-description: multithreaded, hyperlink-following, resource finding webspider robot-history: Inspired by reading of Internet Programming book by Jamsa/Cope robot-environment: commercial modified-date: 11/21/96 modified-by: Jerry Walsh robot-name: WebCatcher robot-cover-url: http://oscar.lang.nagoya-u.ac.jp robot-details-url: robot-owner-name: Reiji SUZUKI robot-owner-url: http://oscar.lang.nagoya-u.ac.jp/~reiji/index.html robot-owner-email: reiji@infonia.ne.jp robot-owner-name2: Masatoshi SUGIURA robot-owner-url2: http://oscar.lang.nagoya-u.ac.jp/~sugiura/index.html robot-owner-email2: sugiura@lang.nagoya-u.ac.jp robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix, windows, mac robot-availability: none robot-exclusion: yes robot-exclusion-useragent: webcatcher robot-noindex: no robot-host: oscar.lang.nagoya-u.ac.jp robot-from: no robot-useragent: WebCatcher/1.0 robot-language: perl5 robot-description: WebCatcher gathers web pages that Japanese college students want to visit.
robot-history: This robot finds its roots in a research project at Nagoya University in 1998. robot-environment: research modified-date: Fri, 16 Oct 1998 17:28:52 JST modified-by: "Reiji SUZUKI" <reiji@infonia.ne.jp> robot-name: WebCopy robot-cover-url: http://www.inf.utfsm.cl/~vparada/webcopy.html robot-details-url: robot-owner-name: Victor Parada robot-owner-url: http://www.inf.utfsm.cl/~vparada/ robot-owner-email: vparada@inf.utfsm.cl robot-status: robot-purpose: mirroring robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: no robot-useragent: WebCopy/(version) robot-language: perl 4 or perl 5 robot-description: Its purpose is to perform mirroring. WebCopy can retrieve files recursively using the HTTP protocol. It can be used as a delayed browser or as a mirroring tool. It cannot jump from one site to another. robot-history: robot-environment: modified-date: Sun Jul 2 15:27:04 1995 modified-by: robot-name: webfetcher robot-cover-url: http://www.ontv.com/ robot-details-url: robot-owner-name: robot-owner-url: http://www.ontv.com/ robot-owner-email: webfetch@ontv.com robot-status: robot-purpose: mirroring robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: * robot-from: yes robot-useragent: WebFetcher/0.8, robot-language: C++ robot-description: don't wait! OnTV's WebFetcher mirrors whole sites down to your hard disk on a TV-like schedule. Catch w3 documentation. Catch discovery.com without waiting! A fully operational web robot for NT/95 today, most UNIX soon, MAC tomorrow. robot-history: robot-environment: modified-date: Sat Jan 27 10:31:43 1996.
modified-by: robot-name: The Webfoot Robot robot-cover-url: robot-details-url: robot-owner-name: Lee McLoughlin robot-owner-url: http://web.doc.ic.ac.uk/f?/lmjm robot-owner-email: L.McLoughlin@doc.ic.ac.uk robot-status: robot-purpose: robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: robot-host: phoenix.doc.ic.ac.uk robot-from: robot-useragent: robot-language: robot-description: robot-history: First spotted in Mid February 1994 robot-environment: modified-date: modified-by: robot-name: Webinator robot-details-url: http://www.thunderstone.com/texis/site/pages/webinator4_admin.html robot-cover-url: http://www.thunderstone.com/texis/site/pages/webinator.html robot-owner-name: robot-owner-email: robot-status: active, under further enhancement. robot-purpose: information retrieval robot-type: standalone robot-exclusion: yes robot-noindex: yes robot-exclusion-useragent: T-H-U-N-D-E-R-S-T-O-N-E robot-host: several robot-from: No robot-language: Texis Vortex robot-history: robot-environment: Commercial robot-name: weblayers robot-cover-url: http://www.univ-paris8.fr/~loic/weblayers/ robot-details-url: robot-owner-name: Loic Dachary robot-owner-url: http://www.univ-paris8.fr/~loic/ robot-owner-email: loic@afp.com robot-status: robot-purpose: maintenance robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: robot-useragent: weblayers/0.0 robot-language: perl 5 robot-description: Its purpose is to validate, cache and maintain links. It is designed to maintain the cache generated by the emacs w3 mode (N*tscape replacement) and to support annotated documents (keep them in sync with the original document via diff/patch).
robot-history: robot-environment: modified-date: Fri Jun 23 16:30:42 FRE 1995 modified-by: robot-name: WebLinker robot-cover-url: http://www.cern.ch/WebLinker/ robot-details-url: robot-owner-name: James Casey robot-owner-url: http://www.maths.tcd.ie/hyplan/jcasey/jcasey.html robot-owner-email: jcasey@maths.tcd.ie robot-status: robot-purpose: maintenance robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: robot-host: robot-from: robot-useragent: WebLinker/0.0 libwww-perl/0.1 robot-language: robot-description: it traverses a section of the web, doing URN->URL conversion. It will be used as a post-processing tool on documents created by automatic converters such as LaTeX2HTML or WebMaker. At the moment it works at full speed, but is restricted to local sites. External GETs will be added, but these will run slowly. WebLinker is meant to be run locally, so if you see it elsewhere let the author know! robot-history: robot-environment: modified-date: modified-by: robot-name: WebMirror robot-cover-url: http://www.winsite.com/pc/win95/netutil/wbmiror1.zip robot-details-url: robot-owner-name: Sui Fung Chan robot-owner-url: http://www.geocities.com/NapaVally/1208 robot-owner-email: sfchan@mailhost.net robot-status: robot-purpose: mirroring robot-type: standalone robot-platform: Windows95 robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: robot-from: no robot-useragent: no robot-language: C++ robot-description: It downloads web pages to the hard drive for off-line browsing. robot-history: robot-environment: modified-date: Mon Apr 29 08:52:25 1996.
modified-by: robot-name: The Web Moose robot-cover-url: robot-details-url: http://www.nwlink.com/~mikeblas/webmoose/ robot-owner-name: Mike Blaszczak robot-owner-url: http://www.nwlink.com/~mikeblas/ robot-owner-email: mikeblas@nwlink.com robot-status: development robot-purpose: statistics, maintenance robot-type: standalone robot-platform: Windows NT robot-availability: data robot-exclusion: no robot-exclusion-useragent: WebMoose robot-noindex: no robot-host: msn.com robot-from: no robot-useragent: WebMoose/0.0.0000 robot-language: C++ robot-description: This robot collects statistics and verifies links. It builds a graph of its visit path. robot-history: This robot is under development. It will support ROBOTS.TXT soon. robot-environment: hobby modified-date: Fri, 30 Aug 1996 00:00:00 GMT modified-by: Mike Blaszczak robot-name: WebQuest robot-cover-url: robot-details-url: robot-owner-name: TaeYoung Choi robot-owner-url: http://www.cosmocyber.co.kr:8080/~cty/index.html robot-owner-email: cty@cosmonet.co.kr robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: webquest robot-noindex: no robot-host: 210.121.146.2, 210.113.104.1, 210.113.104.2 robot-from: yes robot-useragent: WebQuest/1.0 robot-language: perl5 robot-description: WebQuest will be used to build the databases for various web search service sites which will be in service by early 1998. Until the end of Jan. 1998, WebQuest will run from time to time. After that, it will run daily (for a few hours and very slowly). robot-history: The development of WebQuest was motivated by the need for a customized robot in various projects of COSMO Information & Communication Co., Ltd. in Korea.
robot-environment: service modified-date: Tue, 30 Dec 1997 09:27:20 GMT modified-by: TaeYoung Choi robot-name: Digimarc MarcSpider robot-cover-url: http://www.digimarc.com/prod_fam.html robot-details-url: http://www.digimarc.com/prod_fam.html robot-owner-name: Digimarc Corporation robot-owner-url: http://www.digimarc.com robot-owner-email: wmreader@digimarc.com robot-status: active robot-purpose: maintenance robot-type: standalone robot-platform: windowsNT robot-availability: none robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: 206.102.3.* robot-from: yes robot-useragent: Digimarc WebReader/1.2 robot-language: c++ robot-description: Examines image files for watermarks. In order to not waste internet bandwidth with yet another crawler, we have contracted with one of the major crawlers/search engines to provide us with a list of specific URLs of interest to us. If a URL is to an image, we may read the image, but we do not crawl to any other URLs. If a URL is to a page of interest (usually due to CGI), then we access the page to get the image URLs from it, but we do not crawl to any other pages. robot-history: First operation in August 1997. robot-environment: service modified-date: Mon, 20 Oct 1997 16:44:29 GMT modified-by: Brian MacIntosh robot-name: WebReaper robot-cover-url: http://www.otway.com/webreaper robot-details-url: robot-owner-name: Mark Otway robot-owner-url: http://www.otway.com robot-owner-email: webreaper@otway.com robot-status: active robot-purpose: indexing/offline browsing robot-type: standalone robot-platform: windows95, windowsNT robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: webreaper robot-noindex: no robot-host: * robot-from: no robot-useragent: WebReaper [webreaper@otway.com] robot-language: c++ robot-description: Freeware app which downloads and saves sites locally for offline browsing. robot-history: Written for personal use, and then distributed to the public as freeware.
robot-environment: hobby modified-date: Thu, 25 Mar 1999 15:00:00 GMT modified-by: Mark Otway robot-name: webs robot-cover-url: http://webdew.rnet.or.jp/ robot-details-url: http://webdew.rnet.or.jp/service/shank/NAVI/SEARCH/info2.html#robot robot-owner-name: Recruit Co.Ltd, robot-owner-url: robot-owner-email: dew@wwwadmin.rnet.or.jp robot-status: active robot-purpose: statistics robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion |