

Web Robots Index

robot-name: Internet Shinchakubin
robot-cover-url: http://naragw.sharp.co.jp/myweb/home/
robot-details-url:
robot-owner-name: SHARP Corp.
robot-owner-url: http://naragw.sharp.co.jp/myweb/home/
robot-owner-email: shinchakubin-request@isl.nara.sharp.co.jp
robot-status: active
robot-purpose: find new links and changed pages
robot-type: standalone
robot-platform: Windows98
robot-availability: binary as bundled software
robot-exclusion: yes
robot-exclusion-useragent: sharp-info-agent
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: User-Agent: Mozilla/4.0 (compatible; sharp-info-agent v1.0; )
robot-language: Java
robot-description: Makes a list of new links and changed pages based on the user's frequently clicked pages in the past 31 days. The client may run this software once or a few times a day, manually or at a specified time.
robot-history: shipped for SHARP's PC users since Feb 2000
robot-environment: commercial
modified-date: Fri, 30 Jun 2000 19:02:52 JST
modified-by: Katsuo Doi <doi@isl.nara.sharp.co.jp>

robot-name: NetCarta WebMap Engine
robot-cover-url: http://www.netcarta.com/
robot-details-url:
robot-owner-name: NetCarta WebMap Engine
robot-owner-url: http://www.netcarta.com/
robot-owner-email: info@netcarta.com
robot-status:
robot-purpose: indexing, maintenance, mirroring, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from: yes
robot-useragent: NetCarta CyberPilot Pro
robot-language: C++.
robot-description: The NetCarta WebMap Engine is a general purpose, commercial spider. Packaged with a full GUI in the CyberPilot Pro product, it acts as a personal spider to work with a browser to facilitate context-based navigation. The WebMapper product uses the robot to manage a site (site copy, site diff, and extensive link management facilities). All versions can create publishable NetCarta WebMaps, which capture the crawled information.
If the robot sees a published map, it will return the published map rather than continuing its crawl. Since this is a personal spider, it will be launched from multiple domains. This robot tends to focus on a particular site. No instance of the robot should have more than one outstanding request out to any given site at a time. The User-agent field contains a coded ID identifying the instance of the spider; specific users can be blocked via robots.txt using this ID.
robot-history:
robot-environment:
modified-date: Sun Feb 18 02:02:49 1996.
modified-by:

robot-name: NetMechanic
robot-cover-url: http://www.netmechanic.com
robot-details-url: http://www.netmechanic.com/faq.html
robot-owner-name: Tom Dahm
robot-owner-url: http://iquest.com/~tdahm
robot-owner-email: tdahm@iquest.com
robot-status: development
robot-purpose: Link and HTML validation
robot-type: standalone with web gateway
robot-platform: UNIX
robot-availability: via web page
robot-exclusion: Yes
robot-exclusion-useragent: WebMechanic
robot-noindex: no
robot-host: 206.26.168.18
robot-from: no
robot-useragent: NetMechanic
robot-language: C
robot-description: NetMechanic is a link validation and HTML validation robot run using a web page interface.
robot-history:
robot-environment:
modified-date: Sat, 17 Aug 1996 12:00:00 GMT
modified-by:

robot-name: NetScoop
robot-cover-url: http://www-a2k.is.tokushima-u.ac.jp/search/index.html
robot-owner-name: Kenji Kita
robot-owner-url: http://www-a2k.is.tokushima-u.ac.jp/member/kita/index.html
robot-owner-email: kita@is.tokushima-u.ac.jp
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: UNIX
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: NetScoop
robot-host: alpha.is.tokushima-u.ac.jp, beta.is.tokushima-u.ac.jp
robot-useragent: NetScoop/1.0 libwww/5.0a
robot-language: C
robot-description: The NetScoop robot is used to build the database for the NetScoop search engine.
robot-history: The robot has been used in the research project at the Faculty of Engineering, Tokushima University, Japan, since Dec. 1996.
robot-environment: research
modified-date: Fri, 10 Jan 1997.
modified-by: Kenji Kita

robot-name: newscan-online
robot-cover-url: http://www.newscan-online.de/
robot-details-url: http://www.newscan-online.de/info.html
robot-owner-name: Axel Mueller
robot-owner-url:
robot-owner-email: mueller@newscan-online.de
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Linux
robot-availability: binary
robot-exclusion: yes
robot-exclusion-useragent: newscan-online
robot-noindex: no
robot-host: *newscan-online.de
robot-from: yes
robot-useragent: newscan-online/1.1
robot-language: perl
robot-description: The newscan-online robot is used to build a database for the newscan-online news search service operated by smart information services. The robot runs daily and visits predefined sites in a random order.
robot-history: This robot has its roots in prerelease news-filtering software for Lotus Notes from 1995.
robot-environment: service
modified-date: Fri, 9 Apr 1999 11:45:00 GMT
modified-by: Axel Mueller

robot-name: NHSE Web Forager
robot-cover-url: http://nhse.mcs.anl.gov/
robot-details-url:
robot-owner-name: Robert Olson
robot-owner-url: http://www.mcs.anl.gov/people/olson/
robot-owner-email: olson@mcs.anl.gov
robot-status:
robot-purpose: indexing
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: *.mcs.anl.gov
robot-from: yes
robot-useragent: NHSEWalker/3.0
robot-language: perl 5
robot-description: to generate a Resource Discovery database
robot-history:
robot-environment:
modified-date: Fri May 5 15:47:55 1995
modified-by:

robot-name: Nomad
robot-cover-url: http://www.cs.colostate.edu/~sonnen/projects/nomad.html
robot-details-url:
robot-owner-name: Richard Sonnen
robot-owner-url: http://www.cs.colostate.edu/~sonnen/
robot-owner-email: sonnen@cs.colostat.edu
robot-status:
robot-purpose: indexing
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex:
robot-host: *.cs.colostate.edu
robot-from: no
robot-useragent: Nomad-V2.x
robot-language: Perl 4
robot-description:
robot-history: Developed in 1995 at Colorado State University.
robot-environment:
modified-date: Sat Jan 27 21:02:20 1996.
modified-by:

robot-name: The NorthStar Robot
robot-cover-url: http://comics.scs.unr.edu:7000/top.html
robot-details-url:
robot-owner-name: Fred Barrie
robot-owner-url:
robot-owner-email: barrie@unr.edu
robot-status:
robot-purpose: indexing
robot-type:
robot-platform:
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex:
robot-host: frognot.utdallas.edu, utdallas.edu, cnidir.org
robot-from: yes
robot-useragent: NorthStar
robot-language:
robot-description: Recent runs (26 April 94) will concentrate on textual analysis of the Web versus GopherSpace (from the Veronica data) as well as indexing.
robot-history:
robot-environment:
modified-date:
modified-by:

robot-name: Occam
robot-cover-url: http://www.cs.washington.edu/research/projects/ai/www/occam/
robot-details-url:
robot-owner-name: Marc Friedman
robot-owner-url: http://www.cs.washington.edu/homes/friedman/
robot-owner-email: friedman@cs.washington.edu
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: Occam
robot-noindex: no
robot-host: gentian.cs.washington.edu, sekiu.cs.washington.edu, saxifrage.cs.washington.edu
robot-from: yes
robot-useragent: Occam/1.0
robot-language: CommonLisp, perl4
robot-description: The robot takes high-level queries, breaks them down into multiple web requests, and answers them by combining disparate data gathered in one minute from numerous web sites, or from the robot's cache. Currently the only user is me.
robot-history: The robot is a descendant of Rodney, an earlier project at the University of Washington.
robot-environment: research
modified-date: Thu, 21 Nov 1996 20:30 GMT
modified-by: friedman@cs.washington.edu (Marc Friedman)

robot-name: HKU WWW Octopus
robot-cover-url: http://phoenix.cs.hku.hk:1234/~jax/w3rui.shtml
robot-details-url:
robot-owner-name: Law Kwok Tung, Lee Tak Yeung, Lo Chun Wing
robot-owner-url: http://phoenix.cs.hku.hk:1234/~jax
robot-owner-email: jax@cs.hku.hk
robot-status:
robot-purpose: indexing
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: no.
robot-exclusion-useragent:
robot-noindex:
robot-host: phoenix.cs.hku.hk
robot-from: yes
robot-useragent: HKU WWW Robot,
robot-language: Perl 5, C, Java.
robot-description: HKU Octopus is an ongoing project for resource discovery in the Hong Kong and China WWW domain. It is a research project conducted by three undergraduates at the University of Hong Kong.
robot-history:
robot-environment:
modified-date: Thu Mar 7 14:21:55 1996.
modified-by:

robot-name: Openfind data gatherer
robot-cover-url: http://www.openfind.com.tw/
robot-details-url: http://www.openfind.com.tw/robot.html
robot-owner-name:
robot-owner-url:
robot-owner-email: robot-response@openfind.com.tw
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: 66.7.131.132
robot-from:
robot-useragent: Openfind data gatherer, Openbot/3.0+(robot-response@openfind.com.tw;+http://www.openfind.com.tw/robot.html)
robot-language:
robot-description:
robot-history:
robot-environment:
modified-date: Thu, 26 Apr 2001 02:55:21 GMT
modified-by: stanislav shalunov <shalunov@internet2.edu>

robot-name: Orb Search
robot-cover-url: http://orbsearch.home.ml.org
robot-details-url: http://orbsearch.home.ml.org
robot-owner-name: Matt Weber
robot-owner-url: http://www.weberworld.com
robot-owner-email: webernet@geocities.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: data
robot-exclusion: yes
robot-exclusion-useragent: Orbsearch/1.0
robot-noindex: yes
robot-host: cow.dyn.ml.org, *.dyn.ml.org
robot-from: yes
robot-useragent: Orbsearch/1.0
robot-language: Perl5
robot-description: Orbsearch builds the database for Orb Search Engine. It runs when requested.
robot-history: This robot was started as a hobby.
robot-environment: hobby
modified-date: Sun, 31 Aug 1997 02:28:52 GMT
modified-by: Matt Weber

robot-name: Pack Rat
robot-cover-url: http://web.cps.msu.edu/~dexterte/isl/packrat.html
robot-details-url:
robot-owner-name: Terry Dexter
robot-owner-url: http://web.cps.msu.edu/~dexterte
robot-owner-email: dexterte@cps.msu.edu
robot-status: development
robot-purpose: both maintenance and mirroring
robot-type: standalone
robot-platform: unix
robot-availability: at the moment, none...source when developed.
robot-exclusion: yes
robot-exclusion-useragent: packrat or *
robot-noindex: no, not yet
robot-host: cps.msu.edu
robot-from:
robot-useragent: PackRat/1.0
robot-language: perl with libwww-5.0
robot-description: Used for local maintenance and for gathering web pages so that local statistical info can be used in artificial intelligence programs. Funded by NEMOnline.
robot-history: In the making...
robot-environment: research
modified-date: Tue, 20 Aug 1996 15:45:11
modified-by: Terry Dexter

robot-name: PageBoy
robot-cover-url: http://www.webdocs.org/
robot-details-url: http://www.webdocs.org/
robot-owner-name: Chihiro Kuroda
robot-owner-url: http://www.webdocs.org/
robot-owner-email: pageboy@webdocs.org
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: pageboy
robot-noindex: yes
robot-nofollow: yes
robot-host: *.webdocs.org
robot-from: yes
robot-useragent: PageBoy/1.0
robot-language: c
robot-description: The robot visits at regular intervals.
robot-history: none
robot-environment: service
modified-date: Fri, 21 Oct 1999 17:28:52 GMT
modified-by: webdocs

robot-name: ParaSite
robot-cover-url: http://www.ianett.com/parasite/
robot-details-url: http://www.ianett.com/parasite/
robot-owner-name: iaNett.com
robot-owner-url: http://www.ianett.com/
robot-owner-email: parasite@ianett.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: windowsNT
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: ParaSite
robot-noindex: yes
robot-nofollow: yes
robot-host: *.ianett.com
robot-from: yes
robot-useragent: ParaSite/0.21 (http://www.ianett.com/parasite/)
robot-language: c++
robot-description: Builds index for ianett.com search database. Runs continuously.
robot-history: Second generation of ianett.com spidering technology, originally called Sven.
robot-environment: service
modified-date: July 28, 2000
modified-by: Marty Anstey

robot-name: Patric
robot-cover-url: http://www.nwnet.net/technical/ITR/index.html
robot-details-url: http://www.nwnet.net/technical/ITR/index.html
robot-owner-name: toney@nwnet.net
robot-owner-url: http://www.nwnet.net/company/staff/toney
robot-owner-email: webmaster@nwnet.net
robot-status: development
robot-purpose: statistics
robot-type: standalone
robot-platform: unix
robot-availability: data
robot-exclusion: yes
robot-exclusion-useragent: patric
robot-noindex: yes
robot-host: *.nwnet.net
robot-from: no
robot-useragent: Patric/0.01a
robot-language: perl
robot-description: (contained at http://www.nwnet.net/technical/ITR/index.html )
robot-history: (contained at http://www.nwnet.net/technical/ITR/index.html )
robot-environment: service
modified-date: Thurs, 15 Aug 1996
modified-by: toney@nwnet.net

robot-name: pegasus
robot-cover-url: http://opensource.or.id/projects.html
robot-details-url: http://pegasus.opensource.or.id
robot-owner-name: A.Y.Kiky Shannon
robot-owner-url: http://go.to/ayks
robot-owner-email: shannon@opensource.or.id
robot-status: inactive - open source
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: source, binary
robot-exclusion: yes
robot-exclusion-useragent: pegasus
robot-noindex: yes
robot-host: *
robot-from: yes
robot-useragent: web robot PEGASUS
robot-language: perl5
robot-description: pegasus gathers information from HTML pages (7 important tags). The indexing process can be started from one or more starting URLs or from a range of IP addresses.
robot-history: This robot was created as the implementation of a final project in the Informatics Engineering Department, Institute of Technology Bandung, Indonesia.
robot-environment: research
modified-date: Fri, 20 Oct 2000 14:58:40 GMT
modified-by: A.Y.Kiky Shannon

robot-name: The Peregrinator
robot-cover-url: http://www.maths.usyd.edu.au:8000/jimr/pe/Peregrinator.html
robot-details-url:
robot-owner-name: Jim Richardson
robot-owner-url: http://www.maths.usyd.edu.au:8000/jimr.html
robot-owner-email: jimr@maths.su.oz.au
robot-status:
robot-purpose:
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host:
robot-from: yes
robot-useragent: Peregrinator-Mathematics/0.7
robot-language: perl 4
robot-description: This robot is being used to generate an index of documents on Web sites connected with mathematics and statistics. It ignores off-site links, so does not stray from a list of servers specified initially.
robot-history: commenced operation in August 1994
robot-environment:
modified-date:
modified-by:

robot-name: PerlCrawler 1.0
robot-cover-url: http://perlsearch.hypermart.net/
robot-details-url: http://www.xav.com/scripts/xavatoria/index.html
robot-owner-name: Matt McKenzie
robot-owner-url: http://perlsearch.hypermart.net/
robot-owner-email: webmaster@perlsearch.hypermart.net
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: source
robot-exclusion: yes
robot-exclusion-useragent: perlcrawler
robot-noindex: yes
robot-host: server5.hypermart.net
robot-from: yes
robot-useragent: PerlCrawler/1.0 Xavatoria/2.0
robot-language: perl5
robot-description: The PerlCrawler robot is designed to index and build a database of pages relating to the Perl programming language.
robot-history: Originated in modified form on 25 June 1998
robot-environment: hobby
modified-date: Fri, 18 Dec 1998 23:37:40 GMT
modified-by: Matt McKenzie

robot-name: Phantom
robot-cover-url: http://www.maxum.com/phantom/
robot-details-url:
robot-owner-name: Larry Burke
robot-owner-url: http://www.aktiv.com/
robot-owner-email: lburke@aktiv.com
robot-status:
robot-purpose: indexing
robot-type: standalone
robot-platform: Macintosh
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from: yes
robot-useragent: Duppies
robot-language:
robot-description: Designed to allow webmasters to provide a searchable index of their own site as well as to other sites, perhaps with similar content.
robot-history:
robot-environment:
modified-date: Fri Jan 19 05:08:15 1996.
modified-by:

robot-name: PhpDig
robot-cover-url: http://phpdig.toiletoine.net/
robot-details-url: http://phpdig.toiletoine.net/
robot-owner-name: Antoine Bajolet
robot-owner-url: http://phpdig.toiletoine.net/
robot-owner-email: phpdig@toiletoine.net
robot-status: *
robot-purpose: indexing
robot-type: standalone
robot-platform: all supported by Apache/php/mysql
robot-availability: source
robot-exclusion: yes
robot-exclusion-useragent: phpdig
robot-noindex: yes
robot-host: yes
robot-from: no
robot-useragent: phpdig/x.x.x
robot-language: php 4.x
robot-description: Small robot and search engine written in php.
robot-history: first written 2001-03-30
robot-environment: hobby
modified-date: Sun, 21 Nov 2001 20:01:19 GMT
modified-by: Antoine Bajolet

robot-name: PiltdownMan
robot-cover-url: http://profitnet.bizland.com/
robot-details-url: http://profitnet.bizland.com/piltdownman.html
robot-owner-name: Daniel Vilà
robot-owner-url: http://profitnet.bizland.com/aboutus.html
robot-owner-email: profitnet@myezmail.com
robot-status: active
robot-purpose: statistics
robot-type: standalone
robot-platform: windows95, windows98, windowsNT
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: piltdownman
robot-noindex: no
robot-nofollow: no
robot-host: 62.36.128.*, 194.133.59.*, 212.106.215.*
robot-from: no
robot-useragent: PiltdownMan/1.0 profitnet@myezmail.com
robot-language: c++
robot-description: The PiltdownMan robot is used to get a list of links from the search engines in our database. These links are followed, and the pages they refer to are downloaded to gather statistics from them. The robot runs once a month, more or less, and visits the first 10 pages listed in every search engine, for a group of keywords.
robot-history: To maintain a database of search engines, we needed an automated tool. That's why we began the creation of this robot.
robot-environment: service
modified-date: Mon, 13 Dec 1999 21:50:32 GMT
modified-by: Daniel Vilà

robot-name: Pimptrain.com's robot
robot-cover-url: http://www.pimptrain.com/search.cgi
robot-details-url: http://www.pimptrain.com/search.cgi
robot-owner-name: Bryan Ankielewicz
robot-owner-url: http://www.pimptrain.com
robot-owner-email: webmaster@pimptrain.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: source;data
robot-exclusion: yes
robot-exclusion-useragent: Pimptrain
robot-noindex: yes
robot-host: pimtprain.com
robot-from: *
robot-useragent: Mozilla/4.0 (compatible: Pimptrain's robot)
robot-language: perl5
robot-description: Crawls remote sites as part of a search engine program
robot-history: Implemented in 2001
robot-environment: commercial
modified-date: May 11, 2001
modified-by: Bryan Ankielewicz

robot-name: Pioneer
robot-cover-url: http://sequent.uncfsu.edu/~micah/pioneer.html
robot-details-url:
robot-owner-name: Micah A. Williams
robot-owner-url: http://sequent.uncfsu.edu/~micah/
robot-owner-email: micah@sequent.uncfsu.edu
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: *.uncfsu.edu or flyer.ncsc.org
robot-from: yes
robot-useragent: Pioneer
robot-language: C.
robot-description: Pioneer is part of an undergraduate research project.
robot-history:
robot-environment:
modified-date: Mon Feb 5 02:49:32 1996.
modified-by:

robot-name: html_analyzer
robot-cover-url:
robot-details-url:
robot-owner-name: James E. Pitkow
robot-owner-url:
robot-owner-email: pitkow@aries.colorado.edu
robot-status:
robot-purpose: maintenance
robot-type:
robot-platform:
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: no
robot-host:
robot-from:
robot-useragent:
robot-language:
robot-description: to check validity of Web servers.
I'm not sure if it has ever been run remotely.
robot-history:
robot-environment:
modified-date:
modified-by:

robot-name: Portal Juice Spider
robot-cover-url: http://www.portaljuice.com
robot-details-url: http://www.portaljuice.com/pjspider.html
robot-owner-name: Nextopia Software Corporation
robot-owner-url: http://www.portaljuice.com
robot-owner-email: pjspider@portaljuice.com
robot-status: active
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: pjspider
robot-noindex: yes
robot-host: *.portaljuice.com, *.nextopia.com
robot-from: yes
robot-useragent: PortalJuice.com/4.0
robot-language: C/C++
robot-description: Indexing web documents for Portal Juice vertical portal search engine
robot-history: Indexing the web since 1998 for the purposes of offering our commercial Portal Juice search engine services.
robot-environment: service
modified-date: Wed Jun 23 17:00:00 EST 1999
modified-by: pjspider@portaljuice.com

robot-name: PGP Key Agent
robot-cover-url: http://www.starnet.it/pgp
robot-details-url:
robot-owner-name: Massimiliano Pucciarelli
robot-owner-url: http://www.starnet.it/puma
robot-owner-email: puma@comm2000.it
robot-status: Active
robot-purpose: indexing
robot-type: standalone
robot-platform: UNIX, Windows NT
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: salerno.starnet.it
robot-from: yes
robot-useragent: PGP-KA/1.2
robot-language: Perl 5
robot-description: This program searches for the PGP public key of the specified user.
robot-history: Originated as a research project at Salerno University in 1995.
robot-environment: Research
modified-date: June 27 1996.
modified-by: Massimiliano Pucciarelli

robot-name: PlumtreeWebAccessor
robot-cover-url:
robot-details-url: http://www.plumtree.com/
robot-owner-name: Joseph A. Stanko
robot-owner-url:
robot-owner-email: josephs@plumtree.com
robot-status: development
robot-purpose: indexing for the Plumtree Server
robot-type: standalone
robot-platform: windowsNT
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: PlumtreeWebAccessor
robot-noindex: yes
robot-host:
robot-from: yes
robot-useragent: PlumtreeWebAccessor/0.9
robot-language: c++
robot-description: The Plumtree Web Accessor is a component that customers can add to the Plumtree Server to index documents on the World Wide Web.
robot-history:
robot-environment: commercial
modified-date: Thu, 17 Dec 1998
modified-by: Joseph A. Stanko <josephs@plumtree.com>

robot-name: Poppi
robot-cover-url: http://members.tripod.com/poppisearch
robot-details-url: http://members.tripod.com/poppisearch
robot-owner-name: Antonio Provenzano
robot-owner-url: Antonio Provenzano
robot-owner-email:
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix/linux
robot-availability: none
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: yes
robot-host:
robot-from:
robot-useragent: Poppi/1.0
robot-language: C
robot-description: Poppi is a web-indexing crawler that runs weekly, gathering and indexing hypertext, multimedia and executable file formats.
robot-history: Created by Antonio Provenzano in April 2000, it has been acquired from Tomi Officine Multimediali srl and is next to be released as a service and commercial product.
robot-environment: service
modified-date: Mon, 22 May 2000 15:47:30 GMT
modified-by: Antonio Provenzano

robot-name: PortalB Spider
robot-cover-url: http://www.portalb.com/
robot-details-url:
robot-owner-name: PortalB Spider Bug List
robot-owner-url:
robot-owner-email: spider@portalb.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: PortalBSpider
robot-noindex: yes
robot-nofollow: yes
robot-host: spider1.portalb.com, spider2.portalb.com, etc.
robot-from: no
robot-useragent: PortalBSpider/1.0 (spider@portalb.com)
robot-language: C++
robot-description: The PortalB Spider indexes selected sites for high-quality business information.
robot-history:
robot-environment: service

robot-name: psbot
robot-cover-url: http://www.picsearch.com/
robot-details-url: http://www.picsearch.com/bot.html
robot-owner-name: picsearch AB
robot-owner-url: http://www.picsearch.com/
robot-owner-email: psbot@picsearch.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Linux
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: psbot
robot-noindex: yes
robot-nofollow: yes
robot-host: *.picsearch.com
robot-from: yes
robot-useragent: psbot/0.X (+http://www.picsearch.com/bot.html)
robot-language: c, c++
robot-description: Spider for www.picsearch.com
robot-history: Developed and tested in 2000/2001
robot-environment: commercial
modified-date: Tue, 21 Aug 2001 10:55:38 CEST 2001
modified-by: psbot@picsearch.com

robot-name: GetterroboPlus Puu
robot-details-url: http://marunaka.homing.net/straight/getter/
robot-cover-url: http://marunaka.homing.net/straight/
robot-owner-name: marunaka
robot-owner-url: http://marunaka.homing.net
robot-owner-email: marunaka@homing.net
robot-status: active: robot actively in use
robot-purpose: Purpose of the robot. One or more of: - gathering: gathers data from the original standard tag for Puu, which contains information about the sites registered in my search engine. - maintenance: link validation
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes (Puu patrols only URLs registered in my search engine)
robot-exclusion-useragent: Getterrobo-Plus
robot-noindex: no
robot-host: straight FLASH!! Getterrobo-Plus, *.homing.net
robot-from: yes
robot-useragent: straight FLASH!! GetterroboPlus 1.5
robot-language: perl5
robot-description: The Puu robot is used to gather data from sites registered in the search engine "straight FLASH!!" in order to build an announcement page showing the update status of registered sites in "straight FLASH!!". The robot runs every day.
robot-history: This robot patrols the sites registered in the search engine "straight FLASH!!"
robot-environment: hobby
modified-date: Fri, 26 Jun 1998

robot-name: The Python Robot
robot-cover-url: http://www.python.org/
robot-details-url:
robot-owner-name: Guido van Rossum
robot-owner-url: http://www.python.org/~guido/
robot-owner-email: guido@python.org
robot-status: retired
robot-purpose:
robot-type:
robot-platform:
robot-availability: none
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: no
robot-host:
robot-from:
robot-useragent:
robot-language:
robot-description:
robot-history:
robot-environment:
modified-date:
modified-by:

robot-name: Raven Search
robot-cover-url: http://ravensearch.tripod.com
robot-details-url: http://ravensearch.tripod.com
robot-owner-name: Raven Group
robot-owner-url: http://ravensearch.tripod.com
robot-owner-email: ravensearch@hotmail.com
robot-status: Development: robot under development
robot-purpose: Indexing: gather content for commercial query engine.
robot-type: Standalone: a separate program
robot-platform: Unix, Windows98, WindowsNT, Windows2000
robot-availability: None
robot-exclusion: Yes
robot-exclusion-useragent: Raven
robot-noindex: Yes
robot-nofollow: Yes
robot-host: 192.168.1.*
robot-from: Yes
robot-useragent: Raven-v2
robot-language: Perl-5
robot-description: Raven was written for the express purpose of indexing the web. It can process hundreds of URLs in parallel. It runs on a sporadic basis as testing continues. It is really several programs running concurrently. It takes four computers to run Raven Search. Scalable in sets of four.
robot-history: This robot is new. First active on March 25, 2000.
robot-environment: Commercial: is a commercial product. Possibly GNU later ;-)
modified-date: Fri, 25 Mar 2000 17:28:52 GMT
modified-by: Raven Group

robot-name: RBSE Spider
robot-cover-url: http://rbse.jsc.nasa.gov/eichmann/urlsearch.html
robot-details-url:
robot-owner-name: David Eichmann
robot-owner-url: http://rbse.jsc.nasa.gov/eichmann/home.html
robot-owner-email: eichmann@rbse.jsc.nasa.gov
robot-status: active
robot-purpose: indexing, statistics
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: rbse.jsc.nasa.gov (192.88.42.10)
robot-from:
robot-useragent:
robot-language: C, oracle, wais
robot-description: Developed and operated as part of the NASA-funded Repository Based Software Engineering Program at the Research Institute for Computing and Information Systems, University of Houston - Clear Lake.
robot-history:
robot-environment:
modified-date: Thu May 18 04:47:02 1995
modified-by:

robot-name: Resume Robot
robot-cover-url: http://www.onramp.net/proquest/resume/robot/robot.html
robot-details-url:
robot-owner-name: James Stakelum
robot-owner-url: http://www.onramp.net/proquest/resume/java/resume.html
robot-owner-email: proquest@onramp.net
robot-status:
robot-purpose: indexing.
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from: yes
robot-useragent: Resume Robot
robot-language: C++.
robot-description:
robot-history:
robot-environment:
modified-date: Tue Mar 12 15:52:25 1996.
modified-by:

robot-name: RoadHouse Crawling System
robot-cover-url: http://stage.perceval.be (under development)
robot-details-url:
robot-owner-name: Gregoire Welraeds, Emmanuel Bergmans
robot-owner-url: http://www.perceval.be
robot-owner-email: helpdesk@perceval.be
robot-status: development
robot-purpose1: indexing
robot-purpose2: maintenance
robot-purpose3: statistics
robot-type: standalone
robot-platform1: unix (FreeBSD & Linux)
robot-availability: none
robot-exclusion: no (under development)
robot-exclusion-useragent: RHCS
robot-noindex: no (under development)
robot-host: stage.perceval.be
robot-from: no
robot-useragent: RHCS/1.0a
robot-language: c
robot-description: Robot used to build the database for the RoadHouse search service project operated by Perceval
robot-history: The need for this robot has its roots in the current RoadHouse directory, which has not been maintained since 1997
robot-environment: service
modified-date: Fri, 26 Feb 1999 12:00:00 GMT
modified-by: Gregoire Welraeds

robot-name: Road Runner: The ImageScape Robot
robot-owner-name: LIM Group
robot-owner-email: lim@cs.leidenuniv.nl
robot-status: development/active
robot-purpose: indexing
robot-type: standalone
robot-platform: UNIX
robot-exclusion: yes
robot-exclusion-useragent: roadrunner
robot-useragent: Road Runner: ImageScape Robot (lim@cs.leidenuniv.nl)
robot-language: C, perl5
robot-description: Create Image/Text index for WWW
robot-history: ImageScape Project
robot-environment: commercial service
modified-date: Dec. 1st, 1996

robot-name: Robbie the Robot
robot-cover-url:
robot-details-url:
robot-owner-name: Robert H. Pollack
robot-owner-url:
robot-owner-email: robert.h.pollack@lmco.com
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix, windows95, windowsNT
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: Robbie
robot-noindex: no
robot-host: *.lmco.com
robot-from: yes
robot-useragent: Robbie/0.1
robot-language: java
robot-description: Used to define document collections for the DISCO system. Robbie is still under development and runs several times a day, but usually only for ten minutes or so. Sites are visited in the order in which references are found, but no host is visited more than once in any two-minute period.
robot-history: The DISCO system is a resource-discovery component in the OLLA system, which is a prototype system, developed under DARPA funding, to support computer-based education and training.
robot-environment: research
modified-date: Wed, 5 Feb 1997 19:00:00 GMT
modified-by:
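
The Robbie entry describes a politeness rule: no host is visited more than once in any two-minute period. A minimal sketch of how such a rule could be enforced is given here. It is not Robbie's actual implementation (which is written in Java and is not published in this index); the class name `HostThrottle`, its methods, and the `example` hosts are hypothetical, and the caller is assumed to supply the current time in seconds.

```python
from urllib.parse import urlsplit


class HostThrottle:
    """Hypothetical helper: enforce a minimum interval between visits to a host."""

    def __init__(self, min_interval_s=120.0):
        # 120 seconds corresponds to the two-minute period in the Robbie entry.
        self.min_interval_s = min_interval_s
        self._last_visit = {}  # host name -> time of the last permitted visit

    def wait_needed(self, url, now):
        """Return how many seconds must still pass before `url` may be fetched."""
        host = urlsplit(url).hostname
        last = self._last_visit.get(host)
        if last is None:
            return 0.0  # host never visited, no wait required
        return max(0.0, self.min_interval_s - (now - last))

    def try_visit(self, url, now):
        """Record a visit and return True if the host may be fetched at `now`."""
        if self.wait_needed(url, now) > 0:
            return False
        self._last_visit[urlsplit(url).hostname] = now
        return True
```

In a real crawler the `now` argument would typically come from `time.monotonic()`, and a URL rejected by `try_visit` would be put back on the queue rather than dropped, which keeps the "visit in the order references are found" behaviour for all other hosts.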
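
The `robot-exclusion` and `robot-exclusion-useragent` fields in the entries above say whether a robot honours robots.txt and which user-agent token it matches against. As a rough illustration of how a site owner might check the effect of a robots.txt file on one of these tokens, the sketch below uses Python's standard `urllib.robotparser`. The robots.txt content, the `allowed` helper, and the `www.example.com` URLs are assumptions made for the example; only the tokens `psbot` and `Robbie` are taken from the entries.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out psbot entirely, keep /private/ off-limits to all.
ROBOTS_TXT = """\
User-agent: psbot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, exclusion_useragent, url):
    """Return True if the given robots.txt text lets this user-agent token fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network access
    return parser.can_fetch(exclusion_useragent, url)


if __name__ == "__main__":
    for agent in ("psbot", "Robbie"):
        for path in ("/index.html", "/private/report.html"):
            url = "http://www.example.com" + path
            print(agent, path, allowed(ROBOTS_TXT, agent, url))
```

This only shows what a compliant robot should do; entries marked `robot-exclusion: no` (for example Nomad or PGP Key Agent) state that the robot does not consult robots.txt at all.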