

Spiders Index

robot-name: iajaBot
robot-cover-url:
robot-details-url: http://www.scs.carleton.ca/~morin/iajabot.html
robot-owner-name: Pat Morin
robot-owner-url: http://www.scs.carleton.ca/~morin/
robot-owner-email: morin@scs.carleton.ca
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix, windows
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent: iajabot
robot-noindex: no
robot-host: *.scs.carleton.ca
robot-from: no
robot-useragent: iajaBot/0.1
robot-language: c
robot-description: Finds adult content
robot-history: None, brand new.
robot-environment: research
modified-date: Tue, 27 Jun 2000, 11:17:50 EDT
modified-by: Pat Morin

robot-name: IBM_Planetwide
robot-cover-url: http://www.ibm.com/%7ewebmaster/
robot-details-url:
robot-owner-name: Ed Costello
robot-owner-url: http://www.ibm.com/%7ewebmaster/
robot-owner-email: epc@www.ibm.com
robot-status:
robot-purpose: indexing, maintenance, mirroring
robot-type: standalone and
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: www.ibm.com www2.ibm.com
robot-from: yes
robot-useragent: IBM_Planetwide,
robot-language: Perl5
robot-description: Restricted to IBM owned or related domains.
robot-history:
robot-environment:
modified-date: Mon Jan 22 22:09:19 1996.
modified-by:

robot-name: Popular Iconoclast
robot-cover-url: http://gestalt.sewanee.edu/ic/
robot-details-url: http://gestalt.sewanee.edu/ic/info.html
robot-owner-name: Chris Cappuccio
robot-owner-url: http://sefl.satelnet.org/~ccappuc/
robot-owner-email: chris@gestalt.sewanee.edu
robot-status: development
robot-purpose: statistics
robot-type: standalone
robot-platform: unix (OpenBSD)
robot-availability: source
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: gestalt.sewanee.edu
robot-from: yes
robot-useragent: gestaltIconoclast/1.0 libwww-FM/2.17
robot-language: c,perl5
robot-description: This guy likes statistics
robot-history: This robot has a history in mathematics and english
robot-environment: research
modified-date: Wed, 5 Mar 1997 17:35:16 CST
modified-by: chris@gestalt.sewanee.edu

robot-name: Ingrid
robot-cover-url:
robot-details-url:
robot-owner-name: Ilse c.v.
robot-owner-url: http://www.ilse.nl/
robot-owner-email: ilse@ilse.nl
robot-status: Running
robot-purpose: Indexing
robot-type: Web Indexer
robot-platform: UNIX
robot-availability: Commercial as part of search engine package
robot-exclusion: Yes
robot-exclusion-useragent: INGRID/0.1
robot-noindex: Yes
robot-host: bart.ilse.nl
robot-from: Yes
robot-useragent: INGRID/0.1
robot-language: C
robot-description:
robot-history:
robot-environment:
modified-date: 06/13/1997
modified-by: Ilse

robot-name: Imagelock
robot-cover-url:
robot-details-url:
robot-owner-name: Ken Belanger
robot-owner-url:
robot-owner-email: belanger@imagelock.com
robot-status: development
robot-purpose: maintenance
robot-type:
robot-platform: windows95
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: 209.111.133.*
robot-from: no
robot-useragent: Mozilla 3.01 PBWF (Win95)
robot-language:
robot-description: searches for image links
robot-history:
robot-environment: service
modified-date: Tue, 11 Aug 1998 17:28:52 GMT
modified-by: brian@smithrenaud.com

robot-name: IncyWincy
robot-cover-url: http://osiris.sunderland.ac.uk/sst-scripts/simon.html
robot-details-url:
robot-owner-name: Simon Stobart
robot-owner-url: http://osiris.sunderland.ac.uk/sst-scripts/simon.html
robot-owner-email: simon.stobart@sunderland.ac.uk
robot-status:
robot-purpose:
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: osiris.sunderland.ac.uk
robot-from: yes
robot-useragent: IncyWincy/1.0b1
robot-language: C++
robot-description: Various Research projects at the University of Sunderland
robot-history:
robot-environment:
modified-date: Fri Jan 19 21:50:32 1996.
modified-by:

robot-name: Informant
robot-cover-url: http://informant.dartmouth.edu/
robot-details-url: http://informant.dartmouth.edu/about.html
robot-owner-name: Bob Gray
robot-owner-name2: Aditya Bhasin
robot-owner-name3: Katsuhiro Moizumi
robot-owner-name4: Dr. George V. Cybenko
robot-owner-url: http://informant.dartmouth.edu/
robot-owner-email: info_adm@cosmo.dartmouth.edu
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent: Informant
robot-noindex: no
robot-host: informant.dartmouth.edu
robot-from: yes
robot-useragent: Informant
robot-language: c, c++
robot-description: The Informant robot continually checks the Web pages that are relevant to user queries. Users are notified of any new or updated pages. The robot runs daily, but the number of hits per site per day should be quite small, and these hits should be randomly distributed over several hours. Since the robot does not actually follow links (aside from those returned from the major search engines such as Lycos), it does not fall victim to the common looping problems. The robot will support the Robot Exclusion Standard by early December, 1996.
robot-history: The robot is part of a research project at Dartmouth College. The robot may become part of a commercial service (at which time it may be subsumed by some other, existing robot).
robot-environment: research, service
modified-date: Sun, 3 Nov 1996 11:55:00 GMT
modified-by: Bob Gray

robot-name: InfoSeek Robot 1.0
robot-cover-url: http://www.infoseek.com
robot-details-url:
robot-owner-name: Steve Kirsch
robot-owner-url: http://www.infoseek.com
robot-owner-email: stk@infoseek.com
robot-status:
robot-purpose: indexing
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: corp-gw.infoseek.com
robot-from: yes
robot-useragent: InfoSeek Robot 1.0
robot-language: python
robot-description: Its purpose is to generate a Resource Discovery database. Collects WWW pages for both InfoSeek's free WWW search and commercial search. Uses a unique proprietary algorithm to identify the most popular and interesting WWW pages. Very fast, but never has more than one request per site outstanding at any given time. Has been refined for more than a year.
robot-history:
robot-environment:
modified-date: Sun May 28 01:35:48 1995
modified-by:

robot-name: Infoseek Sidewinder
robot-cover-url: http://www.infoseek.com/
robot-details-url:
robot-owner-name: Mike Agostino
robot-owner-url: http://www.infoseek.com/
robot-owner-email: mna@infoseek.com
robot-status:
robot-purpose: indexing
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from: yes
robot-useragent: Infoseek Sidewinder
robot-language: C
robot-description: Collects WWW pages for both InfoSeek's free WWW search services. Uses a unique, incremental, very fast proprietary algorithm to find WWW pages.
robot-history:
robot-environment:
modified-date: Sat Apr 27 01:20:15 1996.
modified-by:

robot-name: InfoSpiders
robot-cover-url: http://www-cse.ucsd.edu/users/fil/agents/agents.html
robot-owner-name: Filippo Menczer
robot-owner-url: http://www-cse.ucsd.edu/users/fil/
robot-owner-email: fil@cs.ucsd.edu
robot-status: development
robot-purpose: search
robot-type: standalone
robot-platform: unix, mac
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: InfoSpiders
robot-noindex: no
robot-host: *.ucsd.edu
robot-from: yes
robot-useragent: InfoSpiders/0.1
robot-language: c, perl5
robot-description: application of artificial life algorithm to adaptive distributed information retrieval
robot-history: UC San Diego, Computer Science Dept. PhD research project (1995-97) under supervision of Prof. Rik Belew
robot-environment: research
modified-date: Mon, 16 Sep 1996 14:08:00 PDT

robot-name: Inspector Web
robot-cover-url: http://www.greenpac.com/inspector/
robot-details-url: http://www.greenpac.com/inspector/ourrobot.html
robot-owner-name: Doug Green
robot-owner-url: http://www.greenpac.com
robot-owner-email: doug@greenpac.com
robot-status: active: robot significantly developed, but still undergoing fixes
robot-purpose: maintenance: link validation, html validation, image size validation, etc
robot-type: standalone
robot-platform: unix
robot-availability: free service and more extensive commercial service
robot-exclusion: yes
robot-exclusion-useragent: inspectorwww
robot-noindex: no
robot-host: www.corpsite.com, www.greenpac.com, 38.234.171.*
robot-from: yes
robot-useragent: inspectorwww/1.0 http://www.greenpac.com/inspectorwww.html
robot-language: c
robot-description: Provides inspection reports which give advice to WWW site owners on missing links, image resize problems, syntax errors, etc.
robot-history: development started in Mar 1997
robot-environment: commercial
modified-date: Tue Jun 17 09:24:58 EST 1997
modified-by: Doug Green

robot-name: IntelliAgent
robot-cover-url: http://www.geocities.com/SiliconValley/3086/iagent.html
robot-details-url:
robot-owner-name: David Reilly
robot-owner-url: http://www.geocities.com/SiliconValley/3086/index.html
robot-owner-email: s1523@sand.it.bond.edu.au
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex:
robot-host: sand.it.bond.edu.au
robot-from: no
robot-useragent: 'IAGENT/1.0'
robot-language: C
robot-description: IntelliAgent is still in development. Indeed, it is very far from completion. I'm planning to limit the depth at which it will probe, so hopefully IAgent won't cause anyone much of a problem. Once it is complete, I hope to publish both the raw data and original source code.
robot-history:
robot-environment:
modified-date: Fri May 31 02:10:39 1996.
modified-by:

robot-name: I, Robot
robot-cover-url: http://irobot.mame.dk/
robot-details-url: http://irobot.mame.dk/about.phtml
robot-owner-name: [mame.dk]
robot-owner-url: http://www.mame.dk/
robot-owner-email: irobot@chaos.dk
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: irobot
robot-noindex: yes
robot-host: *.mame.dk, 206.161.121.*
robot-from: no
robot-useragent: I Robot 0.4 (irobot@chaos.dk)
robot-language: c
robot-description: I Robot is used to build a fresh database for the emulation community. Primary focus is information on emulation and especially old arcade machines. Primarily English sites will be indexed and only if they have their own domain. Sites are added manually, based on submissions, after they have been evaluated.
robot-history: The robot was started in June 2000
robot-environment1: service
robot-environment2: hobby
modified-date: Fri, 27 Oct 2000 09:08:06 GMT
modified-by: BombJack mameadm@chaos.dk

robot-name: Iron33
robot-cover-url: http://verno.ueda.info.waseda.ac.jp/iron33/
robot-details-url: http://verno.ueda.info.waseda.ac.jp/iron33/history.html
robot-owner-name: Takashi Watanabe
robot-owner-url: http://www.ueda.info.waseda.ac.jp/~watanabe/
robot-owner-email: watanabe@ueda.info.waseda.ac.jp
robot-status: active
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform: unix
robot-availability: source
robot-exclusion: yes
robot-exclusion-useragent: Iron33
robot-noindex: no
robot-host: *.folon.ueda.info.waseda.ac.jp, 133.9.215.*
robot-from: yes
robot-useragent: Iron33/0.0
robot-language: c
robot-description: The robot "Iron33" is used to build the database for the WWW search engine "Verno".
robot-history:
robot-environment: research
modified-date: Fri, 20 Mar 1998 18:34 JST
modified-by: Watanabe Takashi

robot-name: Israeli-search
robot-cover-url: http://www.idc.ac.il/Sandbag/
robot-details-url:
robot-owner-name: Etamar Laron
robot-owner-url: http://www.xpert.com/~etamar/
robot-owner-email: etamar@xpert.co
robot-status:
robot-purpose: indexing.
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: dylan.ius.cs.cmu.edu
robot-from: no
robot-useragent: IsraeliSearch/1.0
robot-language: C
robot-description: Complete software designed to collect information in a distributed workload and to support context queries. Intended to be a complete updated resource for Israeli sites and information related to Israel or Israeli Society.
robot-history:
robot-environment:
modified-date: Tue Apr 23 19:23:55 1996.
modified-by:

robot-name: JavaBee
robot-cover-url: http://www.javabee.com
robot-details-url:
robot-owner-name: ObjectBox
robot-owner-url: http://www.objectbox.com/
robot-owner-email: info@objectbox.com
robot-status: Active
robot-purpose: Stealing Java Code
robot-type: standalone
robot-platform: Java
robot-availability: binary
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: JavaBee
robot-language: Java
robot-description: This robot is used to grab java applets and run them locally overriding the security implemented
robot-history:
robot-environment: commercial
modified-date:
modified-by:

robot-name: JBot Java Web Robot
robot-cover-url: http://www.matuschek.net/software/jbot
robot-details-url: http://www.matuschek.net/software/jbot
robot-owner-name: Daniel Matuschek
robot-owner-url: http://www.matuschek.net
robot-owner-email: daniel@matuschek.net
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: Java
robot-availability: source
robot-exclusion: yes
robot-exclusion-useragent: JBot
robot-noindex: no
robot-host: *
robot-from: -
robot-useragent: JBot (but can be changed by the user)
robot-language: Java
robot-description: Java web crawler to download web sites
robot-history: -
robot-environment: hobby
modified-date: Thu, 03 Jan 2000 16:00:00 GMT
modified-by: Daniel Matuschek <daniel@matuschek.net>

robot-name: JCrawler
robot-cover-url: http://www.nihongo.org/jcrawler/
robot-details-url:
robot-owner-name: Benjamin Franz
robot-owner-url: http://www.nihongo.org/snowhare/
robot-owner-email: snowhare@netimages.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: jcrawler
robot-noindex: yes
robot-host: db.netimages.com
robot-from: yes
robot-useragent: JCrawler/0.2
robot-language: perl5
robot-description: JCrawler is currently used to build the Vietnam topic specific WWW index for VietGATE <URL:http://www.vietgate.net/>. It schedules visits randomly, but will not visit a site more than once every two minutes. It uses a subject matter relevance pruning algorithm to determine what pages to crawl and index and will not generally index pages with no Vietnam related content. Uses Unicode internally, and detects and converts several different Vietnamese character encodings.
robot-history:
robot-environment: service
modified-date: Wed, 08 Oct 1997 00:09:52 GMT
modified-by: Benjamin Franz

robot-name: AskJeeves
robot-cover-url: http://www.ask.com
robot-details-url:
robot-owner-name: Ask Jeeves, Inc.
robot-owner-url: http://www.ask.com
robot-owner-email: postmaster@ask.com
robot-status: active
robot-purpose: indexing, maintenance
robot-type: standalone
robot-platform: linux
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: "Teoma" or "Ask Jeeves" or "Jeeves"
robot-noindex: Yes
robot-host: ez*.directhit.com
robot-from: No
robot-useragent: Mozilla/2.0 (compatible; Ask Jeeves/Teoma)
robot-language: c++
robot-description: Ask Jeeves / Teoma spider
robot-history: Developed by Direct Hit Technologies which was acquired by Ask Jeeves in 2000.
robot-environment: service
modified-date: Fri Jan 17 15:20:08 EST 2003
modified-by: brucep@ask.com

robot-name: JoBo Java Web Robot
robot-cover-url: http://www.matuschek.net/software/jobo/
robot-details-url: http://www.matuschek.net/software/jobo/
robot-owner-name: Daniel Matuschek
robot-owner-url: http://www.matuschek.net
robot-owner-email: daniel@matuschek.net
robot-status: active
robot-purpose: downloading, mirroring, indexing
robot-type: standalone
robot-platform: unix, windows, os/2, mac
robot-availability: source
robot-exclusion: yes
robot-exclusion-useragent: jobo
robot-noindex: no
robot-host: *
robot-from: yes
robot-useragent: JoBo (can be modified by the user)
robot-language: java
robot-description: JoBo is a web site download tool. The core web spider can be used for any purpose.
robot-history: JoBo was developed as a simple download tool and became a full featured web spider during development
robot-environment: hobby
modified-date: Fri, 20 Apr 2001 17:00:00 GMT
modified-by: Daniel Matuschek <daniel@matuschek.net>

robot-name: Jobot
robot-cover-url: http://www.micrognosis.com/~ajack/jobot/jobot.html
robot-details-url:
robot-owner-name: Adam Jack
robot-owner-url: http://www.micrognosis.com/~ajack/index.html
robot-owner-email: ajack@corp.micrognosis.com
robot-status: inactive
robot-purpose: standalone
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: supernova.micrognosis.com
robot-from: yes
robot-useragent: Jobot/0.1alpha libwww-perl/4.0
robot-language: perl 4
robot-description: Its purpose is to generate a Resource Discovery database. Intended to seek out sites of potential "career interest". Hence - Job Robot.
robot-history:
robot-environment:
modified-date: Tue Jan 9 18:55:55 1996
modified-by:

robot-name: JoeBot
robot-cover-url:
robot-details-url:
robot-owner-name: Ray Waldin
robot-owner-url: http://www.primenet.com/~rwaldin
robot-owner-email: rwaldin@primenet.com
robot-status:
robot-purpose:
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from: yes
robot-useragent: JoeBot/x.x,
robot-language: java
robot-description: JoeBot is a generic web crawler implemented as a collection of Java classes which can be used in a variety of applications, including resource discovery, link validation, mirroring, etc. It currently limits itself to one visit per host per minute.
robot-history:
robot-environment:
modified-date: Sun May 19 08:13:06 1996.
modified-by:

robot-name: The Jubii Indexing Robot
robot-cover-url: http://www.jubii.dk/robot/default.htm
robot-details-url:
robot-owner-name: Jakob Faarvang
robot-owner-url: http://www.cybernet.dk/staff/jakob/
robot-owner-email: jakob@jubii.dk
robot-status:
robot-purpose: indexing, maintenance
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: any host in the cybernet.dk domain
robot-from: yes
robot-useragent: JubiiRobot/version#
robot-language: visual basic 4.0
robot-description: Its purpose is to generate a Resource Discovery database, and validate links. Used for indexing the .dk top-level domain as well as other Danish sites for a Danish web database, as well as link validation.
robot-history: Will be in constant operation from Spring 1996
robot-environment:
modified-date: Sat Jan 6 20:58:44 1996
modified-by:

robot-name: JumpStation
robot-cover-url: http://js.stir.ac.uk/jsbin/jsii
robot-details-url:
robot-owner-name: Jonathon Fletcher
robot-owner-url: http://www.stir.ac.uk/~jf1
robot-owner-email: j.fletcher@stirling.ac.uk
robot-status: retired
robot-purpose: indexing
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: *.stir.ac.uk
robot-from: yes
robot-useragent: jumpstation
robot-language: perl, C, c++
robot-description:
robot-history: Originated as a weekend project in 1993.
robot-environment:
modified-date: Tue May 16 00:57:42 1995.
modified-by:
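
The entries above all share one layout: a run of `key: value` fields, with each record starting at a `robot-name` field. As a rough illustration of how such a listing might be read by a program, here is a minimal Python sketch. The function name `parse_records` and the field-name pattern are assumptions made for this example and are not part of the database itself; the pattern only covers the `robot-...` and `modified-...` keys seen in this listing.

```python
import re

# Assumed field-name pattern: every "robot-..." key plus the two "modified-..." keys.
FIELD = re.compile(r"\b(robot-[a-z0-9-]+|modified-date|modified-by):")


def parse_records(text):
    """Split listing text into a list of dicts, one per robot.

    A new record starts at every "robot-name" field. Because the split is
    driven by field names rather than line breaks, it works whether the
    fields are one per line or run together on a single line.
    """
    # With one capturing group, re.split returns
    # [preamble, key1, value1, key2, value2, ...].
    parts = FIELD.split(text)
    records = []
    for key, value in zip(parts[1::2], parts[2::2]):
        if key == "robot-name" or not records:
            records.append({})
        # Collapse internal whitespace so wrapped values become one line.
        records[-1][key] = " ".join(value.split())
    return records


if __name__ == "__main__":
    sample = "robot-name: JoeBot robot-type: standalone robot-platform: robot-language: java"
    for record in parse_records(sample):
        print(record["robot-name"], record["robot-language"])  # JoeBot java
```

Empty fields (such as `robot-platform:` in the sample) come back as empty strings, so a caller can tell "field present but blank" from "field missing".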