

Robots Index

robot-name: ABCdatos BotLink
robot-cover-url: http://www.abcdatos.com/
robot-details-url: http://www.abcdatos.com/botlink/
robot-owner-name: ABCdatos
robot-owner-url: http://www.abcdatos.com/
robot-owner-email: botlink@abcdatos.com
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: windows
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent: BotLink
robot-noindex: no
robot-host: 217.126.39.167
robot-from: no
robot-useragent: ABCdatos BotLink/1.0.2 (test links)
robot-language: basic
robot-description: This robot verifies the availability of the ABCdatos directory entries (http://www.abcdatos.com) using HTTP HEAD requests. It runs twice a week. On an HTTP 5xx error response or a failed connection, it repeats the check some hours later to verify whether the failure was temporary.
robot-history: This robot was developed by the ABCdatos team to help with directory maintenance.
robot-environment: commercial
modified-date: Thu, 29 May 2003 01:00:00 GMT
modified-by: ABCdatos
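The retry rule described in the BotLink entry above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not BotLink's own code (the entry lists its language as basic). The names `head_status`, `classify`, `check_link`, and `RETRY_DELAY` are hypothetical; the six-hour delay is a guess at "some hours later", and treating 4xx responses as broken links is an assumption the entry does not confirm.

```python
import urllib.error
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumption: the entry only says "some hours later"; six hours is illustrative.
RETRY_DELAY = timedelta(hours=6)


def head_status(url: str, timeout: float = 10.0):
    """Send an HTTP HEAD request; return the status code, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx or 5xx).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def classify(status):
    """Map a HEAD result to 'ok', 'retry', or 'broken'."""
    if status is None or 500 <= status <= 599:
        return "retry"   # possibly temporary: check again later
    if 200 <= status <= 399:
        return "ok"
    return "broken"      # assumption: anything else counts as a dead link


def check_link(url: str, now: datetime, probe=head_status):
    """Return (verdict, retry_at); retry_at is None unless a recheck is due."""
    verdict = classify(probe(url))
    retry_at = now + RETRY_DELAY if verdict == "retry" else None
    return verdict, retry_at
```

The `probe` parameter lets the network call be swapped out, so the retry decision can be exercised without contacting a server, for example `check_link(url, datetime.now(timezone.utc), probe=lambda u: 503)`.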

robot-name: Acme.Spider
robot-cover-url: http://www.acme.com/java/software/Acme.Spider.html
robot-details-url: http://www.acme.com/java/software/Acme.Spider.html
robot-owner-name: Jef Poskanzer - ACME Laboratories
robot-owner-url: http://www.acme.com/
robot-owner-email: jef@acme.com
robot-status: active
robot-purpose: indexing maintenance statistics
robot-type: standalone
robot-platform: java
robot-availability: source
robot-exclusion: yes
robot-exclusion-useragent: Due to a deficiency in Java it's not currently possible to set the User-Agent.
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: Due to a deficiency in Java it's not currently possible to set the User-Agent.
robot-language: java
robot-description: A Java utility class for writing your own robots.
robot-history:
robot-environment:
modified-date: Wed, 04 Dec 1996 21:30:11 GMT
modified-by: Jef Poskanzer

robot-name: Ahoy! The Homepage Finder
robot-cover-url: http://www.cs.washington.edu/research/ahoy/
robot-details-url: http://www.cs.washington.edu/research/ahoy/doc/home.html
robot-owner-name: Marc Langheinrich
robot-owner-url: http://www.cs.washington.edu/homes/marclang
robot-owner-email: marclang@cs.washington.edu
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: UNIX
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: ahoy
robot-noindex: no
robot-host: cs.washington.edu
robot-from: no
robot-useragent: 'Ahoy! The Homepage Finder'
robot-language: Perl 5
robot-description: Ahoy! is an ongoing research project at the University of Washington for finding personal Homepages.
robot-history: Research project at the University of Washington in 1995/1996
robot-environment: research
modified-date: Fri June 28 14:00:00 1996
modified-by: Marc Langheinrich

robot-name: Alkaline
robot-cover-url: http://www.vestris.com/alkaline
robot-details-url: http://www.vestris.com/alkaline
robot-owner-name: Daniel Doubrovkine
robot-owner-url: http://cuiwww.unige.ch/~doubrov5
robot-owner-email: dblock@vestris.com
robot-status: development active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix windows95 windowsNT
robot-availability: binary
robot-exclusion: yes
robot-exclusion-useragent: AlkalineBOT
robot-noindex: yes
robot-host: *
robot-from: no
robot-useragent: AlkalineBOT
robot-language: c++
robot-description: Unix/NT internet/intranet search engine
robot-history: Vestris Inc. search engine designed at the University of Geneva
robot-environment: commercial research
modified-date: Thu Dec 10 14:01:13 MET 1998
modified-by: Daniel Doubrovkine <dblock@vestris.com>

robot-name: Anthill
robot-cover-url: http://www.anthill.org/index.html
robot-details-url: http://www.anthill.org/index.html
robot-owner-name: Torsten Kaubisch
robot-owner-url: http://www.anthill.org/index.html
robot-owner-email: info@anthill.org
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: independent
robot-availability: not yet
robot-exclusion: no (soon in V1.2)
robot-exclusion-useragent: anthill
robot-noindex: no
robot-host: anywhere
robot-from: no
robot-useragent: AnthillV1.1
robot-language: java
robot-description: Anthill is used to gather price information automatically from online stores. Support for international versions.
robot-history: This is a research project at the University of Mannheim in Germany, professorship Prof. Martin Schader, assistant Dr. Stefan Kuhlins
robot-environment: research
modified-date: Thu, 6 Dec 2001 01:55:00 GMT
modified-by: Torsten Kaubisch

robot-name: Walhello appie
robot-cover-url: www.walhello.com
robot-details-url: www.walhello.com/aboutgl.html
robot-owner-name: Aimo Pieterse
robot-owner-url: www.walhello.com
robot-owner-email: aimo@walhello.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: windows98
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: appie
robot-noindex: yes
robot-host: 213.10.10.116, 213.10.10.117, 213.10.10.118
robot-from: yes
robot-useragent: appie/1.1
robot-language: Visual C++
robot-description: The appie-spider is used to collect and index web pages for the Walhello search engine
robot-history: The spider was built in March/April 2000
robot-environment: commercial
modified-date: Thu, 20 Jul 2000 22:38:00 GMT
modified-by: Aimo Pieterse

robot-name: Arachnophilia
robot-cover-url:
robot-details-url:
robot-owner-name: Vince Taluskie
robot-owner-url: http://www.ph.utexas.edu/people/vince.html
robot-owner-email: taluskie@utpapa.ph.utexas.edu
robot-status:
robot-purpose:
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: halsoft.com
robot-from:
robot-useragent: Arachnophilia
robot-language:
robot-description: The purpose (undertaken by HaL Software) of this run was to collect approximately 10k html documents for testing automatic abstract generation
robot-history:
robot-environment:
modified-date:
modified-by:

robot-name: Arale
robot-cover-url: http://web.tiscali.it/_flat
robot-details-url: http://web.tiscali.it/_flat
robot-owner-name: Flavio Tordini
robot-owner-url: http://web.tiscali.it/_flat
robot-owner-email: flaviotordini@tiscali.it
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: unix, windows, windows95, windowsNT, os2, mac, linux
robot-availability: source, binary
robot-exclusion: no
robot-exclusion-useragent: arale
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: no
robot-language: java
robot-description: A java multithreaded web spider. Download entire web sites or specific resources from the web. Render dynamic sites to static pages.
robot-history: This is brand new.
robot-environment: hobby
modified-date: Thu, 09 Jan 2001 17:28:52 GMT
modified-by: Flavio Tordini

robot-name: Araneo
robot-cover-url: http://esperantisto.net
robot-details-url: http://esperantisto.net/araneo/
robot-owner-name: Arto Sarle
robot-owner-url: http://esperantisto.net
robot-owner-email: araneo@esperantisto.net
robot-status: development
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform: Linux
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: araneo
robot-noindex: yes
robot-nofollow: yes
robot-host: *.esperantisto.net
robot-from: yes
robot-useragent: Araneo/0.7 (araneo@esperantisto.net; http://esperantisto.net)
robot-language: Python, Java
robot-description: Araneo is a web robot developed for crawling and indexing web pages written in the international language Esperanto. The database will be used to build a web search engine and auxiliary services to be published at esperantisto.net.
robot-history: (The name Araneo means "spider" in Esperanto.)
robot-environment: hobby, research
modified-date: Fri, 16 Nov 2001 08:30:00 GMT
modified-by: Arto Sarle

robot-name: ArchitextSpider
robot-cover-url: http://www.excite.com/
robot-details-url:
robot-owner-name: Architext Software
robot-owner-url: http://www.atext.com/spider.html
robot-owner-email: spider@atext.com
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: *.atext.com
robot-from: yes
robot-useragent: ArchitextSpider
robot-language: perl 5 and c
robot-description: Its purpose is to generate a Resource Discovery database, and to generate statistics. The ArchitextSpider collects information for the Excite and WebCrawler search engines.
robot-history:
robot-environment:
modified-date: Tue Oct 3 01:10:26 1995
modified-by:

robot-name: Aretha
robot-cover-url:
robot-details-url:
robot-owner-name: Dave Weiner
robot-owner-url: http://www.hotwired.com/Staff/userland/
robot-owner-email: davew@well.com
robot-status:
robot-purpose:
robot-type:
robot-platform: Macintosh
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from:
robot-useragent:
robot-language:
robot-description: A crude robot built on top of Netscape and Userland Frontier, a scripting system for Macs
robot-history:
robot-environment:
modified-date:
modified-by:

robot-name: ARIADNE
robot-cover-url: (forthcoming)
robot-details-url: (forthcoming)
robot-owner-name: Mr. Matthias H. Gross
robot-owner-url: http://www.lrz-muenchen.de/~gross/
robot-owner-email: Gross@dbs.informatik.uni-muenchen.de
robot-status: development
robot-purpose: statistics, development of focused crawling strategies
robot-type: standalone
robot-platform: java
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: ariadne
robot-noindex: no
robot-host: dbs.informatik.uni-muenchen.de
robot-from: no
robot-useragent: Due to a deficiency in Java it's not currently possible to set the User-Agent.
robot-language: java
robot-description: The ARIADNE robot is a prototype of an environment for testing focused crawling strategies.
robot-history: This robot is part of a research project at the University of Munich (LMU), started in 2000.
robot-environment: research
modified-date: Mon, 13 Mar 2000 14:00:00 GMT
modified-by: Mr. Matthias H. Gross

robot-name: arks
robot-cover-url: http://www.dpsindia.com
robot-details-url: http://www.dpsindia.com
robot-owner-name: Aniruddha Choudhury
robot-owner-url:
robot-owner-email: aniruddha.c@usa.net
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: PLATFORM INDEPENDENT
robot-availability: data
robot-exclusion: yes
robot-exclusion-useragent: arks
robot-noindex: no
robot-host: dpsindia.com
robot-from: no
robot-useragent: arks/1.0
robot-language: Java 1.2
robot-description: The Arks robot is used to build the database for the dpsindia/lawvistas.com search service. The robot runs weekly, and visits sites in a random order
robot-history: finds its root from s/w development project for a portal
robot-environment: commercial
modified-date: 6th November 2000
modified-by: Aniruddha Choudhury

robot-name: ASpider (Associative Spider)
robot-cover-url:
robot-details-url:
robot-owner-name: Fred Johansen
robot-owner-url: http://www.pvv.ntnu.no/~fredj/
robot-owner-email: fredj@pvv.ntnu.no
robot-status: retired
robot-purpose: indexing
robot-type:
robot-platform: unix
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: no
robot-host: nova.pvv.unit.no
robot-from: yes
robot-useragent: ASpider/0.09
robot-language: perl4
robot-description: ASpider is a CGI script that searches the web for keywords given by the user through a form.
robot-history:
robot-environment: hobby
modified-date:
modified-by:

robot-name: ATN Worldwide
robot-details-url:
robot-cover-url:
robot-owner-name: All That Net
robot-owner-url: http://www.allthatnet.com
robot-owner-email: info@allthatnet.com
robot-status: active
robot-purpose: indexing
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent: ATN_Worldwide
robot-noindex:
robot-nofollow:
robot-host: www.allthatnet.com
robot-from:
robot-useragent: ATN_Worldwide
robot-language:
robot-description: The ATN robot is used to build the database for the AllThatNet search service operated by All That Net. The robot runs weekly, and visits sites in a random order.
robot-history:
robot-environment:
modified-date: July 09, 2000 17:43 GMT

robot-name: Atomz.com Search Robot
robot-cover-url: http://www.atomz.com/help/
robot-details-url: http://www.atomz.com/
robot-owner-name: Mike Thompson
robot-owner-url: http://www.atomz.com/
robot-owner-email: mike@atomz.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: service
robot-exclusion: yes
robot-exclusion-useragent: Atomz
robot-noindex: yes
robot-host: www.atomz.com
robot-from: no
robot-useragent: Atomz/1.0
robot-language: c
robot-description: Robot used for web site search service.
robot-history: Developed for Atomz.com, launched in 1999.
robot-environment: service
modified-date: Tue Jul 13 03:50:06 GMT 1999
modified-by: Mike Thompson

robot-name: AURESYS
robot-cover-url: http://crrm.univ-mrs.fr
robot-details-url: http://crrm.univ-mrs.fr
robot-owner-name: Mannina Bruno
robot-owner-url: ftp://crrm.univ-mrs.fr/pub/CVetud/Etudiants/Mannina/CVbruno.htm
robot-owner-email: mannina@crrm.univ-mrs.fr
robot-status: robot actively in use
robot-purpose: indexing,statistics
robot-type: Standalone
robot-platform: Aix, Unix
robot-availability: Protected by Password
robot-exclusion: Yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: crrm.univ-mrs.fr, 192.134.99.192
robot-from: Yes
robot-useragent: AURESYS/1.0
robot-language: Perl 5.001m
robot-description: AURESYS is used to build a personal database for someone searching for information. The database is structured to be analysed. AURESYS can find new servers by incrementing IP addresses. It generates statistics...
robot-history: This robot finds its roots in a research project at the University of Marseille in 1995-1996
robot-environment: used for Research
modified-date: Mon, 1 Jul 1996 14:30:00 GMT
modified-by: Mannina Bruno

robot-name: BackRub
robot-cover-url:
robot-details-url:
robot-owner-name: Larry Page
robot-owner-url: http://backrub.stanford.edu/
robot-owner-email: page@leland.stanford.edu
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: *.stanford.edu
robot-from: yes
robot-useragent: BackRub/*.*
robot-language: Java.
robot-description:
robot-history:
robot-environment:
modified-date: Wed Feb 21 02:57:42 1996.
modified-by:

robot-cover-url: http://www.baytsp.com/
robot-details-url: http://www.baytsp.com/
robot-owner-name: BayTSP.com,Inc
robot-owner-url:
robot-owner-email: marki@baytsp.com
robot-status: Active
robot-purpose: Copyright Infringement Tracking
robot-type: Stand Alone
robot-platform: NT
robot-availability: 24/7
robot-exclusion:
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from:
robot-useragent: BaySpider
robot-language: English
robot-description:
robot-history:
robot-environment:
modified-date: 1/15/2001
modified-by: Marki@baytsp.com

robot-name: BBot
robot-cover-url: http://www.otthon.net/search
robot-details-url: http://www.otthon.net/search/bbot
robot-owner-name: Istvan Fulop
robot-owner-url: http://www.otthon.net
robot-owner-email: poluf1 at yahoo dot co dot uk
robot-status: development
robot-purpose: indexing, maintenance
robot-type: standalone
robot-platform: windows
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: bbot
robot-noindex: yes
robot-nofollow: yes
robot-host: *.netcologne.de
robot-from: yes
robot-useragent: bbot/0.100
robot-language: perl
robot-description: Mainly intended for site level search, sometimes set loose.
robot-history: Started project in 11/2000. Called BBot since 24/04/2003.
robot-environment: hobby
modified-date: Sun, 04 May 2003 10:15:00 GMT
modified-by: Istvan Fulop

robot-name: Big Brother
robot-cover-url: http://pauillac.inria.fr/~fpottier/mac-soft.html.en
robot-details-url:
robot-owner-name: Francois Pottier
robot-owner-url: http://pauillac.inria.fr/~fpottier/
robot-owner-email: Francois.Pottier@inria.fr
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: mac
robot-availability: binary
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: *
robot-from: not as of 1.0
robot-useragent: Big Brother
robot-language: c++
robot-description: Macintosh-hosted link validation tool.
robot-history:
robot-environment: shareware
modified-date: Thu Sep 19 18:01:46 MET DST 1996
modified-by: Francois Pottier

robot-name: Bjaaland
robot-cover-url: http://www.textuality.com
robot-details-url: http://www.textuality.com
robot-owner-name: Tim Bray
robot-owner-url: http://www.textuality.com
robot-owner-email: tbray@textuality.com
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: Bjaaland
robot-noindex: no
robot-host: barry.bitmovers.net
robot-from: no
robot-useragent: Bjaaland/0.5
robot-language: perl5
robot-description: Crawls sites listed in the ODP (see http://dmoz.org)
robot-history: None, yet
robot-environment: service
modified-date: Monday, 19 July 1999, 13:46:00 PDT
modified-by: tbray@textuality.com

robot-name: BlackWidow
robot-cover-url: http://140.190.65.12/~khooghee/index.html
robot-details-url:
robot-owner-name: Kevin Hoogheem
robot-owner-url:
robot-owner-email: khooghee@marys.smumn.edu
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex:
robot-host: 140.190.65.*
robot-from: yes
robot-useragent: BlackWidow
robot-language: C, C++.
robot-description: Started as a research project and is now used to find links for a random link generator. It is also used to research the growth of specific sites.
robot-history:
robot-environment:
modified-date: Fri Feb 9 00:11:22 1996.
modified-by:

robot-name: Die Blinde Kuh
robot-cover-url: http://www.blinde-kuh.de/
robot-details-url: http://www.blinde-kuh.de/robot.html (German language)
robot-owner-name: Stefan R. Mueller
robot-owner-url: http://www.rrz.uni-hamburg.de/philsem/stefan_mueller/
robot-owner-email: maschinist@blinde-kuh.de
robot-status: development
robot-purpose: indexing
robot-type: browser
robot-platform: unix
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: minerva.sozialwiss.uni-hamburg.de
robot-from: yes
robot-useragent: Die Blinde Kuh
robot-language: perl5
robot-description: The robot is used for indexing and checking the registered URLs in the German-language search engine for kids. It is a non-commercial one-woman project of Birgit Bachmann, living in Hamburg, Germany.
robot-history: The robot was developed by Stefan R. Mueller to help with the manual checking of registered links.
robot-environment: hobby
modified-date: Mon Jul 22 1998
modified-by: Stefan R. Mueller

robot-name: Bloodhound
robot-cover-url: http://web.ukonline.co.uk/genius/bloodhound.htm
robot-details-url: http://web.ukonline.co.uk/genius/bloodhound.htm
robot-owner-name: Dean Smart
robot-owner-url: http://web.ukonline.co.uk/genius/bloodhound.htm
robot-owner-email: genius@ukonline.co.uk
robot-status: active
robot-purpose: Web Site Download
robot-type: standalone
robot-platform: Windows95, WindowsNT, Windows98, Windows2000
robot-availability: Executable
robot-exclusion: No
robot-exclusion-useragent: Ukonline
robot-noindex: No
robot-host: *
robot-from: No
robot-useragent: None
robot-language: Perl5
robot-description: Bloodhound will download a whole web site, depending on the number of links to follow specified by the user.
robot-history: First version was released on 1 July 2000
robot-environment: Commercial
modified-date: 1 July 2000
modified-by: Dean Smart

robot-name: Borg-Bot
robot-cover-url:
robot-details-url: http://www.skunkfarm.com/borgbot.htm
robot-owner-name: James Bragg
robot-owner-url: http://www.skunkfarm.com
robot-owner-email: botdev@skunkfarm.com
robot-status: development
robot-purpose: indexing statistics
robot-type: standalone
robot-platform: Linux Windows2000
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: borg-bot/0.9
robot-noindex: yes
robot-host: 24.11.13.173
robot-from: yes
robot-useragent: borg-bot/0.9
robot-language: python
robot-description: Developmental crawler to feed a search engine
robot-history:
robot-environment: research service
modified-date: Sat, 20 Oct 2001 04:00:00 GMT
modified-by: Sat, 20 Oct 2001 04:00:00 GMT

robot-name: bright.net caching robot
robot-cover-url:
robot-details-url:
robot-owner-name:
robot-owner-url:
robot-owner-email:
robot-status: active
robot-purpose: caching
robot-type:
robot-platform:
robot-availability: none
robot-exclusion: no
robot-noindex:
robot-host: 209.143.1.46
robot-from: no
robot-useragent: Mozilla/3.01 (compatible;)
robot-language:
robot-description:
robot-history:
robot-environment:
modified-date: Fri Nov 13 14:08:01 EST 1998
modified-by: brian d foy <comdog@computerdog.com>

robot-name: BSpider
robot-cover-url: not yet
robot-details-url: not yet
robot-owner-name: Yo Okumura
robot-owner-url: not yet
robot-owner-email: okumura@rsl.crl.fujixerox.co.jp
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: bspider
robot-noindex: yes
robot-host: 210.159.73.34, 210.159.73.35
robot-from: yes
robot-useragent: BSpider/1.0 libwww-perl/0.40
robot-language: perl
robot-description: BSpider crawls inside the Japanese domain for indexing.
robot-history: Started in April 1997 as a research project at Fuji Xerox Corp. Research Lab.
robot-environment: research
modified-date: Mon, 21 Apr 1997 18:00:00 JST
modified-by: Yo Okumura

robot-name: CACTVS Chemistry Spider
robot-cover-url: http://schiele.organik.uni-erlangen.de/cactvs/spider.html
robot-details-url:
robot-owner-name: W. D. Ihlenfeldt
robot-owner-url: http://schiele.organik.uni-erlangen.de/cactvs/
robot-owner-email: wdi@eros.ccc.uni-erlangen.de
robot-status:
robot-purpose: indexing.
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: utamaro.organik.uni-erlangen.de
robot-from: no
robot-useragent: CACTVS Chemistry Spider
robot-language: TCL, C
robot-description: Locates chemical structures in Chemical MIME formats on WWW and FTP servers and downloads them into a database searchable with structure queries (substructure, fullstructure, formula, properties etc.)
robot-history:
robot-environment:
modified-date: Sat Mar 30 00:55:40 1996.
modified-by:

robot-name: Calif
robot-details-url: http://www.tnps.dp.ua/calif/details.html
robot-cover-url: http://www.tnps.dp.ua/calif/
robot-owner-name: Alexander Kosarev
robot-owner-url: http://www.tnps.dp.ua/~dark/
robot-owner-email: kosarev@tnps.net
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: calif
robot-noindex: yes
robot-host: cobra.tnps.dp.ua
robot-from: yes
robot-useragent: Calif/0.6 (kosarev@tnps.net; http://www.tnps.dp.ua)
robot-language: c++
robot-description: Used to build searchable index
robot-history: In development stage
robot-environment: research
modified-date: Sun, 6 Jun 1999 13:25:33 GMT

robot-name: Cassandra
robot-cover-url: http://post.mipt.rssi.ru/~billy/search/
robot-details-url: http://post.mipt.rssi.ru/~billy/search/
robot-owner-name: Mr. Oleg Bilibin
robot-owner-url: http://post.mipt.rssi.ru/~billy/
robot-owner-email: billy168@aha.ru
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: crossplatform
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: www.aha.ru
robot-from: no
robot-useragent:
robot-language: java
robot-description: Cassandra search robot is used to create and maintain indexed database for widespread Information Retrieval System
robot-history: Master of Science degree project at Moscow Institute of Physics and Technology
robot-environment: research
modified-date: Wed, 3 Jun 1998 12:00:00 GMT

robot-name: Digimarc Marcspider/CGI
robot-cover-url: http://www.digimarc.com/prod_fam.html
robot-details-url: http://www.digimarc.com/prod_fam.html
robot-owner-name: Digimarc Corporation
robot-owner-url: http://www.digimarc.com
robot-owner-email: wmreader@digimarc.com
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: windowsNT
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: 206.102.3.*
robot-from:
robot-useragent: Digimarc CGIReader/1.0
robot-language: c++
robot-description: Similar to Digimarc Marcspider, Marcspider/CGI examines image files for watermarks but is more focused on CGI URLs. In order not to waste internet bandwidth with yet another crawler, we have contracted with one of the major crawlers/search engines to provide us with a list of specific CGI URLs of interest to us. If a URL is to a page of interest (via CGI), then we access the page to get the image URLs from it, but we do not crawl to any other pages.
robot-history: First operation in December 1997
robot-environment: service
modified-date: Fri, 5 Dec 1997 12:00:00 GMT
modified-by: Dan Ramos

robot-name: Checkbot
robot-cover-url: http://www.xs4all.nl/~graaff/checkbot/
robot-details-url:
robot-owner-name: Hans de Graaff
robot-owner-url: http://www.xs4all.nl/~graaff/checkbot/
robot-owner-email: graaff@xs4all.nl
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: unix,WindowsNT
robot-availability: source
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: Checkbot/x.xx LWP/5.x
robot-language: perl 5
robot-description: Checkbot checks links in a given set of pages on one or more servers. It reports links which returned an error code
robot-history:
robot-environment: hobby
modified-date: Tue Jun 25 07:44:00 1996
modified-by: Hans de Graaff

robot-name: ChristCrawler.com
robot-cover-url: http://www.christcrawler.com/search.cfm
robot-details-url: http://www.christcrawler.com/index.cfm
robot-owner-name: Jeremy DeYoung
robot-owner-url: http://www.christcentral.com/aboutus/index.cfm
robot-owner-email: jeremy.deyoung@christcentral.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Windows NT 4.0 SP5
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: christcrawler
robot-noindex: yes
robot-host: 64.51.218.*, 64.51.219.*, 12.107.236.*, 12.107.237.*
robot-from: yes
robot-useragent: Mozilla/4.0 (compatible; ChristCrawler.com, ChristCrawler@ChristCENTRAL.com)
robot-language: Cold Fusion 4.5
robot-description: A Christian internet spider that searches web sites to find Christian Related material
robot-history: Developed because of the growing need for more of God's influence on the Internet.
robot-environment: service
modified-date: Fri, 27 Jun 2001 00:53:12 CST
modified-by: Jeremy DeYoung

robot-name: churl
robot-cover-url: http://www-personal.engin.umich.edu/~yunke/scripts/churl/
robot-details-url:
robot-owner-name: Justin Yunke
robot-owner-url: http://www-personal.engin.umich.edu/~yunke/
robot-owner-email: yunke@umich.edu
robot-status:
robot-purpose: maintenance
robot-type:
robot-platform:
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: no
robot-host:
robot-from:
robot-useragent:
robot-language:
robot-description: A URL checking robot, which stays within one step of the local server
robot-history:
robot-environment:
modified-date:
modified-by:

robot-name: cIeNcIaFiCcIoN.nEt
robot-cover-url: http://www.cienciaficcion.net/
robot-details-url: http://www.cienciaficcion.net/
robot-owner-name: David Fernández
robot-owner-url: http://www.cyberdark.net/
robot-owner-email: root@cyberdark.net
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: linux
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: yes
robot-host: epervier.cqhost.net
robot-from: no
robot-useragent: cIeNcIaFiCcIoN.nEt Spider (http://www.cienciaficcion.net)
robot-language: php,perl
robot-description: Robot responsible for indexing the pages of www.cienciaficcion.net
robot-history: Alcorkón (Madrid), Europe, 2000/2001
robot-environment: hobby
modified-date: Sat, 18 Aug 2001 00:38:52 GMT
modified-by: David Fernández

robot-name: CMC/0.01
robot-details-url: http://www2.next.ne.jp/cgi-bin/music/help.cgi?phase=robot
robot-cover-url: http://www2.next.ne.jp/music/
robot-owner-name: Shinobu Kubota.
robot-owner-url: http://www2.next.ne.jp/cgi-bin/music/help.cgi?phase=profile
robot-owner-email: shinobu@po.next.ne.jp
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: CMC/0.01
robot-noindex: no
robot-host: haruna.next.ne.jp, 203.183.218.4
robot-from: yes
robot-useragent: CMC/0.01
robot-language: perl5
robot-description: This CMC/0.01 robot collects information from pages registered with the music specialty search service.
robot-history: This CMC/0.01 robot was made for the computer music center on November 4, 1997.
robot-environment: hobby
modified-date: Sat, 23 May 1998 17:22:00 GMT

robot-name: Collective
robot-cover-url: http://web.ukonline.co.uk/genius/collective.htm
robot-details-url: http://web.ukonline.co.uk/genius/collective.htm
robot-owner-name: Dean Smart
robot-owner-url: http://web.ukonline.co.uk/genius/collective.htm
robot-owner-email: genius@ukonline.co.uk
robot-status: development
robot-purpose: Collective is a highly configurable program designed to interrogate online search engines and online databases. It ignores web pages that lie about their content, as well as dead URLs, and it can be super strict: it searches each web page it finds for your search terms to ensure those terms are present. Any positive URLs are added to an HTML file for you to view at any time, even before the program has finished. Collective can wander the web for days if required.
robot-type: standalone
robot-platform: Windows95, WindowsNT, Windows98, Windows2000
robot-availability: Executable
robot-exclusion: No
robot-exclusion-useragent:
robot-noindex: No
robot-host: *
robot-from: No
robot-useragent: LWP
robot-language: Perl5 (With Visual Basic front-end)
robot-description: Collective is the cleverest Internet search engine, with all found URLs guaranteed to contain your search terms.
robot-history: Development started on August 03, 2000
robot-environment: Commercial
modified-date: August 03, 2000
modified-by: Dean Smart

robot-name: Combine System
robot-cover-url: http://www.ub2.lu.se/~tsao/combine.ps
robot-details-url: http://www.ub2.lu.se/~tsao/combine.ps
robot-owner-name: Yong Cao
robot-owner-url: http://www.ub2.lu.se/
robot-owner-email: tsao@munin.ub2.lu.se
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: source
robot-exclusion: yes
robot-exclusion-useragent: combine
robot-noindex: no
robot-host: *.ub2.lu.se
robot-from: yes
robot-useragent: combine/0.0
robot-language: c, perl5
robot-description: An open, distributed, and efficient harvester.
robot-history: A complete re-design of the NWI robot (w3index) for DESIRE project.
robot-environment: research
modified-date: Tue, 04 Mar 1997 16:11:40 GMT
modified-by: Yong Cao

robot-name: Conceptbot
robot-cover-url: http://www.aptltd.com/~sifry/conceptbot/tech.html
robot-details-url: http://www.aptltd.com/~sifry/conceptbot
robot-owner-name: David L. Sifry
robot-owner-url: http://www.aptltd.com/~sifry
robot-owner-email: david@sifry.com
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: data
robot-exclusion: yes
robot-exclusion-useragent: conceptbot
robot-noindex: yes
robot-host: router.sifry.com
robot-from: yes
robot-useragent: conceptbot/0.3
robot-language: perl5
robot-description: The Conceptbot spider is used to research concept-based search indexing techniques. It uses a breadth-first search to spread out the number of hits on a single site over time. The spider runs at irregular intervals and is still under construction.
robot-history: This spider began as a research project at Sifry Consulting in April 1996.
robot-environment: research
modified-date: Mon, 9 Sep 1996 15:31:07 GMT
modified-by: David L. Sifry <david@sifry.com>

robot-name: CoolBot
robot-cover-url: www.suchmaschine21.de
robot-details-url: www.suchmaschine21.de
robot-owner-name: Stefan Fischerlaender
robot-owner-url: www.suchmaschine21.de
robot-owner-email: info@suchmaschine21.de
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: CoolBot
robot-noindex: yes
robot-host: www.suchmaschine21.de
robot-from: no
robot-useragent: CoolBot
robot-language: perl5
robot-description: The CoolBot robot is used to build and maintain the directory of the German search engine Suchmaschine21.
robot-history: none so far
robot-environment: service
modified-date: Wed, 21 Jan 2001 12:16:00 GMT
modified-by: Stefan Fischerlaender

robot-name: Web Core / Roots
robot-cover-url: http://www.di.uminho.pt/wc
robot-details-url:
robot-owner-name: Jorge Portugal Andrade
robot-owner-url: http://www.di.uminho.pt/~cbm
robot-owner-email: wc@di.uminho.pt
robot-status:
robot-purpose: indexing, maintenance
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: shiva.di.uminho.pt, from www.di.uminho.pt
robot-from: no
robot-useragent: root/0.1
robot-language: perl
robot-description: Parallel robot developed at Minho University in Portugal to catalog relations among URLs and to support a special navigation aid.
robot-history: First versions since October 1995.
robot-environment:
modified-date: Wed Jan 10 23:19:08 1996.
modified-by:

robot-name: XYLEME Robot
robot-cover-url: http://xyleme.com/
robot-details-url:
robot-owner-name: Mihai Preda
robot-owner-url: http://www.mihaipreda.com/
robot-owner-email: preda@xyleme.com
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: data
robot-exclusion: yes
robot-exclusion-useragent: cosmos
robot-noindex: no
robot-nofollow: no
robot-host:
robot-from: yes
robot-useragent: cosmos/0.3
robot-language: c++
robot-description: index XML, follow HTML
robot-history:
robot-environment: service
modified-date: Fri, 24 Nov 2000 00:00:00 GMT
modified-by: Mihai Preda

robot-name: Internet Cruiser Robot
robot-cover-url: http://www.krstarica.com/
robot-details-url: http://www.krstarica.com/eng/url/
robot-owner-name: Internet Cruiser
robot-owner-url: http://www.krstarica.com/
robot-owner-email: robot@krstarica.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: Internet Cruiser Robot
robot-noindex: yes
robot-host: *.krstarica.com
robot-from: no
robot-useragent: Internet Cruiser Robot/2.1
robot-language: c++
robot-description: Internet Cruiser Robot is Internet Cruiser's prime index agent.
robot-history:
robot-environment: service
modified-date: Fri, 17 Jan 2001 12:00:00 GMT
modified-by: tech@krstarica.com

robot-name: Cusco
robot-cover-url: http://www.cusco.pt/
robot-details-url: http://www.cusco.pt/
robot-owner-name: Filipe Costa Clerigo
robot-owner-url: http://www.viatecla.pt/
robot-owner-email: clerigo@viatecla.pt
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: any
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: cusco
robot-noindex: yes
robot-host: *.cusco.pt, *.viatecla.pt
robot-from: yes
robot-useragent: Cusco/3.2
robot-language: Java
robot-description: The Cusco robot is part of the CUCE indexing system. It gathers information from several sources: HTTP, databases, or the filesystem. At this moment, its universe is the .pt domain, and the information it gathers is available at the Portuguese search engine Cusco http://www.cusco.pt/.
robot-history: The Cusco search engine started in the company ViaTecla as a project to demonstrate our development capabilities and to fill the need for a Portuguese-specific search engine. Now we are developing new functionality that cannot be found in any other online search engine.
robot-environment: service, research
modified-date: Mon, 21 Jun 1999 14:00:00 GMT
modified-by: Filipe Costa Clerigo