Submission Robots Index

  • robot-id: netcarta
    robot-name: NetCarta WebMap Engine
    robot-cover-url: http://www.netcarta.com/
    robot-details-url:
    robot-owner-name: NetCarta WebMap Engine
    robot-owner-url: http://www.netcarta.com/
    robot-owner-email: info@netcarta.com
    robot-status:
    robot-purpose: indexing, maintenance, mirroring, statistics
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex:
    robot-host:
    robot-from: yes
    robot-useragent: NetCarta CyberPilot Pro
    robot-language: C++.
    robot-description: The NetCarta WebMap Engine is a general purpose, commercial
    spider. Packaged with a full GUI in the CyberPilot Pro
    product, it acts as a personal spider that works with a browser
    to facilitate context-based navigation. The WebMapper
    product uses the robot to manage a site (site copy, site
    diff, and extensive link management facilities). All
    versions can create publishable NetCarta WebMaps, which
    capture the crawled information. If the robot sees a
    published map, it will return the published map rather than
    continuing its crawl. Since this is a personal spider, it
    will be launched from multiple domains. This robot tends to
    focus on a particular site. No instance of the robot should
    have more than one outstanding request to any given site
    at a time. The User-agent field contains a coded ID
    identifying the instance of the spider; specific users can
    be blocked via robots.txt using this ID.
    robot-history:
    robot-environment:
    modified-date: Sun Feb 18 02:02:49 1996.
    modified-by:

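The netcarta entry above notes that specific spider instances can be blocked via robots.txt using the coded ID carried in the User-agent field. A minimal illustrative robots.txt for such a block (the exact coded-ID format is not documented here, so the base user-agent string is shown as a placeholder):

```text
# Deny the NetCarta CyberPilot Pro spider the whole site;
# a specific instance would be named by its coded ID instead.
User-agent: NetCarta CyberPilot Pro
Disallow: /

# All other robots remain unrestricted.
User-agent: *
Disallow:
```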
  • robot-id: netmechanic
    robot-name: NetMechanic
    robot-cover-url: http://www.netmechanic.com
    robot-details-url: http://www.netmechanic.com/faq.html
    robot-owner-name: Tom Dahm
    robot-owner-url: http://iquest.com/~tdahm
    robot-owner-email: tdahm@iquest.com
    robot-status: development
    robot-purpose: Link and HTML validation
    robot-type: standalone with web gateway
    robot-platform: UNIX
    robot-availability: via web page
    robot-exclusion: Yes
    robot-exclusion-useragent: WebMechanic
    robot-noindex: no
    robot-host: 206.26.168.18
    robot-from: no
    robot-useragent: NetMechanic
    robot-language: C
    robot-description: NetMechanic is a link validation and
    HTML validation robot run using a web page interface.
    robot-history:
    robot-environment:
    modified-date: Sat, 17 Aug 1996 12:00:00 GMT
    modified-by:

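Entries like the one above advertise robots.txt support through the robot-exclusion and robot-exclusion-useragent fields. A short sketch, using Python's standard urllib.robotparser, of how a crawler honouring those fields might check a URL before fetching it (the rules and URLs are illustrative, not NetMechanic's actual configuration):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt a site might serve; the token matched is the
# robot-exclusion-useragent value ("WebMechanic" for the NetMechanic entry).
rules = """\
User-agent: WebMechanic
Disallow: /private/

User-agent: *
Disallow:
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# The exclusion user-agent is denied the /private/ tree but allowed elsewhere.
print(parser.can_fetch("WebMechanic", "http://example.com/private/report.html"))  # False
print(parser.can_fetch("WebMechanic", "http://example.com/index.html"))           # True
```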
  • robot-id: netscoop
    robot-name: NetScoop
    robot-cover-url: http://www-a2k.is.tokushima-u.ac.jp/search/index.html
    robot-owner-name: Kenji Kita
    robot-owner-url: http://www-a2k.is.tokushima-u.ac.jp/member/kita/index.html
    robot-owner-email: kita@is.tokushima-u.ac.jp
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: UNIX
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: NetScoop
    robot-host: alpha.is.tokushima-u.ac.jp, beta.is.tokushima-u.ac.jp
    robot-useragent: NetScoop/1.0 libwww/5.0a
    robot-language: C
    robot-description: The NetScoop robot is used to build the database
    for the NetScoop search engine.
    robot-history: The robot has been used in a research project
    at the Faculty of Engineering, Tokushima University, Japan,
    since Dec. 1996.
    robot-environment: research
    modified-date: Fri, 10 Jan 1997.
    modified-by: Kenji Kita

  • robot-id: newscan-online
    robot-name: newscan-online
    robot-cover-url: http://www.newscan-online.de/
    robot-details-url: http://www.newscan-online.de/info.html
    robot-owner-name: Axel Mueller
    robot-owner-url:
    robot-owner-email: mueller@newscan-online.de
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: Linux
    robot-availability: binary
    robot-exclusion: yes
    robot-exclusion-useragent: newscan-online
    robot-noindex: no
    robot-host: *newscan-online.de
    robot-from: yes
    robot-useragent: newscan-online/1.1
    robot-language: perl
    robot-description: The newscan-online robot is used to build a database for
    the newscan-online news search service operated by smart information
    services. The robot runs daily and visits predefined sites in a random order.
    robot-history: This robot has its roots in prerelease news-filtering
    software for Lotus Notes from 1995.
    robot-environment: service
    modified-date: Fri, 9 Apr 1999 11:45:00 GMT
    modified-by: Axel Mueller

  • robot-id: nhse
    robot-name: NHSE Web Forager
    robot-cover-url: http://nhse.mcs.anl.gov/
    robot-details-url:
    robot-owner-name: Robert Olson
    robot-owner-url: http://www.mcs.anl.gov/people/olson/
    robot-owner-email: olson@mcs.anl.gov
    robot-status:
    robot-purpose: indexing
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host: *.mcs.anl.gov
    robot-from: yes
    robot-useragent: NHSEWalker/3.0
    robot-language: perl 5
    robot-description: to generate a Resource Discovery database
    robot-history:
    robot-environment:
    modified-date: Fri May 5 15:47:55 1995
    modified-by:

  • robot-id: nomad
    robot-name: Nomad
    robot-cover-url: http://www.cs.colostate.edu/~sonnen/projects/nomad.html
    robot-details-url:
    robot-owner-name: Richard Sonnen
    robot-owner-url: http://www.cs.colostate.edu/~sonnen/
    robot-owner-email: sonnen@cs.colostate.edu
    robot-status:
    robot-purpose: indexing
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: no
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: *.cs.colostate.edu
    robot-from: no
    robot-useragent: Nomad-V2.x
    robot-language: Perl 4
    robot-description:
    robot-history: Developed in 1995 at Colorado State University.
    robot-environment:
    modified-date: Sat Jan 27 21:02:20 1996.
    modified-by:

  • robot-id: northstar
    robot-name: The NorthStar Robot
    robot-cover-url: http://comics.scs.unr.edu:7000/top.html
    robot-details-url:
    robot-owner-name: Fred Barrie
    robot-owner-url:
    robot-owner-email: barrie@unr.edu
    robot-status:
    robot-purpose: indexing
    robot-type:
    robot-platform:
    robot-availability:
    robot-exclusion:
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: frognot.utdallas.edu, utdallas.edu, cnidir.org
    robot-from: yes
    robot-useragent: NorthStar
    robot-language:
    robot-description: Recent runs (26 April 94) will concentrate on textual
    analysis of the Web versus GopherSpace (from the Veronica
    data) as well as indexing.
    robot-history:
    robot-environment:
    modified-date:
    modified-by:

  • robot-id: occam
    robot-name: Occam
    robot-cover-url: http://www.cs.washington.edu/research/projects/ai/www/occam/
    robot-details-url:
    robot-owner-name: Marc Friedman
    robot-owner-url: http://www.cs.washington.edu/homes/friedman/
    robot-owner-email: friedman@cs.washington.edu
    robot-status: development
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: Occam
    robot-noindex: no
    robot-host: gentian.cs.washington.edu, sekiu.cs.washington.edu, saxifrage.cs.washington.edu
    robot-from: yes
    robot-useragent: Occam/1.0
    robot-language: CommonLisp, perl4
    robot-description: The robot takes high-level queries, breaks them down into
    multiple web requests, and answers them by combining disparate
    data gathered in one minute from numerous web sites, or from
    the robot's cache. Currently the only user is me.
    robot-history: The robot is a descendant of Rodney,
    an earlier project at the University of Washington.
    robot-environment: research
    modified-date: Thu, 21 Nov 1996 20:30 GMT
    modified-by: friedman@cs.washington.edu (Marc Friedman)

  • robot-id: octopus
    robot-name: HKU WWW Octopus
    robot-cover-url: http://phoenix.cs.hku.hk:1234/~jax/w3rui.shtml
    robot-details-url:
    robot-owner-name: Law Kwok Tung, Lee Tak Yeung, Lo Chun Wing
    robot-owner-url: http://phoenix.cs.hku.hk:1234/~jax
    robot-owner-email: jax@cs.hku.hk
    robot-status:
    robot-purpose: indexing
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: no.
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: phoenix.cs.hku.hk
    robot-from: yes
    robot-useragent: HKU WWW Robot,
    robot-language: Perl 5, C, Java.
    robot-description: HKU Octopus is an ongoing project for resource discovery in
    the Hong Kong and China WWW domain. It is a research
    project conducted by three undergraduates at the University
    of Hong Kong.
    robot-history:
    robot-environment:
    modified-date: Thu Mar 7 14:21:55 1996.
    modified-by:

  • robot-id: openfind
    robot-name: Openfind data gatherer
    robot-cover-url: http://www.openfind.com.tw/
    robot-details-url: http://www.openfind.com.tw/robot.html
    robot-owner-name:
    robot-owner-url:
    robot-owner-email: robot-response@openfind.com.tw
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: 66.7.131.132
    robot-from:
    robot-useragent: Openfind data gatherer, Openbot/3.0+(robot-response@openfind.com.tw;+http://www.openfind.com.tw/robot.html)
    robot-language:
    robot-description:
    robot-history:
    robot-environment:
    modified-date: Thu, 26 Apr 2001 02:55:21 GMT
    modified-by: stanislav shalunov <shalunov@internet2.edu>

  • robot-id: orb_search
    robot-name: Orb Search
    robot-cover-url: http://orbsearch.home.ml.org
    robot-details-url: http://orbsearch.home.ml.org
    robot-owner-name: Matt Weber
    robot-owner-url: http://www.weberworld.com
    robot-owner-email: webernet@geocities.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: data
    robot-exclusion: yes
    robot-exclusion-useragent: Orbsearch/1.0
    robot-noindex: yes
    robot-host: cow.dyn.ml.org, *.dyn.ml.org
    robot-from: yes
    robot-useragent: Orbsearch/1.0
    robot-language: Perl5
    robot-description: Orbsearch builds the database for Orb Search Engine.
    It runs when requested.
    robot-history: This robot was started as a hobby.
    robot-environment: hobby
    modified-date: Sun, 31 Aug 1997 02:28:52 GMT
    modified-by: Matt Weber

  • robot-id: packrat
    robot-name: Pack Rat
    robot-cover-url: http://web.cps.msu.edu/~dexterte/isl/packrat.html
    robot-details-url:
    robot-owner-name: Terry Dexter
    robot-owner-url: http://web.cps.msu.edu/~dexterte
    robot-owner-email: dexterte@cps.msu.edu
    robot-status: development
    robot-purpose: both maintenance and mirroring
    robot-type: standalone
    robot-platform: unix
    robot-availability: none at the moment; source when developed
    robot-exclusion: yes
    robot-exclusion-useragent: packrat or *
    robot-noindex: no, not yet
    robot-host: cps.msu.edu
    robot-from:
    robot-useragent: PackRat/1.0
    robot-language: perl with libwww-5.0
    robot-description: Used for local maintenance and for gathering web pages
    so that local statistical info can be used in artificial intelligence
    programs. Funded by NEMOnline.
    robot-history: In the making...
    robot-environment: research
    modified-date: Tue, 20 Aug 1996 15:45:11
    modified-by: Terry Dexter

  • robot-id: pageboy
    robot-name: PageBoy
    robot-cover-url: http://www.webdocs.org/
    robot-details-url: http://www.webdocs.org/
    robot-owner-name: Chihiro Kuroda
    robot-owner-url: http://www.webdocs.org/
    robot-owner-email: pageboy@webdocs.org
    robot-status: development
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: pageboy
    robot-noindex: yes
    robot-nofollow: yes
    robot-host: *.webdocs.org
    robot-from: yes
    robot-useragent: PageBoy/1.0
    robot-language: c
    robot-description: The robot visits at regular intervals.
    robot-history: none
    robot-environment: service
    modified-date: Fri, 21 Oct 1999 17:28:52 GMT
    modified-by: webdocs

  • robot-id: parasite
    robot-name: ParaSite
    robot-cover-url: http://www.ianett.com/parasite/
    robot-details-url: http://www.ianett.com/parasite/
    robot-owner-name: iaNett.com
    robot-owner-url: http://www.ianett.com/
    robot-owner-email: parasite@ianett.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: windowsNT
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: ParaSite
    robot-noindex: yes
    robot-nofollow: yes
    robot-host: *.ianett.com
    robot-from: yes
    robot-useragent: ParaSite/0.21 (http://www.ianett.com/parasite/)
    robot-language: c++
    robot-description: Builds the index for the ianett.com search database. Runs
    continuously.
    robot-history: Second generation of ianett.com spidering technology,
    originally called Sven.
    robot-environment: service
    modified-date: July 28, 2000
    modified-by: Marty Anstey

  • robot-id: patric
    robot-name: Patric
    robot-cover-url: http://www.nwnet.net/technical/ITR/index.html
    robot-details-url: http://www.nwnet.net/technical/ITR/index.html
    robot-owner-name: toney@nwnet.net
    robot-owner-url: http://www.nwnet.net/company/staff/toney
    robot-owner-email: webmaster@nwnet.net
    robot-status: development
    robot-purpose: statistics
    robot-type: standalone
    robot-platform: unix
    robot-availability: data
    robot-exclusion: yes
    robot-exclusion-useragent: patric
    robot-noindex: yes
    robot-host: *.nwnet.net
    robot-from: no
    robot-useragent: Patric/0.01a
    robot-language: perl
    robot-description: (contained at http://www.nwnet.net/technical/ITR/index.html )
    robot-history: (contained at http://www.nwnet.net/technical/ITR/index.html )
    robot-environment: service
    modified-date: Thu, 15 Aug 1996
    modified-by: toney@nwnet.net

  • robot-id: pegasus
    robot-name: pegasus
    robot-cover-url: http://opensource.or.id/projects.html
    robot-details-url: http://pegasus.opensource.or.id
    robot-owner-name: A.Y.Kiky Shannon
    robot-owner-url: http://go.to/ayks
    robot-owner-email: shannon@opensource.or.id
    robot-status: inactive - open source
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: source, binary
    robot-exclusion: yes
    robot-exclusion-useragent: pegasus
    robot-noindex: yes
    robot-host: *
    robot-from: yes
    robot-useragent: web robot PEGASUS
    robot-language: perl5
    robot-description: pegasus gathers information from HTML pages (7 important
    tags). The indexing process can be started from starting URL(s) or a range
    of IP addresses.
    robot-history: This robot was created as an implementation of a final project
    at the Informatics Engineering Department, Institute of Technology Bandung,
    Indonesia.
    robot-environment: research
    modified-date: Fri, 20 Oct 2000 14:58:40 GMT
    modified-by: A.Y.Kiky Shannon

  • robot-id: perignator
    robot-name: The Peregrinator
    robot-cover-url: http://www.maths.usyd.edu.au:8000/jimr/pe/Peregrinator.html
    robot-details-url:
    robot-owner-name: Jim Richardson
    robot-owner-url: http://www.maths.usyd.edu.au:8000/jimr.html
    robot-owner-email: jimr@maths.su.oz.au
    robot-status:
    robot-purpose:
    robot-type:
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host:
    robot-from: yes
    robot-useragent: Peregrinator-Mathematics/0.7
    robot-language: perl 4
    robot-description: This robot is being used to generate an index of documents
    on Web sites connected with mathematics and statistics. It
    ignores off-site links, so does not stray from a list of
    servers specified initially.
    robot-history: commenced operation in August 1994
    robot-environment:
    modified-date:
    modified-by:

  • robot-id: perlcrawler
    robot-name: PerlCrawler 1.0
    robot-cover-url: http://perlsearch.hypermart.net/
    robot-details-url: http://www.xav.com/scripts/xavatoria/index.html
    robot-owner-name: Matt McKenzie
    robot-owner-url: http://perlsearch.hypermart.net/
    robot-owner-email: webmaster@perlsearch.hypermart.net
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: source
    robot-exclusion: yes
    robot-exclusion-useragent: perlcrawler
    robot-noindex: yes
    robot-host: server5.hypermart.net
    robot-from: yes
    robot-useragent: PerlCrawler/1.0 Xavatoria/2.0
    robot-language: perl5
    robot-description: The PerlCrawler robot is designed to index and build
    a database of pages relating to the Perl programming language.
    robot-history: Originated in modified form on 25 June 1998
    robot-environment: hobby
    modified-date: Fri, 18 Dec 1998 23:37:40 GMT
    modified-by: Matt McKenzie

  • robot-id: phantom
    robot-name: Phantom
    robot-cover-url: http://www.maxum.com/phantom/
    robot-details-url:
    robot-owner-name: Larry Burke
    robot-owner-url: http://www.aktiv.com/
    robot-owner-email: lburke@aktiv.com
    robot-status:
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: Macintosh
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex:
    robot-host:
    robot-from: yes
    robot-useragent: Duppies
    robot-language:
    robot-description: Designed to allow webmasters to provide a searchable index
    of their own site as well as of other sites, perhaps with
    similar content.
    robot-history:
    robot-environment:
    modified-date: Fri Jan 19 05:08:15 1996.
    modified-by:

  • robot-id: phpdig
    robot-name: PhpDig
    robot-cover-url: http://phpdig.toiletoine.net/
    robot-details-url: http://phpdig.toiletoine.net/
    robot-owner-name: Antoine Bajolet
    robot-owner-url: http://phpdig.toiletoine.net/
    robot-owner-email: phpdig@toiletoine.net
    robot-status: *
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: all supported by Apache/php/mysql
    robot-availability: source
    robot-exclusion: yes
    robot-exclusion-useragent: phpdig
    robot-noindex: yes
    robot-host: yes
    robot-from: no
    robot-useragent: phpdig/x.x.x
    robot-language: php 4.x
    robot-description: Small robot and search engine written in php.
    robot-history: first written 2001-03-30
    robot-environment: hobby
    modified-date: Sun, 21 Nov 2001 20:01:19 GMT
    modified-by: Antoine Bajolet

  • robot-id: piltdownman
    robot-name: PiltdownMan
    robot-cover-url: http://profitnet.bizland.com/
    robot-details-url: http://profitnet.bizland.com/piltdownman.html
    robot-owner-name: Daniel Vilà
    robot-owner-url: http://profitnet.bizland.com/aboutus.html
    robot-owner-email: profitnet@myezmail.com
    robot-status: active
    robot-purpose: statistics
    robot-type: standalone
    robot-platform: windows95, windows98, windowsNT
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: piltdownman
    robot-noindex: no
    robot-nofollow: no
    robot-host: 62.36.128.*, 194.133.59.*, 212.106.215.*
    robot-from: no
    robot-useragent: PiltdownMan/1.0 profitnet@myezmail.com
    robot-language: c++
    robot-description: The PiltdownMan robot is used to get a
    list of links from the search engines in our database.
    These links are followed, and the pages they refer to are
    downloaded to gather statistics from them.
    The robot runs roughly once a month and visits the first
    10 pages listed in every search engine for a group of
    keywords.
    robot-history: To maintain a database of search engines,
    we needed an automated tool. That's why
    we began the creation of this robot.
    robot-environment: service
    modified-date: Mon, 13 Dec 1999 21:50:32 GMT
    modified-by: Daniel Vilà

  • robot-id: pimptrain
    robot-name: Pimptrain.com's robot
    robot-cover-url: http://www.pimptrain.com/search.cgi
    robot-details-url: http://www.pimptrain.com/search.cgi
    robot-owner-name: Bryan Ankielewicz
    robot-owner-url: http://www.pimptrain.com
    robot-owner-email: webmaster@pimptrain.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: source;data
    robot-exclusion: yes
    robot-exclusion-useragent: Pimptrain
    robot-noindex: yes
    robot-host: pimptrain.com
    robot-from: *
    robot-useragent: Mozilla/4.0 (compatible: Pimptrain's robot)
    robot-language: perl5
    robot-description: Crawls remote sites as part of a search engine program
    robot-history: Implemented in 2001
    robot-environment: commercial
    modified-date: May 11, 2001
    modified-by: Bryan Ankielewicz

  • robot-id: pioneer
    robot-name: Pioneer
    robot-cover-url: http://sequent.uncfsu.edu/~micah/pioneer.html
    robot-details-url:
    robot-owner-name: Micah A. Williams
    robot-owner-url: http://sequent.uncfsu.edu/~micah/
    robot-owner-email: micah@sequent.uncfsu.edu
    robot-status:
    robot-purpose: indexing, statistics
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: *.uncfsu.edu or flyer.ncsc.org
    robot-from: yes
    robot-useragent: Pioneer
    robot-language: C.
    robot-description: Pioneer is part of an undergraduate research
    project.
    robot-history:
    robot-environment:
    modified-date: Mon Feb 5 02:49:32 1996.
    modified-by:

  • robot-id: pitkow
    robot-name: html_analyzer
    robot-cover-url:
    robot-details-url:
    robot-owner-name: James E. Pitkow
    robot-owner-url:
    robot-owner-email: pitkow@aries.colorado.edu
    robot-status:
    robot-purpose: maintenance
    robot-type:
    robot-platform:
    robot-availability:
    robot-exclusion:
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host:
    robot-from:
    robot-useragent:
    robot-language:
    robot-description: to check validity of Web servers. I'm not sure if it has
    ever been run remotely.
    robot-history:
    robot-environment:
    modified-date:
    modified-by:

  • robot-id: pjspider
    robot-name: Portal Juice Spider
    robot-cover-url: http://www.portaljuice.com
    robot-details-url: http://www.portaljuice.com/pjspider.html
    robot-owner-name: Nextopia Software Corporation
    robot-owner-url: http://www.portaljuice.com
    robot-owner-email: pjspider@portaljuice.com
    robot-status: active
    robot-purpose: indexing, statistics
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: pjspider
    robot-noindex: yes
    robot-host: *.portaljuice.com, *.nextopia.com
    robot-from: yes
    robot-useragent: PortalJuice.com/4.0
    robot-language: C/C++
    robot-description: Indexing web documents for Portal Juice vertical portal
    search engine
    robot-history: Indexing the web since 1998 for the purposes of offering our
    commercial Portal Juice search engine services.
    robot-environment: service
    modified-date: Wed Jun 23 17:00:00 EST 1999
    modified-by: pjspider@portaljuice.com

  • robot-id: pka
    robot-name: PGP Key Agent
    robot-cover-url: http://www.starnet.it/pgp
    robot-details-url:
    robot-owner-name: Massimiliano Pucciarelli
    robot-owner-url: http://www.starnet.it/puma
    robot-owner-email: puma@comm2000.it
    robot-status: Active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: UNIX, Windows NT
    robot-availability: none
    robot-exclusion: no
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host: salerno.starnet.it
    robot-from: yes
    robot-useragent: PGP-KA/1.2
    robot-language: Perl 5
    robot-description: This program searches for the PGP public key of the
    specified user.
    robot-history: Originated as a research project at Salerno
    University in 1995.
    robot-environment: Research
    modified-date: June 27 1996.
    modified-by: Massimiliano Pucciarelli

  • robot-id: plumtreewebaccessor
    robot-name: PlumtreeWebAccessor
    robot-cover-url:
    robot-details-url: http://www.plumtree.com/
    robot-owner-name: Joseph A. Stanko
    robot-owner-url:
    robot-owner-email: josephs@plumtree.com
    robot-status: development
    robot-purpose: indexing for the Plumtree Server
    robot-type: standalone
    robot-platform: windowsNT
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: PlumtreeWebAccessor
    robot-noindex: yes
    robot-host:
    robot-from: yes
    robot-useragent: PlumtreeWebAccessor/0.9
    robot-language: c++
    robot-description: The Plumtree Web Accessor is a component that
    customers can add to the
    Plumtree Server to index documents on the World Wide Web.
    robot-history:
    robot-environment: commercial
    modified-date: Thu, 17 Dec 1998
    modified-by: Joseph A. Stanko <josephs@plumtree.com>

  • robot-id: poppi
    robot-name: Poppi
    robot-cover-url: http://members.tripod.com/poppisearch
    robot-details-url: http://members.tripod.com/poppisearch
    robot-owner-name: Antonio Provenzano
    robot-owner-url:
    robot-owner-email:
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix/linux
    robot-availability: none
    robot-exclusion:
    robot-exclusion-useragent:
    robot-noindex: yes
    robot-host:
    robot-from:
    robot-useragent: Poppi/1.0
    robot-language: C
    robot-description: Poppi is a crawler that indexes the web; it runs weekly,
    gathering and indexing hypertextual, multimedia and executable file
    formats.
    robot-history: Created by Antonio Provenzano in April 2000, it has since
    been acquired by Tomi Officine Multimediali srl and is about to be
    released as a service and commercial product.
    robot-environment: service
    modified-date: Mon, 22 May 2000 15:47:30 GMT
    modified-by: Antonio Provenzano

  • robot-id: portalb
    robot-name: PortalB Spider
    robot-cover-url: http://www.portalb.com/
    robot-details-url:
    robot-owner-name: PortalB Spider Bug List
    robot-owner-url:
    robot-owner-email: spider@portalb.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: PortalBSpider
    robot-noindex: yes
    robot-nofollow: yes
    robot-host: spider1.portalb.com, spider2.portalb.com, etc.
    robot-from: no
    robot-useragent: PortalBSpider/1.0 (spider@portalb.com)
    robot-language: C++
    robot-description: The PortalB Spider indexes selected sites for
    high-quality business information.
    robot-history:
    robot-environment: service

  • robot-id: psbot
    robot-name: psbot
    robot-cover-url: http://www.picsearch.com/
    robot-details-url: http://www.picsearch.com/bot.html
    robot-owner-name: picsearch AB
    robot-owner-url: http://www.picsearch.com/
    robot-owner-email: psbot@picsearch.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: Linux
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: psbot
    robot-noindex: yes
    robot-nofollow: yes
    robot-host: *.picsearch.com
    robot-from: yes
    robot-useragent: psbot/0.X (+http://www.picsearch.com/bot.html)
    robot-language: c, c++
    robot-description: Spider for www.picsearch.com
    robot-history: Developed and tested in 2000/2001
    robot-environment: commercial
    modified-date: Tue, 21 Aug 2001 10:55:38 CEST
    modified-by: psbot@picsearch.com

  • robot-id: Puu
    robot-name: GetterroboPlus Puu
    robot-details-url: http://marunaka.homing.net/straight/getter/
    robot-cover-url: http://marunaka.homing.net/straight/
    robot-owner-name: marunaka
    robot-owner-url: http://marunaka.homing.net
    robot-owner-email: marunaka@homing.net
    robot-status: active
    robot-purpose: gathering, maintenance
    - gathering: gathers data from the original standard TAG for Puu, which
    contains information on the sites registered with my search engine
    - maintenance: link validation
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes (Puu patrols only URLs registered with my search engine)
    robot-exclusion-useragent: Getterrobo-Plus
    robot-noindex: no
    robot-host: straight FLASH!! Getterrobo-Plus, *.homing.net
    robot-from: yes
    robot-useragent: straight FLASH!! GetterroboPlus 1.5
    robot-language: perl5
    robot-description:
    The Puu robot gathers data from the sites registered with the search engine
    "straight FLASH!!", to build a page announcing the renewal status of the
    registered sites in "straight FLASH!!".
    The robot runs every day.
    robot-history:
    This robot patrols the sites registered with the search engine "straight FLASH!!"
    robot-environment: hobby
    modified-date: Fri, 26 Jun 1998

  • robot-id: python
    robot-name: The Python Robot
    robot-cover-url: http://www.python.org/
    robot-details-url:
    robot-owner-name: Guido van Rossum
    robot-owner-url: http://www.python.org/~guido/
    robot-owner-email: guido@python.org
    robot-status: retired
    robot-purpose:
    robot-type:
    robot-platform:
    robot-availability: none
    robot-exclusion:
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host:
    robot-from:
    robot-useragent:
    robot-language:
    robot-description:
    robot-history:
    robot-environment:
    modified-date:
    modified-by:

  • robot-id: raven
    robot-name: Raven Search
    robot-cover-url: http://ravensearch.tripod.com
    robot-details-url: http://ravensearch.tripod.com
    robot-owner-name: Raven Group
    robot-owner-url: http://ravensearch.tripod.com
    robot-owner-email: ravensearch@hotmail.com
    robot-status: Development: robot under development
    robot-purpose: Indexing: gather content for commercial query engine.
    robot-type: Standalone: a separate program
    robot-platform: Unix, Windows98, WindowsNT, Windows2000
    robot-availability: None
    robot-exclusion: Yes
    robot-exclusion-useragent: Raven
    robot-noindex: Yes
    robot-nofollow: Yes
    robot-host: 192.168.1.*
    robot-from: Yes
    robot-useragent: Raven-v2
    robot-language: Perl-5
    robot-description: Raven was written for the express purpose of indexing the web.
    It can process hundreds of URLs in parallel. It runs on a sporadic basis
    while testing continues. It is really several programs running concurrently,
    and it takes four computers to run Raven Search; it scales in sets of four.
    robot-history: This robot is new. First active on March 25, 2000.
    robot-environment: Commercial: is a commercial product. Possibly GNU later ;-)
    modified-date: Fri, 25 Mar 2000 17:28:52 GMT
    modified-by: Raven Group
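
    The parallel URL handling described in the Raven entry is not published
    anywhere, so the following is only a minimal sketch of the general idea,
    fanning fetches out over a thread pool with Python's standard library; the
    fetch stub stands in for real HTTP requests and is not Raven's code.

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def fetch(url):
        # Stand-in for a real HTTP fetch, so the sketch runs without
        # network access; a real crawler would issue the request here.
        return f"<html>content of {url}</html>"

    def crawl_parallel(urls, workers=8):
        # Fan the URL list out over a worker pool; pool.map preserves
        # the input order of the results.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(fetch, urls))

    docs = crawl_parallel([f"http://example.com/page{i}" for i in range(5)])
    ```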

  • robot-id: rbse
    robot-name: RBSE Spider
    robot-cover-url: http://rbse.jsc.nasa.gov/eichmann/urlsearch.html
    robot-details-url:
    robot-owner-name: David Eichmann
    robot-owner-url: http://rbse.jsc.nasa.gov/eichmann/home.html
    robot-owner-email: eichmann@rbse.jsc.nasa.gov
    robot-status: active
    robot-purpose: indexing, statistics
    robot-type:
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: rbse.jsc.nasa.gov (192.88.42.10)
    robot-from:
    robot-useragent:
    robot-language: C, oracle, wais
    robot-description: Developed and operated as part of the NASA-funded Repository
    Based Software Engineering Program at the Research Institute
    for Computing and Information Systems, University of Houston
    - Clear Lake.
    robot-history:
    robot-environment:
    modified-date: Thu May 18 04:47:02 1995
    modified-by:

  • robot-id: resumerobot
    robot-name: Resume Robot
    robot-cover-url: http://www.onramp.net/proquest/resume/robot/robot.html
    robot-details-url:
    robot-owner-name: James Stakelum
    robot-owner-url: http://www.onramp.net/proquest/resume/java/resume.html
    robot-owner-email: proquest@onramp.net
    robot-status:
    robot-purpose: indexing.
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex:
    robot-host:
    robot-from: yes
    robot-useragent: Resume Robot
    robot-language: C++.
    robot-description:
    robot-history:
    robot-environment:
    modified-date: Tue Mar 12 15:52:25 1996.
    modified-by:

  • robot-id: rhcs
    robot-name: RoadHouse Crawling System
    robot-cover-url: http://stage.perceval.be (under development)
    robot-details-url:
    robot-owner-name: Gregoire Welraeds, Emmanuel Bergmans
    robot-owner-url: http://www.perceval.be
    robot-owner-email: helpdesk@perceval.be
    robot-status: development
    robot-purpose: indexing, maintenance, statistics
    robot-type: standalone
    robot-platform: unix (FreeBSD & Linux)
    robot-availability: none
    robot-exclusion: no (under development)
    robot-exclusion-useragent: RHCS
    robot-noindex: no (under development)
    robot-host: stage.perceval.be
    robot-from: no
    robot-useragent: RHCS/1.0a
    robot-language: c
    robot-description: robot used to build the database for the RoadHouse search service project operated by Perceval
    robot-history: The need for this robot finds its roots in the existing RoadHouse directory, not maintained since 1997
    robot-environment: service
    modified-date: Fri, 26 Feb 1999 12:00:00 GMT
    modified-by: Gregoire Welraeds

  • robot-id: roadrunner
    robot-name: Road Runner: The ImageScape Robot
    robot-owner-name: LIM Group
    robot-owner-email: lim@cs.leidenuniv.nl
    robot-status: development/active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: UNIX
    robot-exclusion: yes
    robot-exclusion-useragent: roadrunner
    robot-useragent: Road Runner: ImageScape Robot (lim@cs.leidenuniv.nl)
    robot-language: C, perl5
    robot-description: Create Image/Text index for WWW
    robot-history: ImageScape Project
    robot-environment: commercial service
    modified-date: Dec. 1st, 1996

  • robot-id: robbie
    robot-name: Robbie the Robot
    robot-cover-url:
    robot-details-url:
    robot-owner-name: Robert H. Pollack
    robot-owner-url:
    robot-owner-email: robert.h.pollack@lmco.com
    robot-status: development
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix, windows95, windowsNT
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: Robbie
    robot-noindex: no
    robot-host: *.lmco.com
    robot-from: yes
    robot-useragent: Robbie/0.1
    robot-language: java
    robot-description: Used to define document collections for the DISCO system.
    Robbie is still under development and runs several
    times a day, but usually only for ten minutes or so.
    Sites are visited in the order in which references
    are found, but no host is visited more than once in
    any two-minute period.
    robot-history: The DISCO system is a resource-discovery component in
    the OLLA system, which is a prototype system, developed
    under DARPA funding, to support computer-based education
    and training.
    robot-environment: research
    modified-date: Wed, 5 Feb 1997 19:00:00 GMT
    modified-by:
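
    Robbie's rule above ("no host is visited more than once in any two-minute
    period") is a classic per-host politeness delay. A minimal sketch of such a
    limiter follows, with an injectable clock so the behavior can be exercised
    without waiting; this is an illustration only, not Robbie's Java code.

    ```python
    import time
    from urllib.parse import urlparse

    class HostRateLimiter:
        """Allow at most one visit per host within `delay` seconds."""

        def __init__(self, delay=120.0, clock=time.monotonic):
            self.delay = delay
            self.clock = clock       # injectable for testing
            self.last_visit = {}     # host -> timestamp of last fetch

        def may_visit(self, url):
            host = urlparse(url).netloc
            now = self.clock()
            last = self.last_visit.get(host)
            if last is not None and now - last < self.delay:
                return False         # too soon; requeue this URL
            self.last_visit[host] = now
            return True

    # Drive the limiter with a fake clock to show the two-minute window.
    t = [0.0]
    rl = HostRateLimiter(delay=120.0, clock=lambda: t[0])
    first = rl.may_visit("http://a.com/page1")    # fresh host: allowed
    second = rl.may_visit("http://a.com/page2")   # same host, too soon
    t[0] = 121.0
    third = rl.may_visit("http://a.com/page2")    # window elapsed
    ```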

  • robot-id: robi
    robot-name: ComputingSite Robi/1.0
    robot-cover-url: http://www.computingsite.com/robi/
    robot-details-url: http://www.computingsite.com/robi/
    robot-owner-name: Tecor Communications S.L.
    robot-owner-url: http://www.tecor.com/
    robot-owner-email: robi@computingsite.com
    robot-status: Active
    robot-purpose: indexing,maintenance
    robot-type: standalone
    robot-platform: UNIX
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent: robi
    robot-noindex: no
    robot-host: robi.computingsite.com
    robot-from:
    robot-useragent: ComputingSite Robi/1.0 (robi@computingsite.com)
    robot-language: python
    robot-description: Intelligent agent used to build the ComputingSite Search
    Directory.
    robot-history: It was born on August 1997.
    robot-environment: service
    modified-date: Wed, 13 May 1998 17:28:52 GMT
    modified-by: Jorge Alegre

  • robot-id: robocrawl
    robot-name: RoboCrawl Spider
    robot-cover-url: http://www.canadiancontent.net/
    robot-details-url: http://www.canadiancontent.net/corp/spider.html
    robot-owner-name: Canadian Content Interactive Media
    robot-owner-url: http://www.canadiancontent.net/
    robot-owner-email: staff@canadiancontent.net
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: linux
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: RoboCrawl
    robot-noindex: yes
    robot-host: ncc.canadiancontent.net, ncc.air-net.no, canadiancontent.net, spider.canadiancontent.net
    robot-from: no
    robot-useragent: RoboCrawl (http://www.canadiancontent.net)
    robot-language: C and C++
    robot-description: The Canadian Content robot indexes for its search database.
    robot-history: Our robot is a newer project at Canadian Content.
    robot-environment: service
    modified-date: July 30th, 2001
    modified-by: Christopher Walsh and Adam Rutter

  • robot-id: robofox
    robot-name: RoboFox
    robot-cover-url:
    robot-details-url:
    robot-owner-name: Ian Hicks
    robot-owner-url:
    robot-owner-email: robo_fox@hotmail.com
    robot-status: development
    robot-purpose: site download
    robot-type: standalone
    robot-platform: windows9x, windowsme, windowsNT4, windows2000
    robot-availability: none
    robot-exclusion: no
    robot-exclusion-useragent: robofox
    robot-noindex: no
    robot-host: *
    robot-from: no
    robot-useragent: Robofox v2.0
    robot-language: Visual FoxPro
    robot-description: scheduled utility to download and database a domain
    robot-history:
    robot-environment: service
    modified-date: Tue, 6 Mar 2001 02:15:00 GMT
    modified-by: Ian Hicks

  • robot-id: robozilla
    robot-name: Robozilla
    robot-cover-url: http://dmoz.org/
    robot-details-url: http://www.dmoz.org/newsletter/2000Aug/robo.html
    robot-owner-name: "Rob O'Zilla"
    robot-owner-url: http://dmoz.org/profiles/robozilla.html
    robot-owner-email: robozilla@dmozed.org
    robot-status: active
    robot-purpose: maintenance
    robot-type: standalone
    robot-availability: none
    robot-exclusion: no
    robot-noindex: no
    robot-host: directory.mozilla.org
    robot-useragent: Robozilla/1.0
    robot-description: Robozilla visits all the links within the Open Directory
    periodically, marking the ones that return errors for review.
    robot-environment: service

  • robot-id: roverbot
    robot-name: Roverbot
    robot-cover-url: http://www.roverbot.com/
    robot-details-url:
    robot-owner-name: GlobalMedia Design (Andrew Cowan & Brian
    Clark)
    robot-owner-url: http://www.radzone.org/gmd/
    robot-owner-email: gmd@spyder.net
    robot-status:
    robot-purpose: indexing
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: roverbot.com
    robot-from: yes
    robot-useragent: Roverbot
    robot-language: perl5
    robot-description: Targeted email gatherer utilizing user-defined seed points
    and interacting with both the webserver and MX servers of
    remote sites.
    robot-history:
    robot-environment:
    modified-date: Tue Jun 18 19:16:31 1996.
    modified-by:

  • robot-id: rules
    robot-name: RuLeS
    robot-cover-url: http://www.rules.be
    robot-details-url: http://www.rules.be
    robot-owner-name: Marc Wils
    robot-owner-url: http://www.rules.be
    robot-owner-email: marc@rules.be
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: yes
    robot-noindex: yes
    robot-host: www.rules.be
    robot-from: yes
    robot-useragent: RuLeS/1.0 libwww/4.0
    robot-language: Dutch (Nederlands)
    robot-description:
    robot-history: none
    robot-environment: hobby
    modified-date: Sun, 8 Apr 2001 13:06:54 CET
    modified-by: Marc Wils

  • robot-id: safetynetrobot
    robot-name: SafetyNet Robot
    robot-cover-url: http://www.urlabs.com/
    robot-details-url:
    robot-owner-name: Michael L. Nelson
    robot-owner-url: http://www.urlabs.com/
    robot-owner-email: m.l.nelson@urlabs.com
    robot-status:
    robot-purpose: indexing.
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: no.
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: *.urlabs.com
    robot-from: yes
    robot-useragent: SafetyNet Robot 0.1,
    robot-language: Perl 5
    robot-description: Finds URLs for K-12 content management.
    robot-history:
    robot-environment:
    modified-date: Sat Mar 23 20:12:39 1996.
    modified-by:

  • robot-id: scooter
    robot-name: Scooter
    robot-cover-url: http://www.altavista.com/
    robot-details-url: http://www.altavista.com/av/content/addurl.htm
    robot-owner-name: AltaVista
    robot-owner-url: http://www.altavista.com/
    robot-owner-email: scooter@pa.dec.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: Scooter
    robot-noindex: yes
    robot-host: *.av.pa-x.dec.com
    robot-from: yes
    robot-useragent: Scooter/2.0 G.R.A.B. V1.1.0
    robot-language: c
    robot-description: Scooter is AltaVista's prime index agent.
    robot-history: Version 2 of Scooter/1.0 developed by Louis Monier of WRL.
    robot-environment: service
    modified-date: Wed, 13 Jan 1999 17:18:59 GMT
    modified-by: steves@avs.dec.com

  • robot-id: search_au
    robot-name: Search.Aus-AU.COM
    robot-details-url: http://Search.Aus-AU.COM/
    robot-cover-url: http://Search.Aus-AU.COM/
    robot-owner-name: Dez Blanchfield
    robot-owner-url: not currently available
    robot-owner-email: dez@geko.com
    robot-status: development: robot under development
    robot-purpose: indexing: gather content for an indexing service
    robot-type: standalone: a separate program
    robot-platform: mac, unix, windows95, windowsNT
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: Search-AU
    robot-noindex: yes
    robot-host: Search.Aus-AU.COM, 203.55.124.29, 203.2.239.29
    robot-from: no
    robot-useragent: not available
    robot-language: c, perl, sql
    robot-description: Search-AU is a development tool I have built
    to investigate the power of a search engine and web crawler
    to give me access to a database of web content (HTML / URLs)
    and addresses etc., from which I hope to build more accurate stats
    about the .au zone's web content.
    The robot started crawling from http://www.geko.net.au/ on
    March 1st, 1998, and after nine days had 70MB of compressed ASCII
    in a database to work with. I hope to run a refresh of the crawl
    every month initially, and soon every week, bandwidth and CPU allowing.
    If the project warrants further development, I will turn it into
    an Australian (.au) zone search engine and make it commercially
    available for advertising to cover the costs, which are starting
    to mount up. --dez (980313 - black friday!)
    robot-environment: - hobby: written as a hobby
    modified-date: Fri Mar 13 10:03:32 EST 1998

  • robot-id: search-info
    robot-name: Sleek
    robot-cover-url: http://search-info.com/
    robot-details-url:
    robot-owner-name: Lawrence R. Hughes, Sr.
    robot-owner-url: http://hughesnet.net/
    robot-owner-email: lawrence.hughes@search-info.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: Unix, Linux, Windows
    robot-availability: source;data
    robot-exclusion: yes
    robot-exclusion-useragent: robots.txt
    robot-noindex: yes
    robot-host: yes
    robot-from: yes
    robot-useragent: Mozilla/4.0 (Sleek Spider/1.2)
    robot-language: perl5
    robot-description: Crawls remote sites and performs link popularity checks before inclusion.
    robot-history: Hybrid of the FDSE Crawler by Zoltan Milosevic; current mods started 1/10/2002
    robot-environment: hobby
    modified-date: Mon, 14 Jan 2002 08:02:23 GMT
    modified-by: Lawrence R. Hughes, Sr.

  • robot-id: searchprocess
    robot-name: SearchProcess
    robot-cover-url: http://www.searchprocess.com
    robot-details-url: http://www.intelligence-process.com
    robot-owner-name: Mannina Bruno
    robot-owner-url: http://www.intelligence-process.com
    robot-owner-email: bruno@intelligence-process.com
    robot-status: active
    robot-purpose: statistics
    robot-type: browser
    robot-platform: linux
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: searchprocess
    robot-noindex: yes
    robot-host: searchprocess.com
    robot-from: yes
    robot-useragent: searchprocess/0.9
    robot-language: perl
    robot-description: An intelligent agent online. SearchProcess is used to
    provide structured information to users.
    robot-history: This is the son of Auresys
    robot-environment: Service freeware
    modified-date: Thu, 22 Dec 1999
    modified-by: Mannina Bruno

  • robot-id: senrigan
    robot-name: Senrigan
    robot-cover-url: http://www.info.waseda.ac.jp/search-e.html
    robot-details-url:
    robot-owner-name: TAMURA Kent
    robot-owner-url: http://www.info.waseda.ac.jp/muraoka/members/kent/
    robot-owner-email: kent@muraoka.info.waseda.ac.jp
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: Java
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: Senrigan
    robot-noindex: yes
    robot-host: aniki.olu.info.waseda.ac.jp
    robot-from: yes
    robot-useragent: Senrigan/xxxxxx
    robot-language: Java
    robot-description: This robot now fetches HTML only from the .jp domain.
    robot-history: It has been running since Dec 1994
    robot-environment: research
    modified-date: Mon Jul 1 07:30:00 GMT 1996
    modified-by: TAMURA Kent

  • robot-id: sgscout
    robot-name: SG-Scout
    robot-cover-url: http://www-swiss.ai.mit.edu/~ptbb/SG-Scout/SG-Scout.html
    robot-details-url:
    robot-owner-name: Peter Beebee
    robot-owner-url: http://www-swiss.ai.mit.edu/~ptbb/personal/index.html
    robot-owner-email: ptbb@ai.mit.edu, beebee@parc.xerox.com
    robot-status: active
    robot-purpose: indexing
    robot-type:
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host: beta.xerox.com
    robot-from: yes
    robot-useragent: SG-Scout
    robot-language:
    robot-description: Does a "server-oriented" breadth-first search in a
    round-robin fashion, with multiple processes.
    robot-history: Run since 27 June 1994, for an internal XEROX research
    project
    robot-environment:
    modified-date:
    modified-by:
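
    SG-Scout's "server-oriented" breadth-first search in round-robin fashion
    can be pictured as one FIFO queue per host, serviced in rotation so no
    single server takes a long burst of requests. The sketch below is a guess
    at that structure, not SG-Scout's actual implementation.

    ```python
    from collections import OrderedDict, deque
    from urllib.parse import urlparse

    def round_robin_order(urls):
        # Group URLs into per-host FIFO queues, then take one URL from
        # each host in turn until all queues are drained.
        queues = OrderedDict()
        for url in urls:
            queues.setdefault(urlparse(url).netloc, deque()).append(url)
        order = []
        while queues:
            for host in list(queues):
                order.append(queues[host].popleft())
                if not queues[host]:
                    del queues[host]
        return order

    order = round_robin_order([
        "http://a.com/1", "http://a.com/2",
        "http://b.com/1", "http://c.com/1",
    ])
    ```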

  • robot-id:shaggy
    robot-name:ShagSeeker
    robot-cover-url:http://www.shagseek.com
    robot-details-url:
    robot-owner-name:Joseph Reynolds
    robot-owner-url:http://www.shagseek.com
    robot-owner-email:joe.reynolds@shagseek.com
    robot-status:active
    robot-purpose:indexing
    robot-type:standalone
    robot-platform:unix
    robot-availability:data
    robot-exclusion:yes
    robot-exclusion-useragent:Shagseeker
    robot-noindex:yes
    robot-host:shagseek.com
    robot-from:
    robot-useragent:Shagseeker at http://www.shagseek.com /1.0
    robot-language:perl5
    robot-description:Shagseeker is the gatherer for the Shagseek.com search
    engine and goes out weekly.
    robot-history:none yet
    robot-environment:service
    modified-date:Mon 17 Jan 2000 10:00:00 EST
    modified-by:Joseph Reynolds

  • robot-id: shaihulud
    robot-name: Shai'Hulud
    robot-cover-url:
    robot-details-url:
    robot-owner-name: Dimitri Khaoustov
    robot-owner-url:
    robot-owner-email: shawdow@usa.net
    robot-status: active
    robot-purpose: mirroring
    robot-type: standalone
    robot-platform: unix
    robot-availability: source
    robot-exclusion: no
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host: *.rdtex.ru
    robot-from:
    robot-useragent: Shai'Hulud
    robot-language: C
    robot-description: Used to build mirrors for internal use
    robot-history: This robot finds its roots in a research project at RDTeX
    Perspective Projects Group in 1996
    robot-environment: research
    modified-date: Mon, 5 Aug 1996 14:35:08 GMT
    modified-by: Dimitri Khaoustov

  • robot-id: sift
    robot-name: Sift
    robot-cover-url: http://www.worthy.com/
    robot-details-url: http://www.worthy.com/
    robot-owner-name: Bob Worthy
    robot-owner-url: http://www.worthy.com/~bworthy
    robot-owner-email: bworthy@worthy.com
    robot-status: development, active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: data
    robot-exclusion: yes
    robot-exclusion-useragent: sift
    robot-noindex: yes
    robot-host: www.worthy.com
    robot-from:
    robot-useragent: libwww-perl-5.41
    robot-language: perl
    robot-description: Subject directed (via key phrase list) indexing.
    robot-history: Libwww of course, implementation using MySQL August, 1999.
    Indexing Search and Rescue sites.
    robot-environment: research, service
    modified-date: Sat, 16 Oct 1999 19:40:00 GMT
    modified-by: Bob Worthy

  • robot-id: simbot
    robot-name: Simmany Robot Ver1.0
    robot-cover-url: http://simmany.hnc.net/
    robot-details-url: http://simmany.hnc.net/irman1.html
    robot-owner-name: Youngsik, Lee
    robot-owner-url:
    robot-owner-email: ailove@hnc.co.kr
    robot-status: development & active
    robot-purpose: indexing, maintenance, statistics
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: SimBot
    robot-noindex: no
    robot-host: sansam.hnc.net
    robot-from: no
    robot-useragent: SimBot/1.0
    robot-language: C
    robot-description: The Simmany Robot is used to build the Map (DB) for
    the simmany service operated by HNC (Hangul & Computer Co., Ltd.). The
    robot runs weekly, and visits sites that have useful Korean
    information in a defined order.
    robot-history: This robot is a part of the simmany service and the simmini
    products. simmini is a Web product that makes use of the indexing
    and retrieving modules of simmany.
    robot-environment: service, commercial
    modified-date: Thu, 19 Sep 1996 07:02:26 GMT
    modified-by: Youngsik, Lee

  • robot-id: site-valet
    robot-name: Site Valet
    robot-cover-url: http://valet.webthing.com/
    robot-details-url: http://valet.webthing.com/
    robot-owner-name: Nick Kew
    robot-owner-url:
    robot-owner-email: nick@webthing.com
    robot-status: active
    robot-purpose: maintenance
    robot-type: standalone
    robot-platform: unix
    robot-availability: data
    robot-exclusion: yes
    robot-exclusion-useragent: Site Valet
    robot-noindex: no
    robot-host: valet.webthing.com,valet.*
    robot-from: yes
    robot-useragent: Site Valet
    robot-language: perl
    robot-description: a deluxe site monitoring and analysis service
    robot-history: builds on cg-eye, the WDG Validator, and the Link Valet
    robot-environment: service
    modified-date: Tue, 27 June 2000
    modified-by: nick@webthing.com

  • robot-id: sitetech
    robot-name: SiteTech-Rover
    robot-cover-url: http://www.sitetech.com/
    robot-details-url:
    robot-owner-name: Anil Peres-da-Silva
    robot-owner-url: http://www.sitetech.com
    robot-owner-email: adasilva@sitetech.com
    robot-status:
    robot-purpose: indexing
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex:
    robot-host:
    robot-from: yes
    robot-useragent: SiteTech-Rover
    robot-language: C++.
    robot-description: Originated as part of a suite of Internet Products to
    organize, search & navigate Intranet sites and to validate
    links in HTML documents.
    robot-history: This robot originally went by the name of LiberTech-Rover
    robot-environment:
    modified-date: Fri Aug 9 17:06:56 1996.
    modified-by: Anil Peres-da-Silva

  • robot-id: skymob
    robot-name: Skymob.com
    robot-cover-url: http://www.skymob.com/
    robot-details-url: http://www.skymob.com/about.html
    robot-owner-name: Have IT Now Limited.
    robot-owner-url: http://www.skymob.com/
    robot-owner-email: searchmaster@skymob.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: skymob
    robot-noindex: no
    robot-host: www.skymob.com
    robot-from: searchmaster@skymob.com
    robot-useragent: aWapClient
    robot-language: c++
    robot-description: WAP content Crawler.
    robot-history: new
    robot-environment: service
    modified-date: Thu Sep 6 17:50:32 BST 2001
    modified-by: Owen Lydiard

  • robot-id:slcrawler
    robot-name:SLCrawler
    robot-cover-url:
    robot-details-url:
    robot-owner-name:Inxight Software
    robot-owner-url:http://www.inxight.com
    robot-owner-email:kng@inxight.com
    robot-status:active
    robot-purpose:To build the site map.
    robot-type:standalone
    robot-platform:windows, windows95, windowsNT
    robot-availability:none
    robot-exclusion:yes
    robot-exclusion-useragent:SLCrawler/2.0
    robot-noindex:no
    robot-host:n/a
    robot-from:
    robot-useragent:SLCrawler
    robot-language:Java
    robot-description:To build the site map.
    robot-history:SLCrawler crawls HTML pages on the Internet.
    robot-environment: commercial: is a commercial product
    modified-date:Nov. 15, 2000
    modified-by:Karen Ng

  • robot-id: slurp
    robot-name: Inktomi Slurp
    robot-cover-url: http://www.inktomi.com/
    robot-details-url: http://www.inktomi.com/slurp.html
    robot-owner-name: Inktomi Corporation
    robot-owner-url: http://www.inktomi.com/
    robot-owner-email: slurp@inktomi.com
    robot-status: active
    robot-purpose: indexing, statistics
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: slurp
    robot-noindex: yes
    robot-host: *.inktomi.com
    robot-from: yes
    robot-useragent: Slurp/2.0
    robot-language: C/C++
    robot-description: Indexing documents for the HotBot search engine
    (www.hotbot.com), collecting Web statistics
    robot-history: Switch from Slurp/1.0 to Slurp/2.0 November 1996
    robot-environment: service
    modified-date: Fri Feb 28 13:57:43 PST 1997
    modified-by: slurp@inktomi.com
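
    Like most robots in this list, Slurp declares robot-exclusion: yes,
    meaning it honors the robots.txt convention, matching its
    robot-exclusion-useragent token ("slurp") against the file's User-agent
    lines. Python's standard library can evaluate such rules; the robots.txt
    content below is invented for the example, not taken from any real site.

    ```python
    from urllib.robotparser import RobotFileParser

    # Hypothetical robots.txt content: slurp may crawl everything except
    # /private/, while all other agents are disallowed entirely.
    rules = """\
    User-agent: slurp
    Disallow: /private/

    User-agent: *
    Disallow: /
    """

    parser = RobotFileParser()
    parser.parse(rules.splitlines())

    allowed = parser.can_fetch("slurp", "http://example.com/index.html")
    blocked = parser.can_fetch("slurp", "http://example.com/private/data.html")
    ```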

  • robot-id: smartspider
    robot-name: Smart Spider
    robot-cover-url: http://www.travel-finder.com
    robot-details-url: http://www.engsoftware.com/robots.htm
    robot-owner-name: Ken Wadland
    robot-owner-url: http://www.engsoftware.com
    robot-owner-email: ken@engsoftware.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: windows95, windowsNT
    robot-availability: data, binary, source
    robot-exclusion: Yes
    robot-exclusion-useragent: ESI
    robot-noindex: Yes
    robot-host: 207.16.241.*
    robot-from: Yes
    robot-useragent: ESISmartSpider/2.0
    robot-language: C++
    robot-description: Classifies sites using a Knowledge Base. Robot collects
    web pages which are then parsed and fed to the Knowledge Base. The
    Knowledge Base classifies the sites into any of hundreds of categories
    based on the vocabulary used. Currently used by: //www.travel-finder.com
    (Travel and Tourist Info) and //www.golightway.com (Christian Sites).
    Several options exist to control whether sites are discovered and/or
    classified fully automatically, fully manually, or somewhere in between.
    robot-history: Feb '96 -- Product design begun. May '96 -- First data
    results published by Travel-Finder. Oct '96 -- Generalized and announced
    and a product for other sites. Jan '97 -- First data results published by
    GoLightWay.
    robot-environment: service, commercial
    modified-date: Mon, 13 Jan 1997 10:41:00 EST
    modified-by: Ken Wadland
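
    The Knowledge Base that classifies sites "based on the vocabulary used"
    is proprietary, so the following is only a toy sketch of vocabulary-driven
    classification; the category keyword lists are invented for illustration.

    ```python
    def classify(text, vocab):
        # Score each category by how many of its keywords occur in the
        # text; return the best-scoring category, or None if none match.
        words = set(text.lower().split())
        scores = {cat: len(words & kws) for cat, kws in vocab.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None

    # Invented keyword lists, standing in for the real Knowledge Base.
    vocab = {
        "travel": {"hotel", "flight", "tour", "beach"},
        "religion": {"church", "faith", "scripture"},
    }

    label = classify("Book a cheap flight and hotel for your beach tour", vocab)
    ```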

  • robot-id: snooper
    robot-name: Snooper
    robot-cover-url: http://darsun.sit.qc.ca
    robot-details-url:
    robot-owner-name: Isabelle A. Melnick
    robot-owner-url:
    robot-owner-email: melnicki@sit.ca
    robot-status: part under development and part active
    robot-purpose:
    robot-type:
    robot-platform:
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: snooper
    robot-noindex:
    robot-host:
    robot-from:
    robot-useragent: Snooper/b97_01
    robot-language:
    robot-description:
    robot-history:
    robot-environment:
    modified-date:
    modified-by:

  • robot-id: solbot
    robot-name: Solbot
    robot-cover-url: http://kvasir.sol.no/
    robot-details-url:
    robot-owner-name: Frank Tore Johansen
    robot-owner-url:
    robot-owner-email: ftj@sys.sol.no
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: solbot
    robot-noindex: yes
    robot-host: robot*.sol.no
    robot-from:
    robot-useragent: Solbot/1.0 LWP/5.07
    robot-language: perl, c
    robot-description: Builds data for the Kvasir search service. Only searches
    sites which end with one of the following domains: "no", "se", "dk", "is", "fi"
    robot-history: This robot is the result of a three-year-old late-night hack, when
    the Verity robot (of that time) was unable to index sites with iso8859
    characters (in URLs and other places), and we just _had_ to have something up and going the next day...
    robot-environment: service
    modified-date: Tue Apr 7 16:25:05 MET DST 1998
    modified-by: Frank Tore Johansen <ftj@sys.sol.no>

  • robot-id:speedy
    robot-name:Speedy Spider
    robot-cover-url:http://www.entireweb.com/
    robot-details-url:http://www.entireweb.com/speedy.html
    robot-owner-name:WorldLight.com AB
    robot-owner-url:http://www.worldlight.com
    robot-owner-email:speedy@worldlight.com
    robot-status:active
    robot-purpose:indexing
    robot-type:standalone
    robot-platform:Windows
    robot-availability:none
    robot-exclusion:yes
    robot-exclusion-useragent:speedy
    robot-noindex:yes
    robot-host:router-00.sverige.net, 193.15.210.29, *.entireweb.com,
    *.worldlight.com
    robot-from:yes
    robot-useragent:Speedy Spider ( http://www.entireweb.com/speedy.html )
    robot-language:C, C++
    robot-description:Speedy Spider is used to build the database
    for the Entireweb.com search service operated by WorldLight.com
    (part of WorldLight Network).
    The robot runs constantly, and visits sites in a random order.
    robot-history:This robot is a part of the highly advanced search engine
    Entireweb.com, that was developed in Halmstad, Sweden during 1998-2000.
    robot-environment:service, commercial
    modified-date:Mon, 17 July 2000 11:05:03 GMT
    modified-by:Marcus Andersson

  • robot-id: spider_monkey
    robot-name: spider_monkey
    robot-cover-url: http://www.mobrien.com/add_site.html
    robot-details-url: http://www.mobrien.com/add_site.html
    robot-owner-name: MPRM Group Limited
    robot-owner-url: http://www.mobrien.com
    robot-owner-email: mprm@ionsys.com
    robot-status: robot actively in use
    robot-purpose: gather content for a free indexing service
    robot-type: FDSE robot
    robot-platform: unix
    robot-availability: bulk data gathered by robot available
    robot-exclusion: yes
    robot-exclusion-useragent: spider_monkey
    robot-noindex: yes
    robot-host: snowball.ionsys.com
    robot-from: yes
    robot-useragent: mouse.house/7.1
    robot-language: perl5
    robot-description: Robot runs every 30 days for a full index and weekly
    on a list of accumulated visitor requests
    robot-history: This robot is under development and currently active
    robot-environment: written as an employee / guest service
    modified-date: Mon, 22 May 2000 12:28:52 GMT
    modified-by: MPRM Group Limited

  • robot-id: spiderbot
    robot-name: SpiderBot
    robot-cover-url: http://pisuerga.inf.ubu.es/lsi/Docencia/TFC/ITIG/icruzadn/cover.htm
    robot-details-url: http://pisuerga.inf.ubu.es/lsi/Docencia/TFC/ITIG/icruzadn/details.htm
    robot-owner-name: Ignacio Cruzado Nuño
    robot-owner-url: http://pisuerga.inf.ubu.es/lsi/Docencia/TFC/ITIG/icruzadn/icruzadn.htm
    robot-owner-email: spidrboticruzado@solaria.emp.ubu.es
    robot-status: active
    robot-purpose: indexing, mirroring
    robot-type: standalone, browser
    robot-platform: unix, windows, windows95, windowsNT
    robot-availability: source, binary, data
    robot-exclusion: yes
    robot-exclusion-useragent: SpiderBot/1.0
    robot-noindex: yes
    robot-host: *
    robot-from: yes
    robot-useragent: SpiderBot/1.0
    robot-language: C++, Tcl
    robot-description: Recovers Web Pages and saves them on your hard disk. Then it reindexes them.
    robot-history: This robot belongs to Ignacio Cruzado Nuño's end-of-studies thesis "Recuperador de páginas Web" ("Web Page Retriever"), for the degree of Technical Engineer in Management Informatics at Burgos University in Spain.
    robot-environment: research
    modified-date: Sun, 27 Jun 1999 09:00:00 GMT
    modified-by: Ignacio Cruzado Nuño

  • robot-id: spiderline
    robot-name: Spiderline Crawler
    robot-cover-url: http://www.spiderline.com/
    robot-details-url: http://www.spiderline.com/
    robot-owner-name: Benjamin Benson
    robot-owner-url: http://www.spiderline.com/
    robot-owner-email: ben@spiderline.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: free and commercial services
    robot-exclusion: yes
    robot-exclusion-useragent: spiderline
    robot-noindex: yes
    robot-host: *.spiderline.com, *.spiderline.org
    robot-from: no
    robot-useragent: spiderline/3.1.3
    robot-language: c, c++
    robot-description:
    robot-history: Developed for Spiderline.com, launched in 2001.
    robot-environment: service
    modified-date: Wed, 21 Feb 2001 03:36:39 GMT
    modified-by: Benjamin Benson

  • robot-id: spiderman
    robot-name: SpiderMan
    robot-cover-url: http://www.comp.nus.edu.sg/~leunghok
    robot-details-url: http://www.comp.nus.edu.sg/~leunghok/honproj.html
    robot-owner-name: Leung Hok Peng, The School of Computing, NUS, Singapore
    robot-owner-url: http://www.comp.nus.edu.sg/~leunghok
    robot-owner-email: leunghok@comp.nus.edu.sg
    robot-status: development & active
    robot-purpose: user searching using IR techniques
    robot-type: standalone
    robot-platform: Java 1.2
    robot-availability: binary & source
    robot-exclusion: no
    robot-exclusion-useragent: nil
    robot-noindex: no
    robot-host: NA
    robot-from: NA
    robot-useragent: SpiderMan 1.0
    robot-language: java
    robot-description: Allows any user to search the web given a query string
    robot-history: Originated from the Center for Natural Product Research and the
    School of Computing, National University of Singapore
    robot-environment: research
    modified-date: 08/08/1999
    modified-by: Leung Hok Peng and Dr Hsu Wynne

  • robot-id: spiderview
    robot-name: SpiderView(tm)
    robot-cover-url: http://www.northernwebs.com/set/spider_view.html
    robot-details-url: http://www.northernwebs.com/set/spider_sales.html
    robot-owner-name: Northern Webs
    robot-owner-url: http://www.northernwebs.com
    robot-owner-email: webmaster@northernwebs.com
    robot-status: active
    robot-purpose: maintenance
    robot-type: standalone
    robot-platform: unix, nt
    robot-availability: source
    robot-exclusion: no
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: bobmin.quad2.iuinc.com, *
    robot-from: No
    robot-useragent: Mozilla/4.0 (compatible; SpiderView 1.0;unix)
    robot-language: perl
    robot-description: SpiderView is a server based program which can spider
    a webpage, testing the links found on the page, evaluating your server
    and its performance.
    robot-history: This is an offshoot http retrieval program based on our
    Medibot software.
    robot-environment: commercial
    modified-date:
    modified-by:

  • robot-id: spry
    robot-name: Spry Wizard Robot
    robot-cover-url: http://www.spry.com/wizard/index.html
    robot-details-url:
    robot-owner-name: spry
    robot-owner-url: http://www.spry.com/index.html
    robot-owner-email: info@spry.com
    robot-status:
    robot-purpose: indexing
    robot-type:
    robot-platform:
    robot-availability:
    robot-exclusion:
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: wizard.spry.com or tiger.spry.com
    robot-from: no
    robot-useragent: no
    robot-language:
    robot-description: Its purpose is to generate a Resource Discovery database.
    Spry is refusing to give any comments about this
    robot.
    robot-history:
    robot-environment:
    modified-date: Tue Jul 11 09:29:45 GMT 1995
    modified-by:

  • robot-id: ssearcher
    robot-name: Site Searcher
    robot-cover-url: http://www.satacoy.com
    robot-details-url: http://www.satacoy.com
    robot-owner-name: Zackware
    robot-owner-url: http://www.satacoy.com
    robot-owner-email: zackware@hotmail.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: windows95, windows98, windowsNT
    robot-availability: binary
    robot-exclusion: no
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host: *
    robot-from: no
    robot-useragent: ssearcher100
    robot-language: C++
    robot-description: Site Searcher scans web sites for specific file types.
    (JPG, MP3, MPG, etc)
    robot-history: Released 4/4/1999
    robot-environment: hobby
    modified-date: 04/26/1999

  • robot-id: suke
    robot-name: Suke
    robot-cover-url: http://www.kensaku.org/
    robot-details-url: http://www.kensaku.org/
    robot-owner-name: Yosuke Kuroda
    robot-owner-url: http://www.kensaku.org/yk/
    robot-owner-email: robot@kensaku.org
    robot-status: development
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: FreeBSD3.*
    robot-availability: source
    robot-exclusion: yes
    robot-exclusion-useragent: suke
    robot-noindex: no
    robot-host: *
    robot-from: yes
    robot-useragent: suke/*.*
    robot-language: c
    robot-description: This robot mainly visits sites in Japan.
    robot-history: since 1999
    robot-environment: service

  • robot-id: suntek
    robot-name: suntek search engine
    robot-cover-url: http://www.portal.com.hk/
    robot-details-url: http://www.suntek.com.hk/
    robot-owner-name: Suntek Computer Systems
    robot-owner-url: http://www.suntek.com.hk/
    robot-owner-email: karen@suntek.com.hk
    robot-status: operational
    robot-purpose: to create a search portal on Asian web sites
    robot-type:
    robot-platform: NT, Linux, UNIX
    robot-availability: available now
    robot-exclusion:
    robot-exclusion-useragent:
    robot-noindex: yes
    robot-host: search.suntek.com.hk
    robot-from: yes
    robot-useragent: suntek/1.0
    robot-language: Java
    robot-description: A multilingual search engine with emphasis on Asian content
    robot-history:
    robot-environment:
    modified-date:
    modified-by:

  • robot-id: sven
    robot-name: Sven
    robot-cover-url:
    robot-details-url: http://marty.weathercity.com/sven/
    robot-owner-name: Marty Anstey
    robot-owner-url: http://marty.weathercity.com/
    robot-owner-email: rhondle@home.com
    robot-status: Active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: Windows
    robot-availability: none
    robot-exclusion: no
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host: 24.113.12.29
    robot-from: no
    robot-useragent:
    robot-language: VB5
    robot-description: Used to gather sites for netbreach.com. Runs constantly.
    robot-history: Developed as an experiment in web indexing.
    robot-environment: hobby, service
    modified-date: Tue, 3 Mar 1999 08:15:00 PST
    modified-by: Marty Anstey

  • robot-id: tach_bw
    robot-name: TACH Black Widow
    robot-cover-url: http://theautochannel.com/~mjenn/bw.html
    robot-details-url: http://theautochannel.com/~mjenn/bw-syntax.html
    robot-owner-name: Michael Jennings
    robot-owner-url: http://www.spd.louisville.edu/~mejenn01/
    robot-owner-email: mjenn@theautochannel.com
    robot-status: development
    robot-purpose: maintenance: link validation
    robot-type: standalone
    robot-platform: UNIX, Linux
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: tach_bw
    robot-noindex: no
    robot-host: *.theautochannel.com
    robot-from: yes
    robot-useragent: Mozilla/3.0 (Black Widow v1.1.0; Linux 2.0.27; Dec 31 1997 12:25:00)
    robot-language: C/C++
    robot-description: Exhaustively recurses a single site to check for broken links
    robot-history: Corporate application begun in 1996 for The Auto Channel
    robot-environment: commercial
    modified-date: Thu, Jan 23 1997 23:09:00 GMT
    modified-by: Michael Jennings

  • robot-id:tarantula
    robot-name: Tarantula
    robot-cover-url: http://www.nathan.de/nathan/software.html#TARANTULA
    robot-details-url: http://www.nathan.de/
    robot-owner-name: Markus Hoevener
    robot-owner-url:
    robot-owner-email: Markus.Hoevener@evision.de
    robot-status: development
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: yes
    robot-noindex: yes
    robot-host: yes
    robot-from: no
    robot-useragent: Tarantula/1.0
    robot-language: C
    robot-description: Tarantula gathers information for the German search engine Nathan
    robot-history: Started February 1997
    robot-environment: service
    modified-date: Mon, 29 Dec 1997 15:30:00 GMT
    modified-by: Markus Hoevener

  • robot-id: tarspider
    robot-name: tarspider
    robot-cover-url:
    robot-details-url:
    robot-owner-name: Olaf Schreck
    robot-owner-url: http://www.chemie.fu-berlin.de/user/chakl/ChaklHome.html
    robot-owner-email: chakl@fu-berlin.de
    robot-status:
    robot-purpose: mirroring
    robot-type:
    robot-platform:
    robot-availability:
    robot-exclusion:
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host:
    robot-from: chakl@fu-berlin.de
    robot-useragent: tarspider
    robot-language:
    robot-description:
    robot-history:
    robot-environment:
    modified-date:
    modified-by:

  • robot-id: tcl
    robot-name: Tcl W3 Robot
    robot-cover-url: http://hplyot.obspm.fr/~dl/robo.html
    robot-details-url:
    robot-owner-name: Laurent Demailly
    robot-owner-url: http://hplyot.obspm.fr/~dl/
    robot-owner-email: dl@hplyot.obspm.fr
    robot-status:
    robot-purpose: maintenance, statistics
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host: hplyot.obspm.fr
    robot-from: yes
    robot-useragent: dlw3robot/x.y (in TclX by http://hplyot.obspm.fr/~dl/)
    robot-language: tcl
    robot-description: Its purpose is to validate links, and generate
    statistics.
    robot-history:
    robot-environment:
    modified-date: Tue May 23 17:51:39 1995
    modified-by:

  • robot-id: techbot
    robot-name: TechBOT
    robot-cover-url: http://www.techaid.net/
    robot-details-url: http://www.techaid.net/TechBOT/
    robot-owner-name: TechAID Internet Services
    robot-owner-url: http://www.techaid.net/
    robot-owner-email: techbot@techaid.net
    robot-status: active
    robot-purpose:statistics, maintenance
    robot-type: standalone
    robot-platform: Unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: TechBOT
    robot-noindex: yes
    robot-host: techaid.net
    robot-from: yes
    robot-useragent: TechBOT
    robot-language: perl5
    robot-description: TechBOT is constantly upgraded. Currently he is used for
    Link Validation, Load Time, HTML Validation and much more.
    robot-history: TechBOT started his life as a Page Change Detection robot,
    but has taken on many new and exciting roles.
    robot-environment: service
    modified-date: Sat, 18 Dec 1998 14:26:00 EST
    modified-by: techbot@techaid.net

  • robot-id: templeton
    robot-name: Templeton
    robot-cover-url: http://www.bmtmicro.com/catalog/tton/
    robot-details-url: http://www.bmtmicro.com/catalog/tton/
    robot-owner-name: Neal Krawetz
    robot-owner-url: http://www.cs.tamu.edu/people/nealk/
    robot-owner-email: nealk@net66.com
    robot-status: active
    robot-purpose: mirroring, mapping, automating web applications
    robot-type: standalone
    robot-platform: OS/2, Linux, SunOS, Solaris
    robot-availability: binary
    robot-exclusion: yes
    robot-exclusion-useragent: templeton
    robot-noindex: no
    robot-host: *
    robot-from: yes
    robot-useragent: Templeton/{version} for {platform}
    robot-language: C
    robot-description: Templeton is a very configurable robot for mirroring, mapping, and automating applications on retrieved documents.
    robot-history: This robot was originally created as a test-of-concept.
    robot-environment: service, commercial, research, hobby
    modified-date: Sun, 6 Apr 1997 10:00:00 GMT
    modified-by: Neal Krawetz

  • robot-id: titin
    robot-name: TitIn
    robot-cover-url: http://www.foi.hr/~dpavlin/titin/
    robot-details-url: http://www.foi.hr/~dpavlin/titin/tehnical.htm
    robot-owner-name: Dobrica Pavlinusic
    robot-owner-url: http://www.foi.hr/~dpavlin/
    robot-owner-email: dpavlin@foi.hr
    robot-status: development
    robot-purpose: indexing, statistics
    robot-type: standalone
    robot-platform: unix
    robot-availability: data, source on request
    robot-exclusion: yes
    robot-exclusion-useragent: titin
    robot-noindex: no
    robot-host: barok.foi.hr
    robot-from: no
    robot-useragent: TitIn/0.2
    robot-language: perl5, c
    robot-description:
    The TitIn is used to index the titles of all Web servers in the
    .hr domain.
    robot-history:
    It was done as result of desperate need for central index of
    Croatian web servers in December 1996.
    robot-environment: research
    modified-date: Thu, 12 Dec 1996 16:06:42 MET
    modified-by: Dobrica Pavlinusic

  • robot-id: titan
    robot-name: TITAN
    robot-cover-url: http://isserv.tas.ntt.jp/chisho/titan-e.html
    robot-details-url: http://isserv.tas.ntt.jp/chisho/titan-help/eng/titan-help-e.html
    robot-owner-name: Yoshihiko HAYASHI
    robot-owner-url:
    robot-owner-email: hayashi@nttnly.isl.ntt.jp
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: SunOS 4.1.4
    robot-availability: no
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host: nlptitan.isl.ntt.jp
    robot-from: yes
    robot-useragent: TITAN/0.1
    robot-language: perl 4
    robot-description: Its purpose is to generate a Resource Discovery
    database, and copy document trees. Our primary goal is to develop
    an advanced method for indexing the WWW documents. Uses libwww-perl
    robot-history:
    robot-environment:
    modified-date: Mon Jun 24 17:20:44 PDT 1996
    modified-by: Yoshihiko HAYASHI

  • robot-id: tkwww
    robot-name: The TkWWW Robot
    robot-cover-url: http://fang.cs.sunyit.edu/Robots/tkwww.html
    robot-details-url:
    robot-owner-name: Scott Spetka
    robot-owner-url: http://fang.cs.sunyit.edu/scott/scott.html
    robot-owner-email: scott@cs.sunyit.edu
    robot-status:
    robot-purpose: indexing
    robot-type:
    robot-platform:
    robot-availability:
    robot-exclusion:
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host:
    robot-from:
    robot-useragent:
    robot-language:
    robot-description: It is designed to search Web neighborhoods to find pages
    that may be logically related. The Robot returns a list of
    links that looks like a hot list. The search can be by key
    word or all links at a distance of one or two hops may be
    returned. The TkWWW Robot is described in a paper presented
    at the WWW94 Conference in Chicago.
    robot-history:
    robot-environment:
    modified-date:
    modified-by:

  • robot-id: tlspider
    robot-name: TLSpider
    robot-cover-url: n/a
    robot-details-url: n/a
    robot-owner-name: topiclink.com
    robot-owner-url: topiclink.com
    robot-owner-email: tlspider@outtel.com
    robot-status: not activated
    robot-purpose: to get web sites and add them to the TopicLink future directory
    robot-type: development: robot under development
    robot-platform: linux
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: topiclink
    robot-noindex: no
    robot-host: tlspider.topiclink.com (not available yet)
    robot-from: no
    robot-useragent: TLSpider/1.1
    robot-language: perl5
    robot-description: This robot runs 2 days a week, gathering information for
    TopicLink.com
    robot-history: This robot was created to serve the internet search engine
    TopicLink.com
    robot-environment: service
    modified-date: September 10, 1999 17:28 GMT
    modified-by: TopicLink Spider Team

  • robot-id: ucsd
    robot-name: UCSD Crawl
    robot-cover-url: http://www.mib.org/~ucsdcrawl
    robot-details-url:
    robot-owner-name: Adam Tilghman
    robot-owner-url: http://www.mib.org/~atilghma
    robot-owner-email: atilghma@mib.org
    robot-status:
    robot-purpose: indexing, statistics
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: nuthaus.mib.org scilib.ucsd.edu
    robot-from: yes
    robot-useragent: UCSD-Crawler
    robot-language: Perl 4
    robot-description: Should hit ONLY within UC San Diego - trying to count
    servers here.
    robot-history:
    robot-environment:
    modified-date: Sat Jan 27 09:21:40 1996.
    modified-by:

  • robot-id: udmsearch
    robot-name: UdmSearch
    robot-details-url: http://mysearch.udm.net/
    robot-cover-url: http://mysearch.udm.net/
    robot-owner-name: Alexander Barkov
    robot-owner-url: http://mysearch.udm.net/
    robot-owner-email: bar@izhcom.ru
    robot-status: active
    robot-purpose: indexing, validation
    robot-type: standalone
    robot-platform: unix
    robot-availability: source, binary
    robot-exclusion: yes
    robot-exclusion-useragent: UdmSearch
    robot-noindex: yes
    robot-host: *
    robot-from: no
    robot-useragent: UdmSearch/2.1.1
    robot-language: c
    robot-description: UdmSearch is a free web search engine software for
    intranet/small domain internet servers
    robot-history: Developed since 1998; its original purpose was a search engine
    over the Republic of Udmurtia http://search.udm.net
    robot-environment: hobby
    modified-date: Mon, 6 Sep 1999 10:28:52 GMT

  • robot-id: urlck
    robot-name: URL Check
    robot-cover-url: http://www.cutternet.com/products/webcheck.html
    robot-details-url: http://www.cutternet.com/products/urlck.html
    robot-owner-name: Dave Finnegan
    robot-owner-url: http://www.cutternet.com
    robot-owner-email: dave@cutternet.com
    robot-status: active
    robot-purpose: maintenance
    robot-type: standalone
    robot-platform: unix
    robot-availability: binary
    robot-exclusion: yes
    robot-exclusion-useragent: urlck
    robot-noindex: no
    robot-host: *
    robot-from: yes
    robot-useragent: urlck/1.2.3
    robot-language: c
    robot-description: The robot is used to manage, maintain, and modify
    web sites. It builds a database detailing the
    site, builds HTML reports describing the site, and
    can be used to up-load pages to the site or to
    modify existing pages and URLs within the site. It
    can also be used to mirror whole or partial sites.
    It supports HTTP, File, FTP, and Mailto schemes.
    robot-history: Originally designed to validate URLs.
    robot-environment: commercial
    modified-date: July 9, 1997
    modified-by: Dave Finnegan

  • robot-id: us
    robot-name: URL Spider Pro
    robot-cover-url: http://www.innerprise.net
    robot-details-url: http://www.innerprise.net/us.htm
    robot-owner-name: Innerprise
    robot-owner-url: http://www.innerprise.net
    robot-owner-email: greg@innerprise.net
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: Windows9x/NT
    robot-availability: binary
    robot-exclusion: yes
    robot-exclusion-useragent: *
    robot-noindex: yes
    robot-host: *
    robot-from: no
    robot-useragent: URL Spider Pro
    robot-language: delphi
    robot-description: Used for building a database of web pages.
    robot-history: Project started July 1998.
    robot-environment: commercial
    modified-date: Mon, 12 Jul 1999 17:50:30 GMT
    modified-by: Innerprise

  • robot-id: valkyrie
    robot-name: Valkyrie
    robot-cover-url: http://kichijiro.c.u-tokyo.ac.jp/odin/
    robot-details-url: http://kichijiro.c.u-tokyo.ac.jp/odin/robot.html
    robot-owner-name: Masanori Harada
    robot-owner-url: http://www.graco.c.u-tokyo.ac.jp/~harada/
    robot-owner-email: harada@graco.c.u-tokyo.ac.jp
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: Valkyrie libwww-perl
    robot-noindex: no
    robot-host: *.c.u-tokyo.ac.jp
    robot-from: yes
    robot-useragent: Valkyrie/1.0 libwww-perl/0.40
    robot-language: perl4
    robot-description: used to collect resources from Japanese Web sites for ODIN search engine.
    robot-history: This robot has been used since Oct. 1995 for author's research.
    robot-environment: service research
    modified-date: Thu Mar 20 19:09:56 JST 1997
    modified-by: harada@graco.c.u-tokyo.ac.jp

  • robot-id: verticrawl
    robot-name: Verticrawl
    robot-cover-url: http://www.verticrawl.com/
    robot-details-url: http://www.verticrawl.com/
    robot-owner-name: Velic, Epromat, Malinge, Troutot, Lhuisset
    robot-owner-url: http://www.verticrawl.com/
    robot-owner-email: webmaster@velic.com, webmaster@epromat.com
    robot-status: active
    robot-purpose: indexing, maintenance, statistics, and classifying urls in a global ASP solution
    robot-type: standalone
    robot-platform: Unix, Linux and windowsNT
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: verticrawl
    robot-noindex: yes
    robot-host: http://193.251.26.45:15555/
    robot-from: Yes
    robot-useragent: Verticrawl
    robot-language: c, perl
    robot-description: Verticrawl is a global search engine dedicated to application service provision in specialized directory projects
    robot-history: Verticrawl is based on web solutions for knowledge management and Web portals back office services
    robot-environment: commercial
    modified-date: Mon, 10 Dec 2001 17:28:52 GMT
    modified-by: webmaster@velic.com

  • robot-id: victoria
    robot-name: Victoria
    robot-cover-url:
    robot-details-url:
    robot-owner-name: Adrian Howard
    robot-owner-url:
    robot-owner-email: adrianh@oneworld.co.uk
    robot-status: development
    robot-purpose: maintenance
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: Victoria
    robot-noindex: yes
    robot-host:
    robot-from:
    robot-useragent: Victoria/1.0
    robot-language: perl,c
    robot-description: Victoria is part of a groupware produced
    by Victoria Real Ltd. (voice: +44 [0]1273 774469,
    fax: +44 [0]1273 779960 email: victoria@pavilion.co.uk).
    Victoria is used to monitor changes in W3 documents,
    both intranet and internet based.
    Contact Victoria Real for more information.
    robot-history:
    robot-environment: commercial
    modified-date: Fri, 22 Nov 1996 16:45 GMT
    modified-by: victoria@pavilion.co.uk

  • robot-id: visionsearch
    robot-name: vision-search
    robot-cover-url: http://www.ius.cs.cmu.edu/cgi-bin/vision-search
    robot-details-url:
    robot-owner-name: Henry A. Rowley
    robot-owner-url: http://www.cs.cmu.edu/~har
    robot-owner-email: har@cs.cmu.edu
    robot-status:
    robot-purpose: indexing.
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: dylan.ius.cs.cmu.edu
    robot-from: no
    robot-useragent: vision-search/3.0
    robot-language: Perl 5
    robot-description: Intended to be an index of computer vision pages, containing
    all pages within <em>n</em> links (for some small
    <em>n</em>) of the Vision Home Page
    robot-history:
    robot-environment:
    modified-date: Fri Mar 8 16:03:04 1996
    modified-by:

  • robot-id: voyager
    robot-name: Voyager
    robot-cover-url: http://www.lisa.co.jp/voyager/
    robot-details-url:
    robot-owner-name: Voyager Staff
    robot-owner-url: http://www.lisa.co.jp/voyager/
    robot-owner-email: voyager@lisa.co.jp
    robot-status: development
    robot-purpose: indexing, maintenance
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: Voyager
    robot-noindex: no
    robot-host: *.lisa.co.jp
    robot-from: yes
    robot-useragent: Voyager/0.0
    robot-language: perl5
    robot-description: This robot is used to build the database for the
    Lisa Search service. The robot is launched manually
    and visits sites in a random order.
    robot-history:
    robot-environment: service
    modified-date: Mon, 30 Nov 1998 08:00:00 GMT
    modified-by: Hideyuki Ezaki

  • robot-id: vwbot
    robot-name: VWbot
    robot-cover-url: http://vancouver-webpages.com/VWbot/
    robot-details-url: http://vancouver-webpages.com/VWbot/aboutK.shtml
    robot-owner-name: Andrew Daviel
    robot-owner-url: http://vancouver-webpages.com/~admin/
    robot-owner-email: andrew@vancouver-webpages.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: source
    robot-exclusion: yes
    robot-exclusion-useragent: VWbot_K
    robot-noindex: yes
    robot-host: vancouver-webpages.com
    robot-from: yes
    robot-useragent: VWbot_K/4.2
    robot-language: perl4
    robot-description: Used to index BC sites for the searchBC database. Runs daily.
    robot-history: Originally written fall 1995. Actively maintained.
    robot-environment: service commercial research
    modified-date: Tue, 4 Mar 1997 20:00:00 GMT
    modified-by: Andrew Daviel

  • robot-id: w3index
    robot-name: The NWI Robot
    robot-cover-url: http://www.ub2.lu.se/NNC/projects/NWI/the_nwi_robot.html
    robot-owner-name: Sigfrid Lundberg, Lund university, Sweden
    robot-owner-url: http://nwi.ub2.lu.se/~siglun
    robot-owner-email: siglun@munin.ub2.lu.se
    robot-status: active
    robot-purpose: discovery,statistics
    robot-type: standalone
    robot-platform: UNIX
    robot-availability: none (at the moment)
    robot-exclusion: yes
    robot-noindex: No
    robot-host: nwi.ub2.lu.se, mars.dtv.dk and a few others
    robot-from: yes
    robot-useragent: w3index
    robot-language: perl5
    robot-description: A resource discovery robot, used primarily for
    the indexing of the Scandinavian Web
    robot-history: It is about a year or so old.
    Written by Anders Ardö, Mattias Borrell,
    Håkan Ardö and myself.
    robot-environment: service,research
    modified-date: Wed Jun 26 13:58:04 MET DST 1996
    modified-by: Sigfrid Lundberg

  • robot-id: w3m2
    robot-name: W3M2
    robot-cover-url: http://tronche.com/W3M2
    robot-details-url:
    robot-owner-name: Christophe Tronche
    robot-owner-url: http://tronche.com/
    robot-owner-email: tronche@lri.fr
    robot-status:
    robot-purpose: indexing, maintenance, statistics
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host: *
    robot-from: yes
    robot-useragent: W3M2/x.xxx
    robot-language: Perl 4, Perl 5, and C++
    robot-description: to generate a Resource Discovery database, validate links,
    validate HTML, and generate statistics
    robot-history:
    robot-environment:
    modified-date: Fri May 5 17:48:48 1995
    modified-by:

  • robot-id: wallpaper
    robot-name: WallPaper (alias crawlpaper)
    robot-cover-url: http://www.crawlpaper.com/
    robot-details-url: http://sourceforge.net/projects/crawlpaper/
    robot-owner-name: Luca Piergentili
    robot-owner-url: http://www.geocities.com/lpiergentili/
    robot-owner-email: lpiergentili@yahoo.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: windows
    robot-availability: source, binary
    robot-exclusion: yes
    robot-exclusion-useragent: crawlpaper
    robot-noindex: no
    robot-host:
    robot-from:
    robot-useragent: CrawlPaper/n.n.n (Windows n)
    robot-language: C++
    robot-description: a crawler for pictures download and offline browsing
    robot-history: started as a screensaver, the program has evolved into a crawler
    including an audio player, etc.
    robot-environment: hobby
    modified-date: Mon, 25 Aug 2003 09:00:00 GMT
    modified-by:

  • robot-id: wanderer
    robot-name: the World Wide Web Wanderer
    robot-cover-url: http://www.mit.edu/people/mkgray/net/
    robot-details-url:
    robot-owner-name: Matthew Gray
    robot-owner-url: http://www.mit.edu:8001/people/mkgray/mkgray.html
    robot-owner-email: mkgray@mit.edu
    robot-status: active
    robot-purpose: statistics
    robot-type: standalone
    robot-platform: unix
    robot-availability: data
    robot-exclusion: no
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host: *.mit.edu
    robot-from:
    robot-useragent: WWWWanderer v3.0
    robot-language: perl4
    robot-description: Run initially in June 1993, its aim is to measure
    the growth in the web.
    robot-history:
    robot-environment: research
    modified-date:
    modified-by:

  • robot-id: wapspider
    robot-name: w@pSpider by wap4.com
    robot-cover-url: http://mopilot.com/
    robot-details-url: http://wap4.com/portfolio.htm
    robot-owner-name: Dieter Kneffel
    robot-owner-url: http://wap4.com/ (corporate)
    robot-owner-email: info@wap4.com
    robot-status: active
    robot-purpose: indexing, maintenance (special: dedicated to wap/wml pages)
    robot-type: standalone
    robot-platform: unix
    robot-availability: data
    robot-exclusion: yes
    robot-exclusion-useragent: wapspider
    robot-noindex: [does not apply for wap/wml pages!]
    robot-host: *.wap4.com, *.mopilot.com
    robot-from: yes
    robot-useragent: w@pSpider/xxx (unix) by wap4.com
    robot-language: c, php, sql
    robot-description: wapspider is used to build the database for
    mopilot.com, a search engine for mobile contents; it is specially
    designed to crawl wml-pages. html is indexed, but html-links are
    (currently) not followed
    robot-history: this robot was developed by wap4.com in 1999 for the
    world's first wap-search engine
    robot-environment: service, commercial, research
    modified-date: Fri, 23 Jun 2000 14:33:52 MESZ
    modified-by: Dieter Kneffel, data@wap4.com

  • robot-id: webbandit
    robot-name: WebBandit Web Spider
    robot-cover-url: http://pw2.netcom.com/~wooger/
    robot-details-url: http://pw2.netcom.com/~wooger/
    robot-owner-name: Jerry Walsh
    robot-owner-url: http://pw2.netcom.com/~wooger/
    robot-owner-email: wooger@ix.netcom.com
    robot-status: active
    robot-purpose: resource gathering / server benchmarking
    robot-type: standalone application
    robot-platform: Intel - windows95
    robot-availability: source, binary
    robot-exclusion: no
    robot-exclusion-useragent: WebBandit/1.0
    robot-noindex: no
    robot-host: ix.netcom.com
    robot-from: no
    robot-useragent: WebBandit/1.0
    robot-language: C++
    robot-description: multithreaded, hyperlink-following,
    resource-finding webspider
    robot-history: Inspired by a reading of the
    Internet Programming book by Jamsa/Cope
    robot-environment: commercial
    modified-date: 11/21/96
    modified-by: Jerry Walsh

  • robot-id: webcatcher
    robot-name: WebCatcher
    robot-cover-url: http://oscar.lang.nagoya-u.ac.jp
    robot-details-url:
    robot-owner-name: Reiji SUZUKI
    robot-owner-url: http://oscar.lang.nagoya-u.ac.jp/~reiji/index.html
    robot-owner-email: reiji@infonia.ne.jp
    robot-owner-name2: Masatoshi SUGIURA
    robot-owner-url2: http://oscar.lang.nagoya-u.ac.jp/~sugiura/index.html
    robot-owner-email2: sugiura@lang.nagoya-u.ac.jp
    robot-status: development
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix, windows, mac
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: webcatcher
    robot-noindex: no
    robot-host: oscar.lang.nagoya-u.ac.jp
    robot-from: no
    robot-useragent: WebCatcher/1.0
    robot-language: perl5
    robot-description: WebCatcher gathers web pages
    that Japanese college students want to visit.
    robot-history: This robot finds its roots in a research project
    at Nagoya University in 1998.
    robot-environment: research
    modified-date: Fri, 16 Oct 1998 17:28:52 JST
    modified-by: "Reiji SUZUKI" <reiji@infonia.ne.jp>

  • robot-id: webcopy
    robot-name: WebCopy
    robot-cover-url: http://www.inf.utfsm.cl/~vparada/webcopy.html
    robot-details-url:
    robot-owner-name: Victor Parada
    robot-owner-url: http://www.inf.utfsm.cl/~vparada/
    robot-owner-email: vparada@inf.utfsm.cl
    robot-status:
    robot-purpose: mirroring
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: no
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host: *
    robot-from: no
    robot-useragent: WebCopy/(version)
    robot-language: perl 4 or perl 5
    robot-description: Its purpose is to perform mirroring. WebCopy can retrieve
    files recursively using the HTTP protocol. It can be used as a
    delayed browser or as a mirroring tool. It cannot jump from
    one site to another.
    robot-history:
    robot-environment:
    modified-date: Sun Jul 2 15:27:04 1995
    modified-by:

  • robot-id: webfetcher
    robot-name: webfetcher
    robot-cover-url: http://www.ontv.com/
    robot-details-url:
    robot-owner-name:
    robot-owner-url: http://www.ontv.com/
    robot-owner-email: webfetch@ontv.com
    robot-status:
    robot-purpose: mirroring
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: no
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: *
    robot-from: yes
    robot-useragent: WebFetcher/0.8,
    robot-language: C++
    robot-description: Don't wait! OnTV's WebFetcher mirrors whole sites down to
    your hard disk on a TV-like schedule. Catch w3
    documentation. Catch discovery.com without waiting! A fully
    operational web robot for NT/95 today, most UNIX soon, Mac
    tomorrow.
    robot-history:
    robot-environment:
    modified-date: Sat Jan 27 10:31:43 1996.
    modified-by:

  • robot-id: webfoot
    robot-name: The Webfoot Robot
    robot-cover-url:
    robot-details-url:
    robot-owner-name: Lee McLoughlin
    robot-owner-url: http://web.doc.ic.ac.uk/f?/lmjm
    robot-owner-email: L.McLoughlin@doc.ic.ac.uk
    robot-status:
    robot-purpose:
    robot-type:
    robot-platform:
    robot-availability:
    robot-exclusion:
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: phoenix.doc.ic.ac.uk
    robot-from:
    robot-useragent:
    robot-language:
    robot-description:
    robot-history: First spotted in Mid February 1994
    robot-environment:
    modified-date:
    modified-by:

  • robot-id: webinator
    robot-name: Webinator
    robot-details-url: http://www.thunderstone.com/texis/site/pages/webinator4_admin.html
    robot-cover-url: http://www.thunderstone.com/texis/site/pages/webinator.html
    robot-owner-name:
    robot-owner-email:
    robot-status: active, under further enhancement.
    robot-purpose: information retrieval
    robot-type: standalone
    robot-exclusion: yes
    robot-noindex: yes
    robot-exclusion-useragent: T-H-U-N-D-E-R-S-T-O-N-E
    robot-host: several
    robot-from: No
    robot-language: Texis Vortex
    robot-history:
    robot-environment: Commercial

  • robot-id: weblayers
    robot-name: weblayers
    robot-cover-url: http://www.univ-paris8.fr/~loic/weblayers/
    robot-details-url:
    robot-owner-name: Loic Dachary
    robot-owner-url: http://www.univ-paris8.fr/~loic/
    robot-owner-email: loic@afp.com
    robot-status:
    robot-purpose: maintenance
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host:
    robot-from:
    robot-useragent: weblayers/0.0
    robot-language: perl 5
    robot-description: Its purpose is to validate, cache and maintain links. It is
    designed to maintain the cache generated by the Emacs
    w3 mode (N*tscape replacement) and to support annotated
    documents (keeping them in sync with the original document via
    diff/patch).
    robot-history:
    robot-environment:
    modified-date: Fri Jun 23 16:30:42 FRE 1995
    modified-by:

  • robot-id: weblinker
    robot-name: WebLinker
    robot-cover-url: http://www.cern.ch/WebLinker/
    robot-details-url:
    robot-owner-name: James Casey
    robot-owner-url: http://www.maths.tcd.ie/hyplan/jcasey/jcasey.html
    robot-owner-email: jcasey@maths.tcd.ie
    robot-status:
    robot-purpose: maintenance
    robot-type:
    robot-platform:
    robot-availability:
    robot-exclusion:
    robot-exclusion-useragent:
    robot-noindex:
    robot-host:
    robot-from:
    robot-useragent: WebLinker/0.0 libwww-perl/0.1
    robot-language:
    robot-description: It traverses a section of the web, doing URN->URL conversion.
    It will be used as a post-processing tool on documents created
    by automatic converters such as LaTeX2HTML or WebMaker. At
    the moment it works at full speed, but is restricted to
    local sites. External GETs will be added, but these will be
    running slowly. WebLinker is meant to be run locally, so if
    you see it elsewhere let the author know!
    robot-history:
    robot-environment:
    modified-date:
    modified-by:

  • robot-id: webmirror
    robot-name: WebMirror
    robot-cover-url: http://www.winsite.com/pc/win95/netutil/wbmiror1.zip
    robot-details-url:
    robot-owner-name: Sui Fung Chan
    robot-owner-url: http://www.geocities.com/NapaVally/1208
    robot-owner-email: sfchan@mailhost.net
    robot-status:
    robot-purpose: mirroring
    robot-type: standalone
    robot-platform: Windows95
    robot-availability:
    robot-exclusion: no
    robot-exclusion-useragent:
    robot-noindex:
    robot-host:
    robot-from: no
    robot-useragent: no
    robot-language: C++
    robot-description: It downloads web pages to the hard drive for off-line
    browsing.
    robot-history:
    robot-environment:
    modified-date: Mon Apr 29 08:52:25 1996.
    modified-by:

  • robot-id: webmoose
    robot-name: The Web Moose
    robot-cover-url:
    robot-details-url: http://www.nwlink.com/~mikeblas/webmoose/
    robot-owner-name: Mike Blaszczak
    robot-owner-url: http://www.nwlink.com/~mikeblas/
    robot-owner-email: mikeblas@nwlink.com
    robot-status: development
    robot-purpose: statistics, maintenance
    robot-type: standalone
    robot-platform: Windows NT
    robot-availability: data
    robot-exclusion: no
    robot-exclusion-useragent: WebMoose
    robot-noindex: no
    robot-host: msn.com
    robot-from: no
    robot-useragent: WebMoose/0.0.0000
    robot-language: C++
    robot-description: This robot collects statistics and verifies links.
    It builds a graph of its visit path.
    robot-history: This robot is under development.
    It will support ROBOTS.TXT soon.
    robot-environment: hobby
    modified-date: Fri, 30 Aug 1996 00:00:00 GMT
    modified-by: Mike Blaszczak
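
Several records in this index note whether a robot honors the robots exclusion standard and which robot-exclusion-useragent token it matches (the WebMoose entry above says support is planned). A minimal sketch of the robots.txt a site could publish against these tokens; the /private/ path is purely illustrative:

```
# Block the WebMoose statistics robot from the whole site
User-agent: WebMoose
Disallow: /

# All other robots: keep out of a hypothetical /private/ tree only
User-agent: *
Disallow: /private/
```

A compliant robot fetches /robots.txt first, picks the most specific User-agent group matching its exclusion token, and skips every URL path prefixed by a Disallow line.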

  • robot-id: webquest
    robot-name: WebQuest
    robot-cover-url:
    robot-details-url:
    robot-owner-name: TaeYoung Choi
    robot-owner-url: http://www.cosmocyber.co.kr:8080/~cty/index.html
    robot-owner-email: cty@cosmonet.co.kr
    robot-status: development
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: webquest
    robot-noindex: no
    robot-host: 210.121.146.2, 210.113.104.1, 210.113.104.2
    robot-from: yes
    robot-useragent: WebQuest/1.0
    robot-language: perl5
    robot-description: WebQuest will be used to build the databases for various web
    search service sites which will be in service by early 1998. Until the end of
    Jan. 1998, WebQuest will run from time to time. After that, it will run
    daily (for a few hours and very slowly).
    robot-history: The development of WebQuest was motivated by the need for a
    customized robot in various projects of COSMO Information & Communication Co.,
    Ltd. in Korea.
    robot-environment: service
    modified-date: Tue, 30 Dec 1997 09:27:20 GMT
    modified-by: TaeYoung Choi

  • robot-id: webreader
    robot-name: Digimarc MarcSpider
    robot-cover-url: http://www.digimarc.com/prod_fam.html
    robot-details-url: http://www.digimarc.com/prod_fam.html
    robot-owner-name: Digimarc Corporation
    robot-owner-url: http://www.digimarc.com
    robot-owner-email: wmreader@digimarc.com
    robot-status: active
    robot-purpose: maintenance
    robot-type: standalone
    robot-platform: windowsNT
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: 206.102.3.*
    robot-from: yes
    robot-useragent: Digimarc WebReader/1.2
    robot-language: c++
    robot-description: Examines image files for watermarks.
    In order not to waste internet bandwidth with yet
    another crawler, we have contracted with one of the major crawlers/search
    engines to provide us with a list of specific URLs of interest to us. If a
    URL is to an image, we may read the image, but we do not crawl to any other
    URLs. If a URL is to a page of interest (usually due to CGI), then we
    access the page to get the image URLs from it, but we do not crawl to any
    other pages.
    robot-history: First operation in August 1997.
    robot-environment: service
    modified-date: Mon, 20 Oct 1997 16:44:29 GMT
    modified-by: Brian MacIntosh

  • robot-id: webreaper
    robot-name: WebReaper
    robot-cover-url: http://www.otway.com/webreaper
    robot-details-url:
    robot-owner-name: Mark Otway
    robot-owner-url: http://www.otway.com
    robot-owner-email: webreaper@otway.com
    robot-status: active
    robot-purpose: indexing/offline browsing
    robot-type: standalone
    robot-platform: windows95, windowsNT
    robot-availability: binary
    robot-exclusion: yes
    robot-exclusion-useragent: webreaper
    robot-noindex: no
    robot-host: *
    robot-from: no
    robot-useragent: WebReaper [webreaper@otway.com]
    robot-language: c++
    robot-description: Freeware app which downloads and saves sites locally for
    offline browsing.
    robot-history: Written for personal use, and then distributed to the public
    as freeware.
    robot-environment: hobby
    modified-date: Thu, 25 Mar 1999 15:00:00 GMT
    modified-by: Mark Otway

  • robot-id: webs
    robot-name: webs
    robot-cover-url: http://webdew.rnet.or.jp/
    robot-details-url: http://webdew.rnet.or.jp/service/shank/NAVI/SEARCH/info2.html#robot
    robot-owner-name: Recruit Co., Ltd.
    robot-owner-url:
    robot-owner-email: dew@wwwadmin.rnet.or.jp
    robot-status: active
    robot-purpose: statistics
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: webs
    robot-noindex: no
    robot-host: lemon.recruit.co.jp
    robot-from: yes
    robot-useragent: webs@recruit.co.jp
    robot-language: perl5
    robot-description: The webs robot is used to gather the last-modified
    dates of WWW servers' top pages. The collected
    statistics reflect the priority of WWW server
    data collection for the webdew indexing service.
    Indexing in webdew is done manually.
    robot-history:
    robot-environment: service
    modified-date: Fri, 6 Sep 1996 10:00:00 GMT
    modified-by:

  • robot-id: websnarf
    robot-name: Websnarf
    robot-cover-url:
    robot-details-url:
    robot-owner-name: Charlie Stross
    robot-owner-url:
    robot-owner-email: charles@fma.com
    robot-status: retired
    robot-purpose:
    robot-type:
    robot-platform:
    robot-availability:
    robot-exclusion:
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host:
    robot-from:
    robot-useragent:
    robot-language:
    robot-description:
    robot-history:
    robot-environment:
    modified-date:
    modified-by:

  • robot-id: webspider
    robot-name: WebSpider
    robot-details-url: http://www.csi.uottawa.ca/~u610468
    robot-cover-url:
    robot-owner-name: Nicolas Fraiji
    robot-owner-email: u610468@csi.uottawa.ca
    robot-status: active, under further enhancement.
    robot-purpose: maintenance, link diagnostics
    robot-type: standalone
    robot-exclusion: yes
    robot-noindex: no
    robot-exclusion-useragent: webspider
    robot-host: several
    robot-from: Yes
    robot-language: Perl4
    robot-history: Developed as a course project at the University of
    Ottawa, Canada in 1996.
    robot-environment: Educational use and Research

  • robot-id: webvac
    robot-name: WebVac
    robot-cover-url: http://www.federated.com/~tim/webvac.html
    robot-details-url:
    robot-owner-name: Tim Jensen
    robot-owner-url: http://www.federated.com/~tim
    robot-owner-email: tim@federated.com
    robot-status:
    robot-purpose: mirroring
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: no
    robot-exclusion-useragent:
    robot-noindex:
    robot-host:
    robot-from: no
    robot-useragent: webvac/1.0
    robot-language: C++
    robot-description:
    robot-history:
    robot-environment:
    modified-date: Mon May 13 03:19:17 1996.
    modified-by:

  • robot-id: webwalk
    robot-name: webwalk
    robot-cover-url:
    robot-details-url:
    robot-owner-name: Rich Testardi
    robot-owner-url:
    robot-owner-email:
    robot-status: retired
    robot-purpose: indexing, maintenance, mirroring, statistics
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: yes
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host:
    robot-from: yes
    robot-useragent: webwalk
    robot-language: c
    robot-description: Its purpose is to generate a Resource Discovery database,
    validate links, validate HTML, perform mirroring, copy
    document trees, and generate statistics. Webwalk is easily
    extensible to perform virtually any maintenance function
    which involves web traversal, in a way much like the '-exec'
    option of the find(1) command. Webwalk is usually used
    behind the HP firewall.
    robot-history:
    robot-environment:
    modified-date: Wed Nov 15 09:51:59 PST 1995
    modified-by:

  • robot-id: webwalker
    robot-name: WebWalker
    robot-cover-url:
    robot-details-url:
    robot-owner-name: Fah-Chun Cheong
    robot-owner-url: http://www.cs.berkeley.edu/~fccheong/
    robot-owner-email: fccheong@cs.berkeley.edu
    robot-status: active
    robot-purpose: maintenance
    robot-type: standalone
    robot-platform: unix
    robot-availability: source
    robot-exclusion: yes
    robot-exclusion-useragent: WebWalker
    robot-noindex: no
    robot-host: *
    robot-from: yes
    robot-useragent: WebWalker/1.10
    robot-language: perl4
    robot-description: WebWalker performs WWW traversal for individual
    sites and tests for the integrity of all hyperlinks
    to external sites.
    robot-history: A Web maintenance robot for expository purposes,
    first published in the book "Internet Agents: Spiders,
    Wanderers, Brokers, and Bots" by the robot's author.
    robot-environment: hobby
    modified-date: Thu, 25 Jul 1996 16:00:52 PDT
    modified-by: Fah-Chun Cheong

  • robot-id: webwatch
    robot-name: WebWatch
    robot-cover-url: http://www.specter.com/users/janos/specter
    robot-details-url:
    robot-owner-name: Joseph Janos
    robot-owner-url: http://www.specter.com/users/janos/specter
    robot-owner-email: janos@specter.com
    robot-status:
    robot-purpose: maintenance, statistics
    robot-type: standalone
    robot-platform:
    robot-availability:
    robot-exclusion: no
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host:
    robot-from: no
    robot-useragent: WebWatch
    robot-language: c++
    robot-description: Its purpose is to validate HTML, and generate statistics.
    Check URLs modified since a given date.
    robot-history:
    robot-environment:
    modified-date: Wed Jul 26 13:36:32 1995
    modified-by:

  • robot-id: wget
    robot-name: Wget
    robot-cover-url: ftp://gnjilux.cc.fer.hr/pub/unix/util/wget/
    robot-details-url:
    robot-owner-name: Hrvoje Niksic
    robot-owner-url:
    robot-owner-email: hniksic@srce.hr
    robot-status: development
    robot-purpose: mirroring, maintenance
    robot-type: standalone
    robot-platform: unix
    robot-availability: source
    robot-exclusion: yes
    robot-exclusion-useragent: wget
    robot-noindex: no
    robot-host: *
    robot-from: yes
    robot-useragent: Wget/1.4.0
    robot-language: C
    robot-description:
    Wget is a utility for retrieving files using HTTP and FTP protocols.
    It works non-interactively, and can retrieve HTML pages and FTP
    trees recursively. It can be used for mirroring Web pages and FTP
    sites, or for traversing the Web gathering data. It is run by the
    end user or archive maintainer.
    robot-history:
    robot-environment: hobby, research
    modified-date: Mon, 11 Nov 1996 06:00:44 MET
    modified-by: Hrvoje Niksic

  • robot-id: whatuseek
    robot-name: whatUseek Winona
    robot-cover-url: http://www.whatUseek.com/
    robot-details-url: http://www.whatUseek.com/
    robot-owner-name: Neil Mansilla
    robot-owner-url: http://www.whatUseek.com/
    robot-owner-email: neil@whatUseek.com
    robot-status: active
    robot-purpose: Robot used for site-level search and meta-search engines.
    robot-type: standalone
    robot-platform: unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: winona
    robot-noindex: yes
    robot-host: *.whatuseek.com, *.aol2.com
    robot-from: no
    robot-useragent: whatUseek_winona/3.0
    robot-language: c++
    robot-description: The whatUseek robot, Winona, is used for site-level
    search engines. It is also implemented in several meta-search engines.
    robot-history: Winona was developed in November of 1996.
    robot-environment: service
    modified-date: Wed, 17 Jan 2001 11:52:00 EST
    modified-by: Neil Mansilla

  • robot-id: whowhere
    robot-name: WhoWhere Robot
    robot-cover-url: http://www.whowhere.com
    robot-details-url:
    robot-owner-name: Rupesh Kapoor
    robot-owner-url:
    robot-owner-email: rupesh@whowhere.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: Sun Unix
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: whowhere
    robot-noindex: no
    robot-host: spica.whowhere.com
    robot-from: no
    robot-useragent:
    robot-language: C/Perl
    robot-description: Gathers data for email directory from web pages
    robot-history:
    robot-environment: commercial
    modified-date:
    modified-by:

  • robot-id: wlm
    robot-name: Weblog Monitor
    robot-details-url: http://www.metastatic.org/wlm/
    robot-cover-url: http://www.metastatic.org/wlm/
    robot-owner-name: Casey Marshall
    robot-owner-url: http://www.metastatic.org/
    robot-owner-email: rsdio@metastatic.org
    robot-status: active
    robot-purpose: statistics
    robot-type: standalone
    robot-platform: unix, windows
    robot-availability: source, data
    robot-exclusion: no
    robot-exclusion-useragent: wlm
    robot-noindex: no
    robot-nofollow: no
    robot-host: blossom.metastatic.org
    robot-from: no
    robot-useragent: wlm-1.1
    robot-language: java
    robot-description1: Builds the 'Picture of Weblogs' applet.
    robot-description2: See http://www.metastatic.org/wlm/.
    robot-environment: hobby
    modified-date: Fri, 2 Nov 2001 04:55:00 PST

  • robot-id: wmir
    robot-name: w3mir
    robot-cover-url: http://www.ifi.uio.no/~janl/w3mir.html
    robot-details-url:
    robot-owner-name: Nicolai Langfeldt
    robot-owner-url: http://www.ifi.uio.no/~janl/w3mir.html
    robot-owner-email: w3mir-core@usit.uio.no
    robot-status:
    robot-purpose: mirroring.
    robot-type: standalone
    robot-platform: UNIX, WindowsNT
    robot-availability:
    robot-exclusion: no.
    robot-exclusion-useragent:
    robot-noindex:
    robot-host:
    robot-from: yes
    robot-useragent: w3mir
    robot-language: Perl
    robot-description: W3mir uses the If-Modified-Since HTTP header and recurses
    only into the directory and subdirectories of its start
    document. Known to work on U*ixes and Windows
    NT.
    robot-history:
    robot-environment:
    modified-date: Wed Apr 24 13:23:42 1996.
    modified-by:
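
The w3mir entry above describes conditional fetching via If-Modified-Since. A minimal sketch of that logic in Python; the function names are illustrative, not w3mir's own (w3mir itself is Perl):

```python
from email.utils import formatdate

def conditional_get_headers(last_fetch_epoch):
    # A mirror sends If-Modified-Since carrying the timestamp of its local
    # copy, formatted as an RFC 1123 date in GMT.
    return {"If-Modified-Since": formatdate(last_fetch_epoch, usegmt=True)}

def should_refetch(status_code):
    # The server replies 304 Not Modified when the resource is unchanged:
    # keep the local copy. Anything else (typically 200) means refetch.
    return status_code != 304

print(conditional_get_headers(0)["If-Modified-Since"])
# Thu, 01 Jan 1970 00:00:00 GMT
print(should_refetch(304))
# False
```

This is why a mirror that revisits a site daily transfers almost nothing when the pages have not changed: most responses are bodiless 304s.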

  • robot-id: wolp
    robot-name: WebStolperer
    robot-cover-url: http://www.suchfibel.de/maschinisten
    robot-details-url: http://www.suchfibel.de/maschinisten/text/werkzeuge.htm (in German)
    robot-owner-name: Marius Dahler
    robot-owner-url: http://www.suchfibel.de/maschinisten
    robot-owner-email: mda@suchfibel.de
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix, NT
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: WOLP
    robot-noindex: yes
    robot-host: www.suchfibel.de
    robot-from: yes
    robot-useragent: WOLP/1.0 mda/1.0
    robot-language: perl5
    robot-description: The robot gathers information about specified
    web projects and generates knowledge bases in JavaScript or its own
    format.
    robot-environment: hobby
    modified-date: 22 Jul 1998
    modified-by: Marius Dahler

  • robot-id: wombat
    robot-name: The Web Wombat
    robot-cover-url: http://www.intercom.com.au/wombat/
    robot-details-url:
    robot-owner-name: Internet Communications
    robot-owner-url: http://www.intercom.com.au/
    robot-owner-email: phill@intercom.com.au
    robot-status:
    robot-purpose: indexing, statistics.
    robot-type:
    robot-platform:
    robot-availability:
    robot-exclusion: no.
    robot-exclusion-useragent:
    robot-noindex:
    robot-host: qwerty.intercom.com.au
    robot-from: no
    robot-useragent: no
    robot-language: IBM Rexx/VisualAge C++ under OS/2.
    robot-description: The robot is the basis of the Web Wombat search engine
    (Australian/New Zealand content ONLY).
    robot-history:
    robot-environment:
    modified-date: Thu Feb 29 00:39:49 1996.
    modified-by:

  • robot-id: worm
    robot-name: The World Wide Web Worm
    robot-cover-url: http://www.cs.colorado.edu/home/mcbryan/WWWW.html
    robot-details-url:
    robot-owner-name: Oliver McBryan
    robot-owner-url: http://www.cs.colorado.edu/home/mcbryan/Home.html
    robot-owner-email: mcbryan@piper.cs.colorado.edu
    robot-status:
    robot-purpose: indexing
    robot-type:
    robot-platform:
    robot-availability:
    robot-exclusion:
    robot-exclusion-useragent:
    robot-noindex: no
    robot-host: piper.cs.colorado.edu
    robot-from:
    robot-useragent:
    robot-language:
    robot-description: indexing robot, actually has quite flexible search
    options
    robot-history:
    robot-environment:
    modified-date:
    modified-by:

  • robot-id: wwwc
    robot-name: WWWC Ver 0.2.5
    robot-cover-url: http://www.kinet.or.jp/naka/tomo/wwwc.html
    robot-details-url:
    robot-owner-name: Tomoaki Nakashima.
    robot-owner-url: http://www.kinet.or.jp/naka/tomo/
    robot-owner-email: naka@kinet.or.jp
    robot-status: active
    robot-purpose: maintenance
    robot-type: standalone
    robot-platform: windows, windows95, windowsNT
    robot-availability: binary
    robot-exclusion: yes
    robot-exclusion-useragent: WWWC
    robot-noindex: no
    robot-host:
    robot-from: yes
    robot-useragent: WWWC/0.25 (Win95)
    robot-language: c
    robot-description:
    robot-history: 1997
    robot-environment: hobby
    modified-date: Tuesday, 18 Feb 1997 06:02:47 GMT
    modified-by: Tomoaki Nakashima (naka@kinet.or.jp)

  • robot-id: wz101
    robot-name: WebZinger
    robot-details-url: http://www.imaginon.com/wzindex.html
    robot-cover-url: http://www.imaginon.com
    robot-owner-name: ImaginOn, Inc
    robot-owner-url: http://www.imaginon.com
    robot-owner-email: info@imaginon.com
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: windows95, windowsNT 4, mac, solaris, unix
    robot-availability: binary
    robot-exclusion: no
    robot-exclusion-useragent: none
    robot-noindex: no
    robot-host: http://www.imaginon.com/wzindex.html *
    robot-from: no
    robot-useragent: none
    robot-language: java
    robot-description: commercial Web Bot that accepts plain text queries, uses
    WebCrawler, Lycos or Excite to get URLs, then visits sites. If the user's
    filter parameters are met, it downloads one picture and a paragraph of text.
    Plays back a slide show of one text paragraph plus an image from each site.
    robot-history: developed by ImaginOn in 1996 and 1997
    robot-environment: commercial
    modified-date: Wed, 11 Sep 1997 02:00:00 GMT
    modified-by: schwartz@imaginon.com

  • robot-id: xget
    robot-name: XGET
    robot-cover-url: http://www2.117.ne.jp/~moremore/x68000/soft/soft.html
    robot-details-url: http://www2.117.ne.jp/~moremore/x68000/soft/soft.html
    robot-owner-name: Hiroyuki Shigenaga
    robot-owner-url: http://www2.117.ne.jp/~moremore/
    robot-owner-email: shige@mh1.117.ne.jp
    robot-status: active
    robot-purpose: mirroring
    robot-type: standalone
    robot-platform: X68000, X68030
    robot-availability: binary
    robot-exclusion: yes
    robot-exclusion-useragent: XGET
    robot-noindex: no
    robot-host: *
    robot-from: yes
    robot-useragent: XGET/0.7
    robot-language: c
    robot-description: Its purpose is to retrieve updated files. It is run by the end user.
    robot-history: 1997
    robot-environment: hobby
    modified-date: Fri, 07 May 1998 17:00:00 GMT
    modified-by: Hiroyuki Shigenaga

  • robot-id: Nederland.zoek
    robot-name: Nederland.zoek
    robot-cover-url: http://www.nederland.net/
    robot-details-url:
    robot-owner-name: System Operator Nederland.net
    robot-owner-url:
    robot-owner-email: zoek@nederland.net
    robot-status: active
    robot-purpose: indexing
    robot-type: standalone
    robot-platform: unix (Linux)
    robot-availability: none
    robot-exclusion: yes
    robot-exclusion-useragent: Nederland.zoek
    robot-noindex: no
    robot-host: 193.67.110.*
    robot-from: yes
    robot-useragent: Nederland.zoek
    robot-language: c
    robot-description: This robot indexes all .nl sites for the search-engine of Nederland.net
    robot-history: Developed at Computel Standby in Apeldoorn, The Netherlands
    robot-environment: service
    modified-date: Sat, 8 Feb 1997 01:10:00 CET
    modified-by: Sander Steffann <sander@nederland.net>
