Web hosting with free domain names in europeCpanel X unlimited pop3 accounts with linux web servers
dedicated server uptimeGreek Lang | gr domain name registration 
SERVICES
Web Hosting
Cpanel Free Scripts
Dedicated Servers
Servers Stock
Network
Web Design
Domain Parking
Domain Registration
Data Center Tour
FREE WEB TOOLS
Flash Toolbar Generators
Graphic Toolbar Generators
DHTML / CSS Menu Generators
Java Script Menu Generators
MAILING LIST
Sign up to our mailing list
E-mail:
I want to:
SSL
Google Ads

The Anatomy of a Web Search Engine

2.1 PageRank: Bringing Order to the Web

The citation (link) graph of the web is an important resource that has largely gone unused in existing web search engines. We have created maps containing as many as 518 million of these hyperlinks, a significant sample of the total. These maps allow rapid calculation of a web page's "PageRank", an objective measure of its citation importance that corresponds well with people's subjective idea of importance. Because of this correspondence, PageRank is an excellent way to prioritize the results of web keyword searches. For most popular subjects, a simple text matching search that is restricted to web page titles performs admirably when PageRank prioritizes the results (demo available at google.stanford.edu ). For the type of full text searches in the main Google system, PageRank also helps a great deal.

2.1.1 Description of PageRank Calculation

Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page's importance or quality. PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page. PageRank is defined as follows: We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one.

PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. Also, a PageRank for 26 million web pages can be computed in a few hours on a medium size workstation. There are many other details which are beyond the scope of this paper.

2.1.2 Intuitive Justification

PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page. One important variation is to only add the damping factor d to a single page, or a group of pages. This allows for personalization and can make it nearly impossible to deliberately mislead the system in order to get a higher ranking. We have several other extensions to PageRank, again see [ Page 98 ].

Another intuitive justification is that a page can have a high PageRank if there are many pages that point to it, or if there are some pages that point to it and have a high PageRank. Intuitively, pages that are well cited from many places around the web are worth looking at. Also, pages that have perhaps only one citation from something like the Yahoo! homepage are also generally worth looking at. If a page was not high quality, or was a broken link, it is quite likely that Yahoo's homepage would not link to it. PageRank handles both these cases and everything in between by recursively propagating weights through the link structure of the web.

2.2 Anchor Text

The text of links is treated in a special way in our search engine. Most search engines associate the text of a link with the page that the link is on. In addition, we associate it with the page the link points to. This has several advantages. First, anchors often provide more accurate descriptions of web pages than the pages themselves. Second, anchors may exist for documents which cannot be indexed by a text-based search engine, such as images, programs, and databases. This makes it possible to return web pages which have not actually been crawled. Note that pages that have not been crawled can cause problems, since they are never checked for validity before being returned to the user. In this case, the search engine can even return a page that never actually existed, but had hyperlinks pointing to it. However, it is possible to sort the results, so that this particular problem rarely happens.

This idea of propagating anchor text to the page it refers to was implemented in the World Wide Web Worm [ McBryan 94 ] especially because it helps search non-text information, and expands the search coverage with fewer downloaded documents. We use anchor propagation mostly because anchor text can help provide better quality results. Using anchor text efficiently is technically difficult because of the large amounts of data which must be processed. In our current crawl of 24 million pages, we had over 259 million anchors which we indexed.
<Back

WEBMASTERS
Search Engine Submit Global
Web Hosting FAQ
Web Hosting Glossary
Search engine ranking tips
Download free scripts
Keyword Suggestion Tool
Downloads
Google Page Ranking
Search Engine Analysis
Robots Index
Web Crawlers
Affiliates
WHOIS
SUPPORT
24/7 Help Desk
Cpanel
Contact
WE RECOMMEND
   
Dependable Linux Servers providing cheap web hosting worldwide
INTRO | HOME | WEB HOSTING | DEDICATED SERVERS | DEDICATED SERVERS STOCK | NETWORK DIAGRAMM |WEB DESIGN | DOMAIN PARKING | FREE FLASH MENU GENERATORS | FREE GRAPHICS NAVBARS | DHTML/CSS CODE GENERATORS | JAVA SCRIPT CSS CODE GENERATORS | FREE SEARCH ENGINE SUBMISSION | WEB HOSTING F.A.Q | WEB HOSTING GLOSSARY | WEEKLY SEARCH ENGINE RANKING TIPS | DOWNLOAD FREE SCRIPTS & PROGRAMMS | SEARCH ENGINE ANALYSIS | SEARCH TERM SUGESSTION TOOL | TECH NEWS FEED | DOWNLOAD FREE HTML TOOLS | GOOGLE PAGE RANK TIPS | ROBOTS INDEX | WEB CRAWLERS | CPANEL DOCUMENTATION | TERMS OF USE | CONTACT | FORUMS
© 2002 Hostsun™ All wrignts reserved

Dedicated servers provider in Europe and Greece