pages are shown: two scrambled egg recipes and two pages that link to the recipes. One of the links is from the author of this book (who is not a famous chef) and one is from the home page of the famous chef Alice Waters. The authority trick ranks Bert's page above Ernie's, because Bert's incoming link has greater “authority” than Ernie's.
This principle is all well and good, but in its present form it is useless to search engines. How can a computer automatically determine that Alice Waters is a greater authority on scrambled eggs than me? Here is an idea that might help: let's combine the hyperlink trick with the authority trick. All pages start off with an authority score of 1, but if a page has some incoming links, its authority is calculated by adding up the authority of all the pages that point to it. In other words, if pages X and Y link to page Z, then the authority of Z is just the authority of X plus the authority of Y.
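The combined rule can be sketched in a few lines of Python. This is a minimal illustration, not a search engine's actual code. It assumes the web is stored as a dictionary that maps each page to the pages it links to. The page names X, Y, and Z come from the example in the text, and the function names are hypothetical. Because the function just recurses into each page's incoming links, it assumes that no chain of links ever loops back on itself.

```python
# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "X": ["Z"],
    "Y": ["Z"],
    "Z": [],
}

def incoming(page, links):
    """Return the pages that contain a hyperlink to `page`."""
    return [src for src, targets in links.items() if page in targets]

def authority(page, links):
    """Authority is 1 for a page with no incoming links; otherwise it is
    the sum of the authorities of all pages that link to it.
    Assumes the links never loop back on themselves."""
    sources = incoming(page, links)
    if not sources:
        return 1
    return sum(authority(src, links) for src in sources)

print(authority("Z", LINKS))  # 2, because X and Y each contribute 1
```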
The figure on the next page gives a detailed example, calculating authority scores for the two scrambled egg recipes. The final scores are shown in circles. There are two pages that link to my home page; these pages have no incoming links themselves, so they get scores of 1. My home page gets the total score of all its incoming links, which adds up to 2. Alice Waters's home page has 100 incoming links that each have a score of 1, so she gets a score of 100. Ernie's recipe has only one incoming link, but it is from a page with a score of 2, so by adding up all the incoming scores (in this case there is only one number to add), Ernie gets a score of 2. Bert's recipe also has only one incoming link, valued at 100, so Bert's final score is 100. And because 100 is greater than 2, Bert's page gets ranked above Ernie's.
A simple calculation of “authority scores” for the two scrambled egg recipes. The authority scores are shown in circles.
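The worked example can be reproduced by scoring pages in an order where every page comes after all the pages that link to it. The sketch below assumes Python 3.9 or later, whose standard-library `graphlib.TopologicalSorter` produces exactly such an order when each page is mapped to its incoming links. The page names (`fan_page_1`, `aw_fan_0`, and so on) are hypothetical stand-ins for the unnamed pages in the figure.

```python
from graphlib import TopologicalSorter

# Each page maps to the pages that link to it (hypothetical names).
INCOMING = {
    # Two pages with no incoming links point to the author's home page.
    "my_home": ["fan_page_1", "fan_page_2"],
    # 100 pages with no incoming links point to Alice Waters's home page.
    "alice_waters_home": [f"aw_fan_{i}" for i in range(100)],
    # Each recipe has exactly one incoming link.
    "ernie_recipe": ["my_home"],
    "bert_recipe": ["alice_waters_home"],
}

def authority_scores(incoming):
    scores = {}
    # static_order() yields every page after all pages that link to it,
    # so each page's sources are already scored when we reach it.
    for page in TopologicalSorter(incoming).static_order():
        sources = incoming.get(page, [])
        scores[page] = sum(scores[s] for s in sources) if sources else 1
    return scores

scores = authority_scores(INCOMING)
print(scores["ernie_recipe"], scores["bert_recipe"])  # 2 100
```

One reason to lean on `TopologicalSorter` here is that it refuses to produce an order at all, raising `graphlib.CycleError`, if the links ever loop back on themselves.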
THE RANDOM SURFER TRICK
It seems like we have hit on a strategy for automatically calculating authority scores that really works, without any need for a computer to actually understand the content of a page. Unfortunately, there can be a major problem with the approach. It is quite possible for hyperlinks to form what computer scientists call a “cycle.” A cycle exists if you can get back to your starting point just by clicking on hyperlinks.
The figure on the following page gives an example. There are 5 web pages labeled A, B, C, D, and E. If we start at A, we can click through from A to B, and then from B to E. From E we can click through to A, which is where we started. This means that A, B, and E form a cycle.
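Whether a set of links contains a cycle can be checked mechanically with a depth-first search that keeps track of the pages on the current click path. The sketch below is one standard way to do this, not something the text prescribes. It assumes the five-page example has exactly the links mentioned in the text (A to B, B to E, E to A, and C and D to A), and `find_cycle` is a hypothetical name.

```python
# Outgoing hyperlinks in the five-page example (assumed from the text).
LINKS = {
    "A": ["B"],
    "B": ["E"],
    "C": ["A"],
    "D": ["A"],
    "E": ["A"],
}

def find_cycle(links):
    """Return a list of pages forming a cycle, or None if there is none.
    Colors: WHITE = unvisited, GRAY = on the current click path,
    BLACK = fully explored."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {page: WHITE for page in links}
    path = []

    def visit(page):
        color[page] = GRAY
        path.append(page)
        for nxt in links.get(page, []):
            if color.get(nxt, WHITE) == GRAY:
                # We clicked back to a page already on the current path.
                return path[path.index(nxt):] + [nxt]
            if color.get(nxt, WHITE) == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[page] = BLACK
        return None

    for start in links:
        if color[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None

print(find_cycle(LINKS))  # ['A', 'B', 'E', 'A']
```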
It turns out that our current definition of “authority score” (combining the hyperlink trick and the authority trick) gets into big trouble whenever there is a cycle. Let's see what happens on this particular example. Pages C and D have no incoming links, so they get a score of 1. C and D both link to A, so A gets the sum of C and D, which is 1 + 1 = 2. Then B gets the score 2 from A, and E gets 2 from B. (The situation so far is summarized by the left-hand panel of the figure above.) But now A is out of date: it still gets 1 each from C and D, but it also gets 2 from E, for a total of 4. But now B is out of date: it gets 4 from A. But then E needs updating, so it gets 4 from B. (Now we are at the right-hand panel of the figure above.) And so on: now A is 6, so B is 6, so E is 6, so A is 8,…. You get the idea, right? We have to go on forever with the scores always increasing as we go round the cycle.
An example of a cycle of hyperlinks. Pages A, B, and E form a cycle because you can start at A, click through to B, then E, and then return to your starting point at A.
The problem caused by cycles. A, B, and E are always out of date, and their scores keep growing forever.
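The runaway scores can be reproduced by applying the same "sum of incoming scores" rule over and over. This is a minimal simulation under two assumptions. First, pages are updated in place in the order C, D, A, B, E, which matches the order of the walkthrough. Second, a score that has not been computed yet counts as 0. The function names are hypothetical.

```python
# Outgoing hyperlinks in the five-page example from the text.
LINKS = {"A": ["B"], "B": ["E"], "C": ["A"], "D": ["A"], "E": ["A"]}

def incoming_map(links):
    """Invert the link graph: page -> pages that link to it."""
    inc = {page: [] for page in links}
    for src, targets in links.items():
        for dst in targets:
            inc.setdefault(dst, []).append(src)
    return inc

def update_rounds(links, order, rounds, watch):
    """Apply the naive authority rule `rounds` times, updating pages in
    place in the given order. Uncomputed scores start at 0 (assumption).
    Returns a snapshot of the `watch` pages after each round."""
    inc = incoming_map(links)
    scores = {page: 0 for page in inc}
    history = []
    for _ in range(rounds):
        for page in order:
            sources = inc[page]
            scores[page] = sum(scores[s] for s in sources) if sources else 1
        history.append({p: scores[p] for p in watch})
    return history

history = update_rounds(LINKS, ["C", "D", "A", "B", "E"], 4, ("A", "B", "E"))
for snapshot in history:
    print(snapshot)
# {'A': 2, 'B': 2, 'E': 2}
# {'A': 4, 'B': 4, 'E': 4}
# {'A': 6, 'B': 6, 'E': 6}
# {'A': 8, 'B': 8, 'E': 8}
```

Each extra round adds 2 (the contributions from C and D) to every page on the cycle, so no number of rounds ever produces a stable answer.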
Calculating authority scores this way creates a chicken-and-egg problem. If we knew the true authority score for A, we could compute the authority