
Authoritative Sources in a Hyperlinked Environment Jon M. Kleinberg

Authoritative Sources in a Hyperlinked Environment • Jon M. Kleinberg • ACM-SIAM Symposium, 1998 • Krishna Venkateswaran. Basic idea: the root set R is grown into a set S so that S contains many authoritative pages; any page pointed to by a page in R is included in S.


Presentation Transcript


  1. Authoritative Sources in a Hyperlinked Environment • Jon M. Kleinberg • ACM-SIAM Symposium, 1998 • Krishna Venkateswaran

  2. Basic Idea
     • The root set R is grown into a base set S so that S contains many authoritative pages.
     • Any page that is pointed to by a page in R is included in S.
     • R: root set, containing the top t results.
     • S: base set generated by the algorithm. S is used to determine the hubs and authorities.

  3. Algorithm
     • Run the query string through a text-based search engine to get a set of results.
     • Take the top t results and put them in a set R.
     • Start with S equal to R. For every page in R:
        • Add all the pages that the page points to into S.
        • Add at most d of the pages that point to the page into S.
     • Result returned: the base set S, from which the top authorities and hubs are computed.
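The base-set construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_base_set` and the dictionaries `out_links` and `in_links` are hypothetical stand-ins for whatever link data is available, and when a page has more than d in-linking pages the sketch simply keeps the first d in sorted order so that the result is reproducible.

```python
def build_base_set(root, out_links, in_links, d):
    """Grow the root set R into the base set S.

    root      -- iterable of page ids (the top-t search results, R)
    out_links -- dict: page -> set of pages it points to (hypothetical input)
    in_links  -- dict: page -> set of pages that point to it (hypothetical input)
    d         -- cap on how many in-linking pages are added per root page
    """
    base = set(root)  # S starts out as a copy of R
    for page in root:
        # Add every page that this root page points to.
        base |= out_links.get(page, set())
        # Add at most d pages that point to this root page.
        # Assumption: take the first d in sorted order for determinism.
        pointing = sorted(in_links.get(page, set()))
        base |= set(pointing[:d])
    return base


# Tiny made-up example: two root pages, d = 2.
root = ["a", "b"]
out_links = {"a": {"c"}, "b": {"a", "d"}}
in_links = {"a": {"b", "x", "y", "z"}, "b": {"w"}}
base = build_base_set(root, out_links, in_links, d=2)
print(sorted(base))
```

Capping the in-links at d keeps a single very popular root page from flooding S with thousands of pages.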

  4. Heuristics
     Used to determine what pages to add to the set S.
     • Heuristic 1: Avoiding navigational links.
        • Transverse links: links between pages with different domain names.
        • Intrinsic links (navigational links): links between pages within the same domain.
        • Delete all intrinsic links.
     • Heuristic 2: Avoiding mass endorsements.
        • Mass endorsement: a large number of pages in one domain pointing to a single page.
        • Example: “This site is designed by …” followed by a link.
        • Eliminate this by setting a parameter m and allowing only m pages from a single domain to point to any given page.
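Both heuristics amount to filtering a list of links. The sketch below is one possible way to do it, not the method from the slides: `filter_links` and `domain` are hypothetical names, the host part of the URL is assumed to be a good enough stand-in for "domain name", each (source, target) pair is assumed to appear at most once, and the `.example` URLs are made up.

```python
from urllib.parse import urlparse


def domain(url):
    """Host part of a URL, used here as a stand-in for 'domain name'."""
    return urlparse(url).netloc.lower()


def filter_links(links, m):
    """Apply both heuristics to a list of (source_url, target_url) links."""
    kept = []
    per_domain = {}  # (source domain, target url) -> number of links kept
    for src, dst in links:
        if domain(src) == domain(dst):
            continue  # Heuristic 1: drop intrinsic (navigational) links
        key = (domain(src), dst)
        if per_domain.get(key, 0) >= m:
            continue  # Heuristic 2: cap mass endorsements at m per domain
        per_domain[key] = per_domain.get(key, 0) + 1
        kept.append((src, dst))
    return kept


links = [
    ("http://a.example/1", "http://a.example/2"),  # intrinsic, dropped
    ("http://a.example/1", "http://b.example/"),
    ("http://a.example/2", "http://b.example/"),
    ("http://a.example/3", "http://b.example/"),   # third from a.example, dropped
    ("http://c.example/", "http://b.example/"),
]
kept = filter_links(links, m=2)
for src, dst in kept:
    print(src, "->", dst)
```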

  5. Computing Hubs and Authorities
     • Authorities are extracted from the overall collection of pages through an analysis of the link structure of G, the link graph on the pages of S.
     • A good hub points to many good authorities, and a good authority is pointed to by many good hubs.
     [Figure: hubs, authorities, and an unrelated page of large in-degree]

  6. Basic Idea
     • Each page p has a non-negative authority weight and a non-negative hub weight.
     • If p points to pages with large authority weights, then p has a large hub weight.
     • If p is pointed to by pages with large hub weights, then p has a large authority weight.
     • Pages with higher authority weights are better authorities, and pages with higher hub weights are better hubs.

  7. Basic Operations
     • I operation:
        • Authority weight of a page = sum of the hub weights of all pages pointing to the page.
     • O operation:
        • Hub weight of a page = sum of the authority weights of all pages that the page points to.
     • I and O reinforce each other.
     • Normalization: the hub weights and the authority weights are each divided by a constant so that the sum of their squares is 1.
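The two operations and the normalization step can be written down directly. The following is a minimal sketch, assuming the graph is given as a list of (source, target) pairs and the weights as dictionaries keyed by page; the function names and the four-page example are made up for illustration.

```python
import math


def i_operation(hub, links):
    """Authority weight of p = sum of hub weights of pages pointing to p."""
    auth = {p: 0.0 for p in hub}
    for src, dst in links:
        auth[dst] += hub[src]
    return auth


def o_operation(auth, links):
    """Hub weight of p = sum of authority weights of pages p points to."""
    hub = {p: 0.0 for p in auth}
    for src, dst in links:
        hub[src] += auth[dst]
    return hub


def normalize(weights):
    """Scale the weights so that the sum of their squares is 1."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)  # nothing to scale
    return {p: w / norm for p, w in weights.items()}


# Made-up graph: h1 and h2 point to a, and h1 also points to b.
links = [("h1", "a"), ("h2", "a"), ("h1", "b")]
hub = {"h1": 1.0, "h2": 1.0, "a": 1.0, "b": 1.0}

auth = i_operation(hub, links)       # a collects 2.0, b collects 1.0
new_hub = o_operation(auth, links)   # h1 collects 3.0, h2 collects 2.0
print(normalize(auth))
print(normalize(new_hub))
```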

  8. Contd...
     [Figure: Operation I, in which pages q1, q2, q3 point to page p and x[p] = sum of all y[q]; Operation O, in which page p points to pages q1, q2, q3 and y[p] = sum of all x[q]. Here x denotes authority weight and y denotes hub weight.]
     Deciding when to stop the reinforcing process:
     • Apply the I and O operations alternately until a fixed point is reached, or
     • Choose a parameter k and run the process for only k iterations.

  9. Algorithm
     • Given the set of pages in the form of a graph, set an integer value for the parameter k.
     • k is the number of times the iteration is performed.
     • Repeat the following process k times:
        • Apply the I operation to every page and update its authority weight.
        • Apply the O operation to every page and update its hub weight.
        • Normalize both the authority weights and the hub weights.
     • Return the graph with the new authority weight and hub weight for each page.
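Put together, the loop above might look like the sketch below. It is an illustration rather than a reference implementation: the names `iterate` and `top` are hypothetical, all weights are assumed to start at 1, and the O operation is assumed to use the authority weights just computed in the same round.

```python
import math


def _normalize(weights):
    """Scale the weights so that the sum of their squares is 1."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {p: (w / norm if norm else w) for p, w in weights.items()}


def iterate(pages, links, k):
    """Run k rounds of I, O, and normalization; return (authority, hub)."""
    x = {p: 1.0 for p in pages}  # authority weights (assumed start: all 1)
    y = {p: 1.0 for p in pages}  # hub weights (assumed start: all 1)
    for _ in range(k):
        new_x = {p: 0.0 for p in pages}
        for src, dst in links:   # I operation
            new_x[dst] += y[src]
        new_y = {p: 0.0 for p in pages}
        for src, dst in links:   # O operation, using the fresh authority weights
            new_y[src] += new_x[dst]
        x, y = _normalize(new_x), _normalize(new_y)
    return x, y


def top(weights, c):
    """Pages with the c largest weights, best first."""
    return sorted(weights, key=weights.get, reverse=True)[:c]


# Made-up graph: h1 and h2 point to a, and h1 also points to b.
pages = ["h1", "h2", "a", "b"]
links = [("h1", "a"), ("h2", "a"), ("h1", "b")]
x, y = iterate(pages, links, k=20)
print("top authority:", top(x, 1))
print("top hub:", top(y, 1))
```

On this small graph, page a ends up as the strongest authority because both hubs point to it, and h1 as the strongest hub because it points to both authorities.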

  10. Observations
     • The top authorities and hubs are the pages with the c largest values of x and y, respectively, in the graph resulting from the Iterate algorithm.
     • The Iterate procedure converges to fixed points x* and y* as k increases arbitrarily.
        • This is proved using principal eigenvectors.
     • The Iterate algorithm yields a densely linked collection of pages that is rich in relevant pages.
     • The most relevant collection of pages is the most densely linked one.
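The link between the Iterate procedure and eigenvectors is easiest to see in matrix form. The notation below is an assumption introduced for illustration: A is the adjacency matrix of the graph, with A_{ij} = 1 when page i points to page j, z is the all-ones starting vector, and normalization is folded into the proportionality sign.

```latex
% I and O operations in matrix form
\[
  x \leftarrow A^{T} y, \qquad y \leftarrow A x
\]
% Unrolling k rounds from the all-ones vector z
\[
  x_k \propto (A^{T} A)^{k-1} A^{T} z, \qquad
  y_k \propto (A A^{T})^{k} z
\]
% Limits as k grows (assuming the largest eigenvalue is strictly dominant)
\[
  x^{*} = \text{principal eigenvector of } A^{T} A, \qquad
  y^{*} = \text{principal eigenvector of } A A^{T}
\]
```

Under this reading, Iterate is a form of power iteration, which would explain why a modest fixed k is usually enough in practice.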

  11. Results
     (java) Authorities
        .328  http://www.gamelan.com/  Gamelan
        .251  http://java.sun.com/  JavaSoft Home Page
        .190  http://www.digitalfocus.com/digitalfocus/faq/howdoi.html  The Java Developer: HowDoI
        .190  http://lightyear.ncsa.uiuc.edu/srp/java/javabooks.html  The Java Book
     ("search engines") Authorities
        .346  http://www.yahoo.com/  Yahoo!
        .291  http://www.excite.com/  Excite
        .231  http://www.lycos.com/  Lycos Home Page
        .231  http://www.altavista.digital.com/  AltaVista: Main Page
     (Gates) Authorities
        .643  http://www.roadahead.com/  Bill Gates: The Road Ahead
        .458  http://www.microsoft.com/  Welcome to Microsoft
        .440  http://www.microsoft.com/corpinfo/bill-g.htm
     • It was observed that www.roadahead.com was the only one of these sites present in R initially.
     • This supports the algorithm, because many of the pages it returns do not contain the search query.
