
Exploiting the Hierarchical Structure for Link Analysis

This presentation covers a novel ranking algorithm that exploits the hierarchical structure of the web graph to overcome the link sparsity and biased ranking issues of flat link analysis. The algorithm distinguishes intra-links from inter-links in the directory structure and separates upper-layer supernodes from lower-layer pages. A Hierarchical Random Walk Model is used to calculate the importance of supernodes and then the ranking of pages. According to the authors, the method significantly improves web search performance by alleviating the sparse link problem and assigning reasonable ranks to new pages. Experimental results on the TREC .GOV dataset show the algorithm's effectiveness. Future work includes experiments on a large-scale web collection to further evaluate the algorithm.

awilber

Presentation Transcript


  1. Exploiting the Hierarchical Structure for Link Analysis • Advisor: Dr. Hsu • Graduate: Kuo-min Wang • Authors: Gui-Rong Xue, Qiang Yang, Hua-Jun Zeng, Yong Yu, Zheng Chen • ACM SIGIR 2005

  2. Outline • Motivation • Objective • Hierarchical Ranking • Experimental Results • Conclusions • Personal Opinions

  3. Motivation • Current link analysis algorithms generally work on a flat link graph • They ignore the hierarchical structure of the Web graph • They suffer from two problems • The sparsity of the link graph • Biased ranking of newly emerging pages

  4. Objective • Propose a novel ranking algorithm to solve these two problems.

  5. Hierarchical Ranking • Hyperlinks that link two Web pages in the same directory (supernode) are called intra-links. Hyperlinks that link two Web pages in different supernodes are called inter-links. • An inlink links to the page and an outlink links from the page. • Upper-layer • Contains m vertices called supernodes • Supernodes are linked to each other by directed edges called superedges. • Lower-layer • Contains the Web pages inside each supernode
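The intra-link and inter-link distinction on this slide can be illustrated with a small sketch. It assumes each page already has a known supernode (the mapping `supernode_of` and the function `split_links` are hypothetical names, not from the paper), and it assumes a superedge is weighted by the number of inter-links between two supernodes, which is an illustrative choice rather than the authors' definition.

```python
from collections import Counter


def split_links(links, supernode_of):
    """Split page-level links into intra-links and inter-links.

    links: iterable of (source_page, target_page) pairs.
    supernode_of: dict mapping each page to its supernode (assumed given).
    Returns (intra, inter, superedges), where superedges counts the
    inter-links between each ordered pair of supernodes (an assumption).
    """
    intra, inter = [], []
    superedges = Counter()
    for src, dst in links:
        s, d = supernode_of[src], supernode_of[dst]
        if s == d:
            # Both pages live in the same supernode: intra-link.
            intra.append((src, dst))
        else:
            # Pages live in different supernodes: inter-link.
            inter.append((src, dst))
            superedges[(s, d)] += 1
    return intra, inter, superedges


# Toy example with two directories, "a" and "b".
supernode_of = {"a/1": "a", "a/2": "a", "b/1": "b"}
links = [("a/1", "a/2"), ("a/2", "b/1"), ("a/1", "b/1")]
intra, inter, superedges = split_links(links, supernode_of)
```

Here the one link inside directory "a" is an intra-link, and the two links from "a" into "b" collapse into a single superedge of weight 2.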

  6. Hierarchical Ranking (cont.) • Hierarchical Random Walk Model • A user seeks information by starting from the upper layer and either randomly jumps to another upper-layer node • Or follows the hierarchical links down to the lower layer • Calculates the importance of supernodes • [Figure: example graph with nodes 1-6 and numeric edge weights]
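As a rough illustration of how supernode importance could come out of a random walk on the upper layer, the sketch below runs a PageRank-style power iteration over weighted superedges. This is a generic random walk with a uniform jump, not the paper's exact formulation; the function name `supernode_importance`, the `jump` probability of 0.15, and the uniform handling of supernodes without outgoing superedges are all assumptions.

```python
def supernode_importance(weights, nodes, jump=0.15, iters=100):
    """PageRank-style importance over supernodes (illustrative sketch).

    weights: dict mapping (source, target) supernode pairs to a
             non-negative superedge weight.
    nodes:   list of all supernodes.
    jump:    probability of jumping to a random supernode (assumed value).
    """
    n = len(nodes)
    out_total = {s: 0.0 for s in nodes}
    for (s, _), w in weights.items():
        out_total[s] += w

    rank = {s: 1.0 / n for s in nodes}
    for _ in range(iters):
        # Rank held by supernodes with no outgoing superedges is
        # redistributed uniformly so the total stays equal to 1.
        dangling = sum(rank[s] for s in nodes if out_total[s] == 0)
        new = {s: jump / n + (1 - jump) * dangling / n for s in nodes}
        for (s, d), w in weights.items():
            if w > 0:
                # Follow a superedge with probability proportional to its weight.
                new[d] += (1 - jump) * rank[s] * w / out_total[s]
        rank = new
    return rank


# Toy upper layer: "a" and "c" both point to "b", and "b" points back to "a".
nodes = ["a", "b", "c"]
weights = {("a", "b"): 2.0, ("c", "b"): 1.0, ("b", "a"): 1.0}
rank = supernode_importance(weights, nodes)
```

In this toy graph "b" ends up most important because two supernodes point to it, while "c", which has no incoming superedges, only receives rank from random jumps.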

  7. Hierarchical Ranking (cont.) • Calculating Page Importance • Constructing Weighted Tree Structure • Calculating Page Importance by DHC

  8. q q q p (hub) pq i q 3 1 6 4 2 5 q p q p (hub) Previous Work on Link Analysis • HITS • Page Rank • Collect a root set of pages • Expand to a base set (1k-5k pages) • Set the initial a and h weights • Update a and h weights • Output a list of highest weights qp

  9. Experimental Results • TREC .GOV dataset • 1,053,372 text/html files are used in our experiment • There are 7,569,353 hyperlinks in total • Experimental Methods • PageRank, WeightRank, Two-Layer PageRank, • Block-Level PageRank, Hierarchical Ranking

  10. Experimental Results

  11. Experimental Results(cont.)

  12. Conclusion • We proposed a Hierarchical Random Walk Model, which approximates the behavior of users surfing the Web. • We presented a hierarchical ranking algorithm to calculate the importance of Web pages. • The ranking algorithm can significantly improve the performance of Web search, efficiently alleviate the sparse link problem, and assign reasonable ranks to newly emerging Web pages. • Future work • Conduct experiments on a large-scale Web collection to evaluate our algorithms.

  13. Personal Opinions • Advantage • Drawback • Future work
