Exploiting the hierarchical structure of the Web graph, the paper proposes a novel ranking algorithm that addresses two problems of flat link analysis: the sparsity of the link graph and the biased ranking of newly emerging pages. The algorithm distinguishes intra-links from inter-links in the directory structure, separating an upper layer of supernodes from a lower layer of pages. Using a Hierarchical Random Walk Model, it calculates the importance of supernodes and ranks the pages. According to the authors, the method significantly improves Web search performance by alleviating the sparse-link problem and assigning reasonable ranks to new pages. Experimental results on the TREC .GOV dataset demonstrate the algorithm's effectiveness. Future work includes experiments on large-scale Web collections to further evaluate the algorithm.
Exploiting the Hierarchical Structure for Link Analysis • Advisor: Dr. Hsu • Graduate: Kuo-min Wang • Authors: Gui-Rong Xue, Qiang Yang, Hua-Jun Zeng, Yong Yu, Zheng Chen • 2005 ACM SIGIR
Outline • Motivation • Objective • Hierarchical Ranking • Experimental Results • Conclusions • Personal Opinions
Motivation • Current link analysis algorithms generally work on a flat link graph, • ignoring the hierarchical structure of the Web graph. • They suffer from two problems: • the sparsity of the link graph • biased ranking of newly emerging pages
Objective • Propose a novel ranking algorithm to solve these two problems.
Hierarchical Ranking • Hyperlinks that link two Web pages in the same directory (supernode) are called intra-links. Hyperlinks that link two Web pages in different supernodes are called inter-links. • An inlink is a hyperlink pointing to a page; an outlink is a hyperlink pointing out of a page. • Upper layer • Contains m vertices called supernodes. • Supernodes are linked to each other by directed edges called superedges. • Lower layer • Contains the Web pages inside each supernode. • [Figure: supernodes with intra-links and inter-links between their pages]
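The two link types can be illustrated with a short Python sketch. It assumes, purely for illustration, that a page's supernode is its host plus URL directory (the slides do not say exactly how directories are derived), and it counts the inter-links between each ordered pair of supernodes as an assumed superedge weight. The helper names `supernode_of` and `split_links` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def supernode_of(url):
    """Hypothetical grouping: host plus URL directory identifies the supernode."""
    parts = urlsplit(url)
    directory = parts.path.rsplit("/", 1)[0]  # drop the page's file name
    return f"{parts.netloc}{directory}/"


def split_links(links):
    """Split (source, target) hyperlinks into intra-links and inter-links.

    Inter-links are also counted per ordered (source supernode, target supernode)
    pair, giving an assumed weight for each superedge.
    """
    intra, inter = [], []
    superedges = Counter()
    for src, dst in links:
        s, d = supernode_of(src), supernode_of(dst)
        if s == d:
            intra.append((src, dst))
        else:
            inter.append((src, dst))
            superedges[(s, d)] += 1
    return intra, inter, superedges


example_links = [
    ("http://a.gov/x/p1.html", "http://a.gov/x/p2.html"),  # same directory
    ("http://a.gov/x/p1.html", "http://a.gov/y/q1.html"),  # directory x -> y
    ("http://a.gov/x/p2.html", "http://a.gov/y/q2.html"),  # directory x -> y
]
intra, inter, superedges = split_links(example_links)
print(len(intra), len(inter), dict(superedges))
# prints: 1 2 {('a.gov/x/', 'a.gov/y/'): 2}
```

A page's inlinks and outlinks come from the same list: its inlinks are the pairs whose target is the page, its outlinks the pairs whose source is the page.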
Hierarchical Ranking (cont.) • Hierarchical Random Walk Model • A user seeks information by starting from the upper layer and either randomly jumping to another upper-layer node • or following the hierarchical links down to the lower layer. • The model calculates the importance of supernodes. • [Figure: example two-layer graph with weights on its edges]
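A minimal sketch of the upper-layer part of this walk follows. It assumes the transition probabilities between supernodes are the row-normalised superedge weights and that the random jump is uniform with a damping factor of 0.85, as in PageRank; neither choice is taken from the slides, and the downward step into the lower layer is not modelled. `supernode_importance` is a hypothetical name.

```python
def supernode_importance(weights, damping=0.85, tol=1e-12, max_iter=1000):
    """Stationary importance of supernodes under an assumed PageRank-style walk.

    weights[i][j] is the (assumed) weight of the superedge from supernode i to j.
    """
    m = len(weights)
    # Row-normalise superedge weights into transition probabilities;
    # a supernode with no outgoing superedges jumps uniformly.
    P = []
    for row in weights:
        total = sum(row)
        P.append([w / total for w in row] if total > 0 else [1.0 / m] * m)

    r = [1.0 / m] * m  # start from the uniform distribution
    for _ in range(max_iter):
        # Random jump with probability (1 - damping), else follow a superedge.
        r_next = [
            (1 - damping) / m + damping * sum(r[i] * P[i][j] for i in range(m))
            for j in range(m)
        ]
        if sum(abs(a - b) for a, b in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r


weights = [[0, 2, 1], [1, 0, 0], [1, 1, 0]]
scores = supernode_importance(weights)
print([round(s, 3) for s in scores])
```

Raising `damping` makes the walk follow superedges more often; lowering it moves the scores toward uniform.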
Hierarchical Ranking (cont.) • Calculating Page Importance • Constructing Weighted Tree Structure • Calculating Page Importance by DHC
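The slides do not say how the weighted tree is built or how DHC propagates importance over it. The sketch below only illustrates one plausible first step under stated assumptions: a tree built from host and URL directory names, where each node counts the pages beneath it and an edge is weighted by the child's share of its parent's pages. The helpers `build_directory_tree` and `edge_weights` and the weighting rule are hypothetical, and DHC itself is not implemented.

```python
from urllib.parse import urlsplit


def build_directory_tree(urls):
    """Hypothetical tree: root -> host -> directories; each node counts pages in its subtree."""
    root = {"pages": 0, "children": {}}
    for url in urls:
        parts = urlsplit(url)
        # Directory names only; the final path segment is the page itself.
        dirs = [s for s in parts.path.split("/") if s][:-1]
        node = root
        node["pages"] += 1
        for seg in [parts.netloc] + dirs:
            node = node["children"].setdefault(seg, {"pages": 0, "children": {}})
            node["pages"] += 1
    return root


def edge_weights(node, prefix=""):
    """Assumed weighting: parent -> child edge weight = child's share of the parent's pages."""
    for name, child in node["children"].items():
        path = f"{prefix}/{name}"
        yield path, child["pages"] / node["pages"]
        yield from edge_weights(child, path)


urls = [
    "http://a.gov/index.html",
    "http://a.gov/x/p1.html",
    "http://a.gov/x/p2.html",
    "http://a.gov/y/q1.html",
]
tree = build_directory_tree(urls)
weights = dict(edge_weights(tree))
print(weights)
# prints: {'/a.gov': 1.0, '/a.gov/x': 0.5, '/a.gov/y': 0.25}
```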
Previous Work on Link Analysis • HITS • Collect a root set of pages • Expand it to a base set (1k-5k pages) • Set the initial authority (a) and hub (h) weights • Update the a and h weights iteratively • Output the pages with the highest weights • PageRank • [Figure: hub page p pointing to pages q]
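The HITS steps listed above can be sketched in Python as follows. This is a simplified illustration, not the presenters' code: the root set is passed in directly (in HITS it comes from a text search), the base set adds every page that links to or from a root page with no cap on how many are added, and both score vectors are normalised to unit length after each update. The function names are illustrative.

```python
import math


def expand_base_set(root_set, links):
    """Base set = root set plus every page linking to or linked from a root page."""
    base = set(root_set)
    for src, dst in links:
        if src in root_set or dst in root_set:
            base.update((src, dst))
    return base


def hits(links, base, iterations=50):
    """Iterate authority (a) and hub (h) weights on the links inside the base set."""
    edges = [(p, q) for p, q in links if p in base and q in base]
    a = {p: 1.0 for p in base}  # initial authority weights
    h = {p: 1.0 for p in base}  # initial hub weights
    for _ in range(iterations):
        # Authority of q: sum of hub weights of pages pointing to q.
        new_a = {p: 0.0 for p in base}
        for p, q in edges:
            new_a[q] += h[p]
        # Hub of p: sum of authority weights of pages p points to.
        new_h = {p: 0.0 for p in base}
        for p, q in edges:
            new_h[p] += new_a[q]
        # Normalise so the weights stay bounded.
        na = math.sqrt(sum(v * v for v in new_a.values())) or 1.0
        nh = math.sqrt(sum(v * v for v in new_h.values())) or 1.0
        a = {p: v / na for p, v in new_a.items()}
        h = {p: v / nh for p, v in new_h.items()}
    return a, h


links = [("h1", "a"), ("h2", "a"), ("h2", "b")]
base = expand_base_set({"a", "b"}, links)
authority, hub = hits(links, base)
ranking = sorted(authority, key=authority.get, reverse=True)
print(ranking[:2])
# prints: ['a', 'b']
```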
Experimental Results • TREC .GOV dataset • 1,053,372 text/html files were used in the experiments • 7,569,353 hyperlinks in total • Methods compared • PageRank, WeightRank, Two-Layer PageRank, Block-Level PageRank, Hierarchical Ranking
Conclusion • We proposed a Hierarchical Random Walk Model, which approximates how users surf the Web. • We presented a hierarchical ranking algorithm to calculate the importance of Web pages. • The ranking algorithm significantly improves the performance of Web search, efficiently alleviates the sparse-link problem, and assigns reasonable ranks to newly emerging Web pages. • Future work • Conduct experiments on a large-scale Web collection to evaluate our algorithms.
Personal Opinions • Advantage • Drawback • Future work