Exploiting the Hierarchical Structure for Link Analysis. Advisor: Dr. Hsu. Graduate: Kuo-min Wang. Authors: Gui-Rong Xue, Qiang Yang, Hua-Jun Zeng, Yong Yu, Zheng Chen. 2005 ACM SIGIR.
Outline • Motivation • Objective • Hierarchical Ranking • Experimental Results • Conclusions • Personal Opinions
Motivation • Current link analysis algorithms generally work on a flat link graph • They ignore the hierarchical structure of the Web graph • They suffer from two problems • The sparsity of the link graph • Biased ranking of newly emerging pages
Objective • Propose a novel ranking algorithm to solve these two problems.
Hierarchical Ranking • Hyperlinks that link two Web pages in the same supernode (for example, the same directory) are called intra-links. Hyperlinks that link two Web pages in different supernodes are called inter-links. • An inlink is a link that points to the page; an outlink is a link that points from the page. • Upper layer • Contains m vertices called supernodes • Supernodes are linked to each other by directed edges called superedges • Lower layer
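The intra-link / inter-link distinction above can be illustrated with a small sketch. It assumes, purely for illustration, that a page's supernode is its host name; the helper names `supernode_of` and `split_links` and the example URLs are hypothetical and do not come from the slides. Inter-links are also counted per supernode pair, which gives one possible weighting for superedges.

```python
from collections import Counter
from urllib.parse import urlparse


def supernode_of(url: str) -> str:
    """Hypothetical grouping rule: a page's host name is its supernode."""
    return urlparse(url).netloc


def split_links(links):
    """Separate page-level links into intra-links and inter-links.

    Also counts inter-links per (source supernode, target supernode) pair,
    one possible (assumed) way to weight superedges.
    """
    intra, inter = [], []
    superedges = Counter()
    for src, dst in links:
        s, d = supernode_of(src), supernode_of(dst)
        if s == d:
            intra.append((src, dst))   # both pages in the same supernode
        else:
            inter.append((src, dst))   # pages in different supernodes
            superedges[(s, d)] += 1
    return intra, inter, superedges


# Made-up example links
links = [
    ("http://a.gov/x.html", "http://a.gov/y.html"),
    ("http://a.gov/x.html", "http://b.gov/index.html"),
    ("http://b.gov/index.html", "http://a.gov/y.html"),
    ("http://a.gov/y.html", "http://b.gov/index.html"),
]
intra, inter, superedges = split_links(links)
```

Grouping by host is only one choice; grouping by directory, as the slide mentions, would change only `supernode_of`.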
Hierarchical Ranking (cont.) • Hierarchical Random Walk Model • A user seeks information by starting from the upper layer and either randomly jumps to another upper-layer node • Or follows the hierarchical links down to the lower layer • Calculates the importance of supernodes
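The slide does not give the formula used to score supernodes, so the following is only a sketch of one plausible reading: a standard PageRank-style power iteration over the weighted supernode graph, where the surfer either follows a superedge in proportion to its weight or jumps to a random supernode. The function name `supernode_importance`, the damping value 0.85, and the example weights are assumptions, not values from the paper.

```python
def supernode_importance(weights, damping=0.85, iters=100):
    """PageRank-style scores on a weighted supernode graph (assumed model).

    weights maps (source supernode, target supernode) to a superedge weight.
    """
    nodes = sorted({n for edge in weights for n in edge})
    out_total = {n: 0.0 for n in nodes}
    for (src, _dst), w in weights.items():
        out_total[src] += w

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Supernodes without outgoing weight spread their score uniformly.
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0)
        new = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for (src, dst), w in weights.items():
            if out_total[src] > 0:
                # Follow a superedge with probability proportional to its weight.
                new[dst] += damping * rank[src] * w / out_total[src]
        rank = new
    return rank


# Made-up superedge weights for three supernodes
weights = {("A", "B"): 2, ("B", "A"): 1, ("A", "C"): 1, ("C", "A"): 1}
rank = supernode_importance(weights)
```

In this toy graph supernode A receives all of the weight leaving B and C, so it ends up with the highest score, and the scores sum to 1.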
Hierarchical Ranking (cont.) • Calculating Page Importance • Constructing Weighted Tree Structure • Calculating Page Importance by DHC
Previous Work on Link Analysis • HITS • PageRank • HITS steps • Collect a root set of pages • Expand to a base set (1k-5k pages) • Set the initial authority (a) and hub (h) weights • Update the a and h weights • Output a list of the pages with the highest weights
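The weight-update steps of HITS listed above can be sketched as follows. This is a minimal illustration of the standard authority/hub iteration on an already collected base set; the function names, the iteration count, and the tiny example graph are assumptions made for the example.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: (v / norm if norm else v) for k, v in scores.items()}


def hits(edges, iters=50):
    """Iterate authority (a) and hub (h) weights over directed edges (p, q)."""
    nodes = sorted({n for e in edges for n in e})
    auth = {n: 1.0 for n in nodes}  # initial a weights
    hub = {n: 1.0 for n in nodes}   # initial h weights
    for _ in range(iters):
        # a(q) = sum of h(p) over all pages p that link to q
        new_auth = {n: 0.0 for n in nodes}
        for p, q in edges:
            new_auth[q] += hub[p]
        # h(p) = sum of a(q) over all pages q that p links to
        new_hub = {n: 0.0 for n in nodes}
        for p, q in edges:
            new_hub[p] += new_auth[q]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub


# Made-up base set: p1, p2, p3 link to q1; p1 also links to q2
edges = [("p1", "q1"), ("p1", "q2"), ("p2", "q1"), ("p3", "q1")]
auth, hub = hits(edges)
top_authority = max(auth, key=auth.get)
top_hub = max(hub, key=hub.get)
```

Here q1 is linked from every hub and becomes the top authority, while p1, which links to both authorities, becomes the top hub.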
Experimental Results • TREC .GOV data set • 1,053,372 text/html files, which are used in the experiments • 7,569,353 hyperlinks in total • Experimental methods • PageRank, WeightRank, Two-Layer PageRank, • Block-Level PageRank, Hierarchical Ranking
Conclusion • We proposed a Hierarchical Random Walk Model, which approximates the behavior of a user surfing the Web. • We presented a hierarchical ranking algorithm to calculate the importance of Web pages. • The ranking algorithm can significantly improve the performance of Web search, efficiently alleviate the sparse link problem, and assign reasonable ranks to newly emerging Web pages. • Future work • Conduct experiments on a large-scale Web collection to evaluate our algorithms.
Personal Opinions • Advantage • Drawback • Future work