Exploiting the Hierarchical Structure for Link Analysis

Exploiting the Hierarchical Structure for Link Analysis • Advisor: Dr. Hsu • Graduate: Kuo-min Wang • Authors: Gui-Rong Xue, Qiang Yang, Hua-Jun Zeng, Yong Yu, Zheng Chen • 2005 ACM SIGIR

Outline • Motivation • Objective • Hierarchical Ranking • Experimental Results • Conclusions • Personal Opinions

Motivation • Current link analysis algorithms generally work on a flat link graph • They ignore the hierarchical structure of the Web graph • As a result they suffer from two problems • The sparsity of the link graph • Biased ranking of newly emerging pages

Objective • Propose a novel ranking algorithm to solve these two problems.

Hierarchical Ranking • Hyperlinks between Web pages in the same supernode (for example, the same directory) are called intra-links; hyperlinks between Web pages in different supernodes are called inter-links. • An inlink is a link that points to a page; an outlink is a link that leaves a page. • Upper-layer • Contains m vertices called supernodes • Supernodes are linked to each other by directed edges called superedges • Lower-layer [Figure labels: Intra-links, Inter-links, Super node]
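The intra-link / inter-link split can be illustrated with a small sketch. It assumes that a supernode is identified by the host plus directory part of a URL; that mapping, the function names `supernode_of` and `split_links`, and the sample URLs are all hypothetical and are not taken from the slides.

```python
from collections import Counter
from posixpath import dirname
from urllib.parse import urlsplit


def supernode_of(url):
    """Map a page URL to a supernode key: host plus directory (an assumption)."""
    parts = urlsplit(url)
    return parts.netloc + dirname(parts.path or "/")


def split_links(links):
    """Split page-level links into intra-links and inter-links.

    Returns (intra, inter, superedges), where superedges counts how many
    inter-links run from one supernode to another.
    """
    intra, inter = [], []
    superedges = Counter()
    for src, dst in links:
        s, d = supernode_of(src), supernode_of(dst)
        if s == d:
            # Both pages live in the same supernode.
            intra.append((src, dst))
        else:
            # The link crosses supernodes and contributes to a superedge.
            inter.append((src, dst))
            superedges[(s, d)] += 1
    return intra, inter, superedges


# Made-up example links, for illustration only.
links = [
    ("http://a.gov/docs/x.html", "http://a.gov/docs/y.html"),
    ("http://a.gov/docs/x.html", "http://b.gov/index.html"),
    ("http://b.gov/index.html", "http://a.gov/docs/y.html"),
]
intra, inter, superedges = split_links(links)
```

Counting inter-links per supernode pair is one plausible way to weight superedges; the slides do not say how the weights are actually derived.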

Hierarchical Ranking (cont.) • Hierarchical Random Walk Model • A user seeks information by starting from the upper layer and either jumping at random to another upper-layer node • Or following the hierarchical links down to the lower layer • Calculates the importance of supernodes [Figure: example graph with nodes i, j, 1-6 and edge weights 0.3, 0.7, 0.1, 0.1, 0.1, 0.5, 0.2, 0.1]
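The slide does not give the formula used to score supernodes. As a stand-in, the sketch below runs a standard PageRank-style power iteration over weighted superedges, which matches the "follow a link or jump at random" description. The damping value 0.85, the function name `supernode_importance`, and the three-node example graph are assumptions made for illustration.

```python
def supernode_importance(weights, damping=0.85, iterations=100):
    """Power iteration for a random walk with random jumps over supernodes.

    weights[u][v] is the weight of the superedge u -> v. Every node must
    appear as a key, with an empty dict when it has no outgoing edges.
    """
    nodes = list(weights)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Random-jump share, spread uniformly over all supernodes.
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out_total = sum(weights[u].values())
            if out_total == 0:
                # Dangling supernode: redistribute its mass uniformly.
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
                continue
            for v, w in weights[u].items():
                # Follow superedge u -> v in proportion to its weight.
                new_rank[v] += damping * rank[u] * w / out_total
        rank = new_rank
    return rank


# Hypothetical supergraph; the weights are illustrative only.
weights = {
    "A": {"B": 0.7, "C": 0.3},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}
rank = supernode_importance(weights)
```

The returned scores sum to 1, so they can be read as the long-run share of time the random surfer spends in each supernode.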

Hierarchical Ranking (cont.) • Calculating Page Importance • Constructing Weighted Tree Structure • Calculating Page Importance by DHC

q q q p (hub) pq i q 3 1 6 4 2 5 q p q p (hub) Previous Work on Link Analysis • HITS • Page Rank • Collect a root set of pages • Expand to a base set (1k-5k pages) • Set the initial a and h weights • Update a and h weights • Output a list of highest weights qp

Experimental Results • TREC .GOV Data Set • 1,053,372 text/html files, which are used in our experiment • 7,569,353 hyperlinks in total • Experimental Methods • PageRank, WeightRank, Two-Layer PageRank, Block-Level PageRank, Hierarchical Ranking

Experimental Results

Experimental Results (cont.)

Conclusions • We proposed a Hierarchical Random Walk Model, which approximates a user's behavior when surfing the Web. • We presented a hierarchical ranking algorithm to calculate the importance of Web pages. • The ranking algorithm can significantly improve the performance of Web search, efficiently alleviate the sparse link problem, and assign reasonable ranks to newly emerging Web pages. • Future work • Conduct experiments on a large-scale Web collection to evaluate our algorithms.

Personal Opinions • Advantage • Drawback • Future work
