Exploiting the Hierarchical Structure for Link Analysis
Gui-Rong Xue, Qiang Yang, Hua-Jun Zeng, Yong Yu, Zheng Chen
Presented by: Xiaoguang Qi
Introduction
• Existing link analysis algorithms often suffer from two problems
  • Sparsity of the link graph
  • Biased ranking of newly emerging pages
• Incorporate the inherent hierarchical structure of the web into link analysis to deal with these problems
Sketch of Hierarchical Ranking Algorithm
• Web pages are aggregated based on their hierarchical structure at the directory, host, or domain level
• Link analysis is performed on the aggregated graph
• The importance of each node in the aggregated graph is distributed to the individual pages belonging to that node
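The aggregation step can be illustrated with a small sketch. The helper name `aggregation_key` and the way each level is derived from a URL are assumptions made for illustration, not the authors' implementation. In particular, the domain rule below simply keeps the last two host labels, which is wrong for suffixes such as "co.uk".

```python
from urllib.parse import urlsplit


def aggregation_key(url: str, level: str) -> str:
    """Return the key of the node a page is aggregated into (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if level == "host":
        return host
    if level == "domain":
        # Naive assumption: the domain is the last two labels of the host name.
        return ".".join(host.split(".")[-2:])
    if level == "directory":
        # Drop the final path segment (the file name) and keep the directory.
        directory = parts.path.rsplit("/", 1)[0]
        return host + directory + "/"
    raise ValueError(f"unknown level: {level}")


url = "http://cs.example.edu/people/alice/index.html"
print(aggregation_key(url, "directory"))
print(aggregation_key(url, "host"))
print(aggregation_key(url, "domain"))
```

Pages that share a key fall into the same aggregated node, so choosing a coarser level yields fewer, larger nodes.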
Two-Layer Hierarchical Graph
• Upper-layer graph
  • Partition the page set at a certain level
  • One supernode for each partition
  • Edges between supernodes are weighted
  • Weight(Si, Sj) = number of links from pages in Si to pages in Sj
• Lower-layer graph
  • All the pages within a supernode are organized in a hierarchical structure based on their URL relationships
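A minimal sketch of building the upper-layer graph follows, assuming host-level partitioning and a link list given as (source URL, target URL) pairs. The function name `build_supernode_graph` is hypothetical. Links between pages of the same supernode are counted too, since the weight definition above does not exclude them; that is an assumption.

```python
from collections import Counter
from urllib.parse import urlsplit


def build_supernode_graph(links):
    """Count page-level links between supernodes (hypothetical helper).

    links: iterable of (source_url, target_url) pairs.
    Returns a Counter mapping (Si, Sj) to the number of links from Si to Sj.
    """
    weights = Counter()
    for src, dst in links:
        s_i = urlsplit(src).hostname  # supernode of the source page
        s_j = urlsplit(dst).hostname  # supernode of the target page
        weights[(s_i, s_j)] += 1
    return weights


links = [
    ("http://a.com/1", "http://b.com/x"),
    ("http://a.com/2", "http://b.com/y"),
    ("http://b.com/x", "http://a.com/1"),
]
weights = build_supernode_graph(links)
print(weights[("a.com", "b.com")])  # two page links collapse into one weighted edge
```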
Hierarchical Random Walk Model
• Surf on the lower-layer graph
  • Go to another page within the current supernode
• Surf on the upper-layer graph
  • Follow a link originating from the current supernode
  • Jump to a random supernode
Calculating Supernode Importance
• Supernode importance
• In matrix form
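The formulas for this slide are not reproduced here, so the sketch below is only an assumption about their general shape: a standard weighted PageRank power iteration over the supernode graph, in which the walker follows an out-edge with probability proportional to its weight or jumps to a random supernode. The name `supernode_rank` and the `damping` value of 0.85 are illustrative and are not taken from the slides.

```python
def supernode_rank(weights, damping=0.85, iterations=50):
    """Weighted PageRank over supernodes (illustrative, not the authors' formula).

    weights: dict mapping (Si, Sj) to the link count from Si to Sj.
    Returns a dict mapping each supernode to its importance score.
    """
    nodes = sorted({node for edge in weights for node in edge})
    out_total = {node: 0.0 for node in nodes}
    for (src, _dst), w in weights.items():
        out_total[src] += w

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by supernodes without out-links is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for (src, dst), w in weights.items():
            if w > 0:
                new_rank[dst] += damping * rank[src] * w / out_total[src]
        rank = new_rank
    return rank


weights = {("a", "b"): 2, ("a", "c"): 1, ("b", "c"): 1, ("c", "a"): 1}
for node, score in supernode_rank(weights).items():
    print(node, round(score, 4))
```

The scores sum to one at every iteration, so they can be read as the stationary probability of the walker being in each supernode.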
Calculating Page Importance
• Constructing a weighted tree structure
• Calculating page importance with the Dissipative Heat Conductance (DHC) model
Parameter Tuning
• Aggregation level
  • Host-level aggregation is the best choice
• Parameter tuning
  • θ = 0.6
  • α = 0.6
  • β = 0.4
  • γ = 0.8
Experimental Results
• The hierarchical ranking algorithm consistently outperforms other well-known ranking algorithms
  • BM2500, BlockRank, PageRank, LayerRank, WeightedRank, HostRank
• Ranking on sparse data
  • Effectively alleviates the sparse-link problem
Experimental Results (Cont.)
• Ranking of new pages
  • Aim: to assign a reasonable rank to newly emerging web pages
  • Tested in an analogous way
  • Test set: 10,000 pages randomly selected with different rank values
  • Remove 90% of their incoming links
  • Run the algorithms on the modified graph
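The link-removal step of this experiment can be sketched as follows. This is an assumption about how such a setup might be coded, not the authors' script: the function name `remove_incoming_links`, the per-page rounding of the 90% fraction, and the fixed random seed are all illustrative choices.

```python
import random


def remove_incoming_links(links, test_pages, fraction=0.9, seed=0):
    """Drop a fraction of the incoming links of each test page (hypothetical helper).

    links: list of (source, target) pairs.
    test_pages: pages that should look like newly emerging pages.
    Returns a new link list; links to other pages are left untouched.
    """
    rng = random.Random(seed)
    test_pages = set(test_pages)

    # Group the positions of incoming links by the test page they point to.
    incoming = {}
    for index, (_src, dst) in enumerate(links):
        if dst in test_pages:
            incoming.setdefault(dst, []).append(index)

    dropped = set()
    for indices in incoming.values():
        count = round(fraction * len(indices))
        dropped.update(rng.sample(indices, count))

    return [link for i, link in enumerate(links) if i not in dropped]


links = [(f"p{i}", "t") for i in range(10)] + [(f"q{i}", "u") for i in range(3)]
modified = remove_incoming_links(links, {"t"})
print(sum(1 for _src, dst in modified if dst == "t"))  # incoming links left for "t"
```

Each ranking algorithm would then be run on the modified link list, and the ranks it assigns to the test pages compared with their ranks on the original graph.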