
Exploiting the Hierarchical Structure for Link Analysis

Exploiting the Hierarchical Structure for Link Analysis. Gui-Rong Xue, Qiang Yang, Hua-Jun Zeng, Yong Yu, Zheng Chen. Presented by: Xiaoguang Qi. Introduction: existing link analysis algorithms often suffer from two problems, sparsity of the link graph and biased ranking of newly emerging pages.


Presentation Transcript


  1. Exploiting the Hierarchical Structure for Link Analysis. Gui-Rong Xue, Qiang Yang, Hua-Jun Zeng, Yong Yu, Zheng Chen. Presented by: Xiaoguang Qi

  2. Introduction
     • Existing link analysis algorithms often suffer from two problems
       • Sparsity of the link graph
       • Biased ranking of newly emerging pages
     • Incorporate the inherent hierarchical structure of the web into link analysis to deal with these problems

  3. Sketch of Hierarchical Ranking Algorithm
     • Web pages are aggregated based on their hierarchical structure at the directory, host, or domain level
     • Link analysis is performed on the aggregated graph
     • The importance of each node on the aggregated graph is distributed to the individual pages belonging to that node

  4. Two-Layer Hierarchical Graph
     • Upper-layer graph
       • Partition the page set on a certain level
       • One supernode for each partition
       • Edges between supernodes are weighted
       • Weight(Si, Sj) = # links from pages in Si to pages in Sj
     • Lower-layer graph
       • All the pages within a supernode are organized in a hierarchical structure based on the URL relationship
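The upper-layer construction on this slide can be illustrated with a minimal Python sketch. It assumes host-level partitioning and keeps links that stay within one supernode as self-loops; the slide does not say how such links are treated. The names `supernode_of` and `aggregate_links` and the example URLs are hypothetical, not from the presentation.

```python
from collections import Counter
from urllib.parse import urlsplit


def supernode_of(url: str) -> str:
    """Map a page URL to its supernode; host-level partitioning is assumed here."""
    return urlsplit(url).netloc.lower()


def aggregate_links(page_links):
    """Count page-level links between supernodes.

    page_links: iterable of (source_url, target_url) pairs.
    Returns a Counter mapping (S_i, S_j) to the number of links
    from pages in S_i to pages in S_j.
    """
    weights = Counter()
    for src, dst in page_links:
        weights[(supernode_of(src), supernode_of(dst))] += 1
    return weights


# Illustrative page-level links (made-up URLs).
links = [
    ("http://a.example/index.html", "http://b.example/docs/x.html"),
    ("http://a.example/about.html", "http://b.example/index.html"),
    ("http://b.example/index.html", "http://a.example/index.html"),
    ("http://a.example/index.html", "http://a.example/about.html"),
]
upper = aggregate_links(links)
```

Partitioning at the directory or domain level would only require a different `supernode_of`; the counting step stays the same.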

  5. Hierarchical Random Walk Model
     • Surf on the lower-layer graph
       • Go to another page within the current supernode
     • Surf on the upper-layer graph
       • Follow a link originating from the current supernode
       • Jump to a random supernode

  6. Calculating Supernode Importance
     • Supernode importance
     • In matrix form
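The formulas on this slide are not part of the transcript. As a rough illustration only, the sketch below runs a standard weighted PageRank-style power iteration over supernodes. It matches the random walk described earlier (follow a weighted link or jump to a random supernode), but it is not necessarily the presenters' exact formula. The function name, the `damping` value, and the uniform handling of supernodes without outgoing links are all assumptions.

```python
def supernode_importance(weights, damping=0.85, iterations=100):
    """Weighted random-walk importance over supernodes (illustrative sketch).

    weights: dict mapping (S_i, S_j) to a positive edge weight.
    damping: probability of following a link rather than jumping to a
             random supernode (assumed value, not taken from the slides).
    """
    nodes = sorted({n for edge in weights for n in edge})
    out_total = {n: 0.0 for n in nodes}
    for (src, _dst), w in weights.items():
        out_total[src] += w

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Mass held by supernodes without outgoing links is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0)
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for (src, dst), w in weights.items():
            # Each supernode passes on its rank in proportion to edge weight.
            new_rank[dst] += damping * rank[src] * w / out_total[src]
        rank = new_rank
    return rank


# Toy supernode graph: A -> B (weight 2), B -> A (weight 1), C -> A (weight 1).
toy_weights = {("A", "B"): 2, ("B", "A"): 1, ("C", "A"): 1}
toy_rank = supernode_importance(toy_weights)
```

In this toy graph, supernode C has no incoming links, so it only receives the random-jump share and ends up with the lowest score.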

  7. Calculating Page Importance
     • Constructing weighted tree structure
     • Calculating page importance by DHC
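The transcript gives neither the tree's edge weights nor the DHC computation, so the sketch below covers only the first step in an unweighted form: arranging the pages of one supernode into a tree by URL path. The helper names `parent_path` and `build_url_tree` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def parent_path(path: str):
    """Return the enclosing directory of a URL path, or None for the root."""
    if path in ("", "/"):
        return None
    trimmed = path.rstrip("/")
    head = trimmed.rsplit("/", 1)[0]
    return head + "/"


def build_url_tree(urls):
    """Build a parent -> children map from the URL paths of one supernode."""
    children = {}
    for url in urls:
        path = urlsplit(url).path or "/"
        # Walk upward so intermediate directories appear even if no page
        # was crawled for them.
        while True:
            parent = parent_path(path)
            if parent is None:
                break
            children.setdefault(parent, set()).add(path)
            path = parent
    return children


tree = build_url_tree([
    "http://a.example/docs/api/x.html",
    "http://a.example/index.html",
])
```

A weighted version would attach a weight to each parent-child edge; how those weights are chosen is not recoverable from the slide.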

  8. Parameter Tuning
     • Aggregation level
       • Host level aggregation is the best choice
     • Parameter tuning
       • θ=0.6
       • α=0.6
       • β=0.4
       • γ=0.8

  9. Experimental Results
     • Hierarchical ranking algorithm consistently outperforms other well-known ranking algorithms
       • BM2500, BlockRank, PageRank, LayerRank, WeightedRank, HostRank
     • Ranking on sparse data
       • Effectively alleviates the sparse link problem

  10. Experimental Results (Cont.)
     • Ranking of new pages
       • Aim: to assign a reasonable rank to newly emerging web pages
       • Test in an analogous way
       • Test set: 10,000 pages randomly selected with different rank values
       • Remove 90% of their incoming links
       • Perform algorithms on the modified graph
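The graph modification in this experiment can be sketched as follows: for each page in the test set, a fixed fraction of its incoming links is dropped before the ranking algorithms are run. This is a minimal illustration that assumes the edge list contains no duplicate links. The function name, the seeded sampling, and the toy data are hypothetical, and the 10,000-page sampling step is not reproduced.

```python
import random


def remove_incoming_links(edges, test_pages, fraction=0.9, seed=0):
    """Drop a fraction of the links pointing at each test page.

    edges: list of unique (source, target) pairs.
    test_pages: set of pages whose incoming links are thinned out.
    """
    rng = random.Random(seed)
    incoming = {}
    for edge in edges:
        if edge[1] in test_pages:
            incoming.setdefault(edge[1], []).append(edge)

    removed = set()
    for in_edges in incoming.values():
        k = round(len(in_edges) * fraction)
        removed.update(rng.sample(in_edges, k))
    # Links to pages outside the test set are left untouched.
    return [e for e in edges if e not in removed]


# Toy graph: ten pages link to "new", one page links to "old".
toy_edges = [(f"p{i}", "new") for i in range(10)] + [("p0", "old")]
modified = remove_incoming_links(toy_edges, {"new"})
```

Comparing the rank a page receives on the modified graph with its rank on the full graph then indicates how well an algorithm copes with pages that have few incoming links.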
