This presentation discusses the problem of answering queries on popular topics and proposes a solution: ranking the most authoritative pages using non-affiliated experts. It covers the key terms (expert, recommendation, non-affiliation) and the steps of the Hilltop algorithm: expert lookup, detecting host affiliation, expert selection, expert indexing, target ranking, computing the expert score, and computing the target score. The evaluation and conclusion highlight the characteristics of popular queries and compare the Hilltop and PageRank algorithms.
CSE 450 – Web Mining Seminar • Professor Brian D. Davison • Fall 2005 • A Presentation on "When Experts Agree: Using Non-Affiliated Experts to Rank Popular Topics" • K. Bharat & G. A. Mihaila • WWW10 Conference, May 2001, Hong Kong • by Osama Ahmed Khan • 10/06/2005
Problem • Query on Popular Topic • Content Analysis Solution • Most Authoritative Pages
Technical Terms • Expert • Recommendation • Non-affiliation
Hilltop Algorithm • Expert Lookup • Detecting Host Affiliation • Expert Selection • Expert Indexing • Target Ranking • Computing Expert Score • Computing Target Score
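Detecting Host Affiliation • Conditions (either one makes two hosts affiliated) • Same first 3 octets of IP address, e.g. 127.0.0.1 and 127.0.0.15 • Same rightmost non-generic token of hostname, e.g. www.ibm.com and www.ibm.co.mx (both "ibm") • Union-Find algorithm (makes affiliation transitive)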
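The affiliation test can be sketched in Python as below. This is a minimal illustration, not the paper's implementation: `GENERIC_TOKENS` is a small hypothetical list (a real system would need a full table of generic and country-code suffixes), and the helper names `rightmost_non_generic` and `affiliation_groups` are made up for this example. Hosts that share either a first-three-octet IP prefix or a rightmost non-generic hostname token are merged with union-find, so affiliation becomes transitive.

```python
# Hypothetical set of generic hostname tokens; a real system needs a far larger table.
GENERIC_TOKENS = {"www", "com", "org", "net", "edu", "gov", "co", "ac", "mx", "uk"}


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def rightmost_non_generic(hostname):
    """Return the rightmost hostname token that is not in GENERIC_TOKENS."""
    for token in reversed(hostname.lower().split(".")):
        if token not in GENERIC_TOKENS:
            return token
    return None


def affiliation_groups(hosts):
    """hosts: dict hostname -> IPv4 string. Returns a UnionFind over hostnames."""
    uf = UnionFind()
    first_host_with_key = {}
    for host, ip in hosts.items():
        uf.find(host)  # register the host even if it matches nothing
        keys = [("ip", ".".join(ip.split(".")[:3]))]  # first three octets
        token = rightmost_non_generic(host)
        if token is not None:
            keys.append(("name", token))
        for key in keys:
            if key in first_host_with_key:
                uf.union(first_host_with_key[key], host)
            else:
                first_host_with_key[key] = host
    return uf


hosts = {
    "www.ibm.com": "10.0.0.1",
    "www.ibm.co.mx": "10.9.9.9",
    "www.foo.org": "10.0.0.15",  # same first three octets as www.ibm.com
    "www.bar.net": "192.168.1.1",
}
uf = affiliation_groups(hosts)
print(uf.find("www.ibm.co.mx") == uf.find("www.foo.org"))  # True
print(uf.find("www.bar.net") == uf.find("www.ibm.com"))    # False
```

Here www.foo.org ends up affiliated with www.ibm.co.mx even though the two share neither condition directly, because both are linked through www.ibm.com.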
Expert Selection • Retrieve all webpages with out-degree > threshold k (e.g. k = 5) • An expert must have URLs pointing to at least k distinct non-affiliated hosts
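A sketch of the selection rule, assuming out-links are given as a dict from page URL to a list of link URLs, and that a caller-supplied `group_of` function maps a host to its affiliation group (for example, a union-find root computed during host-affiliation detection). The default `group_of` treats every host as unaffiliated. The function name and signature are hypothetical.

```python
from urllib.parse import urlsplit


def select_experts(out_links, group_of=lambda host: host, k=5):
    """Return pages whose out-degree exceeds k and whose links reach
    at least k distinct affiliation groups.

    out_links: dict page URL -> list of outgoing link URLs (assumed format).
    group_of:  maps a hostname to its affiliation group; the default
               treats every host as its own group.
    """
    experts = []
    for page, links in out_links.items():
        if len(links) <= k:  # out-degree must exceed the threshold
            continue
        groups = {group_of(urlsplit(link).hostname) for link in links}
        if len(groups) >= k:  # enough distinct non-affiliated hosts
            experts.append(page)
    return experts


out_links = {
    "http://hub.test/list": [f"http://site{i}.test/" for i in range(6)],
    "http://small.test/": ["http://a.test/", "http://b.test/"],
    "http://mirror.test/": [f"http://a.test/p{i}" for i in range(6)],
}
print(select_experts(out_links))  # ['http://hub.test/list']
```

The "mirror" page passes the out-degree test but fails the non-affiliation test, since all six links go to one host.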
Expert Indexing • Inverted Index • Mapping Keywords to Experts • Key Phrases • Match Positions
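One way to picture the index is a dictionary from each keyword to its postings, where each posting records which expert, which key phrase, and which position the keyword matched. The `KeyPhrase` record, its `kind` labels, and the helper `experts_matching_all` are assumptions for illustration; the slides do not specify a layout.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KeyPhrase:
    text: str
    kind: str  # e.g. "title", "heading", "anchor" (assumed labels)
    targets: list = field(default_factory=list)  # URLs this phrase qualifies


def build_index(experts):
    """Map each keyword to postings (expert_id, phrase_index, position).

    experts: dict expert_id -> list of KeyPhrase.
    """
    index = defaultdict(list)
    for expert_id, phrases in experts.items():
        for phrase_idx, phrase in enumerate(phrases):
            for pos, term in enumerate(phrase.text.lower().split()):
                index[term].append((expert_id, phrase_idx, pos))
    return index


def experts_matching_all(index, query):
    """Experts whose key phrases contain every query keyword somewhere."""
    sets = [{e for e, _, _ in index.get(w, [])} for w in query.lower().split()]
    return set.intersection(*sets) if sets else set()


experts = {
    "e1": [KeyPhrase("Best Jaguar Sites", "title"),
           KeyPhrase("jaguar cars", "anchor", ["http://cars.test/"])],
    "e2": [KeyPhrase("Big cats", "title"),
           KeyPhrase("jaguar", "anchor", ["http://zoo.test/"])],
}
index = build_index(experts)
print(index["jaguar"])  # [('e1', 0, 1), ('e1', 1, 0), ('e2', 1, 0)]
print(experts_matching_all(index, "jaguar cars"))  # {'e1'}
```

Storing positions as well as phrase indices lets a lookup tell which key phrase each keyword came from, which is what the expert score needs.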
Computing Expert Score • Condition • At least 1 URL with all query keywords • Expert Score: (S0, S1, S2), where k is the number of query terms and $S_i = \sum_{\text{key phrases } p \text{ with } k-i \text{ query terms}} \mathrm{LevelScore}(p) \cdot \mathrm{FullnessFactor}(p,q)$ • $\mathrm{Expert\_Score} = 2^{32} S_0 + 2^{16} S_1 + S_2$
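The scoring formula can be sketched as follows. LevelScore and FullnessFactor are passed in precomputed, since the slide names them without defining them. The function name, the tuple layout, and the choice to ignore phrases containing no query terms are assumptions. The eligibility condition (at least one URL with all query keywords) is not checked here.

```python
def expert_score(phrases, query):
    """Compute 2**32 * S0 + 2**16 * S1 + S2 for one expert.

    phrases: list of (text, level_score, fullness_factor); the last two are
             assumed precomputed, since the slide does not define them.
    S_i sums level_score * fullness_factor over phrases containing
    exactly k - i of the k query terms.
    """
    terms = set(query.lower().split())
    k = len(terms)
    s = [0.0, 0.0, 0.0]
    for text, level_score, fullness in phrases:
        matched = len(terms & set(text.lower().split()))
        i = k - matched
        # Assumption: phrases with no query terms never contribute.
        if matched > 0 and 0 <= i <= 2:
            s[i] += level_score * fullness
    return 2**32 * s[0] + 2**16 * s[1] + s[2]


phrases = [("Jaguar Cars Guide", 16, 1.0), ("jaguar", 1, 1.0)]
print(expert_score(phrases, "jaguar cars"))  # 68719542272.0
```

The powers of two act roughly as lexicographic weights: phrases matching all k query terms dominate, then those missing one term, then those missing two.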
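Computing Target Score • Condition • At least 2 non-affiliated experts • Target Score: $\mathrm{Edge\_Score}(E,T) = \mathrm{Expert\_Score}(E) \cdot \sum_{\text{query keywords } w} \mathrm{occ}(w,T)$ • $\mathrm{Target\_Score}(T) = \sum_{\text{experts } E \text{ pointing to } T} \mathrm{Edge\_Score}(E,T)$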
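A sketch of target ranking under simplifying assumptions. Each edge arrives as a hypothetical tuple `(expert_score, expert_group, target, occ)`, where `occ` maps each query keyword w to occ(w, T). Edge scores from all experts pointing to a target are summed as in the formula. A target is kept only if its experts span at least two affiliation groups. The slide does not say how affiliated experts pointing to the same target should be combined, so here their edges are simply added.

```python
from collections import defaultdict


def target_scores(edges, query_terms, min_groups=2):
    """Sum edge scores per target and keep targets backed by enough
    non-affiliated experts.

    edges: iterable of (expert_score, expert_group, target, occ), where occ
           maps a query keyword w to occ(w, T) (hypothetical format).
    """
    scores = defaultdict(float)
    groups = defaultdict(set)
    for expert_score, expert_group, target, occ in edges:
        # Edge_Score(E, T) = Expert_Score(E) * sum_w occ(w, T)
        edge_score = expert_score * sum(occ.get(w, 0) for w in query_terms)
        scores[target] += edge_score
        groups[target].add(expert_group)
    return {t: s for t, s in scores.items() if len(groups[t]) >= min_groups}


edges = [
    (10.0, "g1", "T1", {"jaguar": 1, "cars": 1}),
    (5.0, "g2", "T1", {"jaguar": 1}),
    (7.0, "g1", "T2", {"jaguar": 2}),
]
print(target_scores(edges, ["jaguar", "cars"]))  # {'T1': 25.0}
```

T2 is dropped even though it has a nonzero score, because only experts from a single affiliation group point to it.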
Evaluation • Locating Specific Popular Targets
Evaluation (Contd.) • Gathering Relevant Pages
Conclusion • Characteristics • Popular Queries • Expert Subset • Hilltop vs. PageRank and Topic Distillation