Ranking Problem In the Network. Yuanzhe Cai, Information Technology Laboratory (IT Lab), Computer Science and Engineering Department, The University of Texas at Arlington, Arlington, TX 76019.
Papers to present • Page, Larry. "PageRank: Bringing Order to the Web." Stanford Digital Library Project, talk, August 18, 1997 (archived 2002). • Kleinberg, Jon. "Authoritative sources in a hyperlinked environment." Journal of the ACM 46(5): 604–632, 1999.
Talk Outline • Ranking Problems • Voted Rank: Bring Order by the Voters. • Weighted Rank: Set the Different Weights to the Voters. • Learning to Rank: Study a Model to Rank the Object.
Ranking Problems Certain Problem: We know the standard. Every day, we solve many ranking problems of this certain kind: • Which supermarket in Arlington has the lowest price for eggs? Aldi Market > New Super Market > Wal-Mart Super Market • Every day, our aim is to design an algorithm that improves on the current algorithm's performance. Association rule mining: FP-tree > Partition methods > Apriori algorithm
Ranking Problems Uncertain Problem: We do not know the standard. Can you rank the students in our class? Can you rank the basketball players in the world? Can you rank the web pages on the Internet?
Voted Rank: Bring Order by the Voters Can you rank the basketball players in the world? Yao Ming, Kobe Bryant, Allen Iverson, Tracy McGrady
Voted Rank: Bring Order by the Voters Four users in our group each vote for their favorite basketball star. Counting the votes gives the following ranking: Yao (2) > Kobe (1) = Iverson (1) > McGrady (0)
Voted Rank: Bring Order by the Voters In 1999, researchers observed that a web page with a high in-degree tends to have a high expertise score. A page that links to this page is, in effect, casting one vote for it.
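Treating every link as one vote, the voted rank of a page is just its in-degree. A minimal sketch, assuming the web is given as a list of (source, target) link pairs; the graph and page names are hypothetical.

```python
from collections import Counter

def in_degree_votes(edges):
    """Give each target page one vote per incoming link (its in-degree)."""
    votes = Counter()
    for source, target in edges:
        votes[target] += 1
    return votes

# Hypothetical link graph: (source page, target page)
edges = [("A", "C"), ("B", "C"), ("D", "C"), ("A", "B"), ("C", "A")]
ranking = in_degree_votes(edges).most_common()
print(ranking)  # [('C', 3), ('B', 1), ('A', 1)]
```

Page D receives no links, so it has zero votes and does not appear in the list, just as McGrady gets no votes in the basketball example.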
Weighted Rank: Set the Different Weights to the Voters. • Four users in our group vote for their favorite basketball star (Das for Kobe, Cai for Yao, Boat for Iverson, Wu for Yao), but their knowledge differs: Das: has watched the NBA for about 10 years (0.5). Cai: doesn't watch any NBA games (0.1). Boat: has watched the NBA for about 15 years (0.7). Wu: “Who is Yao Ming?” (0.05). We need to decide how to assign weights to these voters. Rank: Iverson (0.7) > Kobe (0.5) > Yao (0.1 + 0.05 = 0.15) > McGrady (0)
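The weighted vote can be sketched by summing each voter's weight onto their chosen player. The voter-to-player mapping below (Das: Kobe, Cai: Yao, Boat: Iverson, Wu: Yao) is inferred from the vote totals on the slides; with no weights, the same function reduces to plain vote counting.

```python
from collections import defaultdict

def weighted_votes(ballots, weights=None):
    """Sum each voter's weight onto the candidate they voted for.

    With weights=None every voter counts 1, i.e. plain vote counting.
    Returns candidates sorted from highest to lowest score.
    """
    scores = defaultdict(float)
    for voter, candidate in ballots.items():
        scores[candidate] += 1.0 if weights is None else weights[voter]
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

ballots = {"Das": "Kobe", "Cai": "Yao", "Boat": "Iverson", "Wu": "Yao"}
weights = {"Das": 0.5, "Cai": 0.1, "Boat": 0.7, "Wu": 0.05}
print(weighted_votes(ballots))           # {'Yao': 2.0, 'Kobe': 1.0, 'Iverson': 1.0}
print(weighted_votes(ballots, weights))  # Iverson 0.7, Kobe 0.5, Yao about 0.15
```

Swapping in Cai's weights ({"Das": 0.3, "Cai": 0.9, "Boat": 0.5, "Wu": 0.005}) puts Yao first, which shows how strongly the result depends on who chooses the weights.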
Weighted Rank: Set the Different Weights to the Voters. • In 1999, researchers observed that a web page with a high in-degree tends to have a high expertise score. • We want to design an algorithm that assigns high weights to these high in-degree nodes.
Weighted Rank: Set the Different Weights to the Voters. PageRank is an algorithm of this kind. 1. Every node starts with a vote score of 1/n. 2. Every node distributes its vote score among the nodes it links to. 3. Iterate until the scores converge.
PageRank Calculation Power method to calculate PageRank (P is the transition matrix): Input: P and V0 (for example, every entry equal to 1/n). While not converged(Vi, Vi-1): $V_i = P V_{i-1}$ End While Output: Vi
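The power method can be sketched with NumPy. This assumes the damped form $G = \alpha P + (1-\alpha)\frac{ee^T}{n}$ (the same α as in the personal-rank formula), a damping value of 0.85 (a common choice, not given on the slide), and that pages with no out-links spread their vote over all pages; the 4-page graph is hypothetical.

```python
import numpy as np

def transition_matrix(n, edges):
    """Column-stochastic P: P[i, j] = 1/outdeg(j) if page j links to page i."""
    P = np.zeros((n, n))
    for src, dst in edges:
        P[dst, src] = 1.0
    out_deg = P.sum(axis=0)
    for j in range(n):
        if out_deg[j] > 0:
            P[:, j] /= out_deg[j]   # page j splits its vote evenly over its out-links
        else:
            P[:, j] = 1.0 / n       # dangling page: spread its vote over all pages
    return P

def pagerank(n, edges, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power method V_i = G V_{i-1} with G = alpha*P + (1 - alpha) * e e^T / n."""
    # Adding the scalar (1 - alpha)/n to every entry is the e e^T / n term.
    G = alpha * transition_matrix(n, edges) + (1 - alpha) / n
    v = np.full(n, 1.0 / n)         # V0: every page starts with 1/n
    for _ in range(max_iter):
        v_next = G @ v
        if np.abs(v_next - v).sum() < tol:   # converged
            return v_next
        v = v_next
    return v

edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
scores = pagerank(4, edges)
print(scores.round(4))   # approx. [0.3725 0.1958 0.3941 0.0375]
```

Page 2 collects links from three pages and ranks first; page 3 has no in-links and keeps only the random-jump share (1 - α)/n.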
The other way to explain PageRank • The random surfer model. • Assume every node starts with 1 billion users. • These users browse the web pages in this network by following links. • After several days, the distribution of users over the pages stops changing. • This distribution describes the PageRank score. The PageRank score also suggests which web pages will be popular in the future.
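The random surfer model can be simulated directly. A minimal sketch, assuming each surfer follows a random out-link with probability α = 0.85 and otherwise jumps to a uniformly random page (the standard teleport assumption, not stated on the slide); `users_per_page` and `days` are hypothetical stand-ins for the slide's 1 billion users and several days.

```python
import random
from collections import Counter

def random_surfers(n, edges, users_per_page=5000, days=30, alpha=0.85, seed=0):
    """Start users_per_page surfers on every page, let them browse for `days`
    steps, and return the fraction of surfers on each page at the end."""
    rng = random.Random(seed)
    out_links = [[] for _ in range(n)]
    for src, dst in edges:
        out_links[src].append(dst)
    positions = [page for page in range(n) for _ in range(users_per_page)]
    for _ in range(days):
        for k, page in enumerate(positions):
            if out_links[page] and rng.random() < alpha:
                positions[k] = rng.choice(out_links[page])   # follow a link
            else:
                positions[k] = rng.randrange(n)              # random jump
    counts = Counter(positions)
    return [counts[p] / len(positions) for p in range(n)]

edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
print(random_surfers(4, edges))   # close to the power-method PageRank of this graph
```

The final distribution of surfers approaches the PageRank vector computed by the power method on the same graph, which is exactly the equivalence the slide describes.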
Weighted Rank: Set the Different Weights to the Voters. • Cai's idea: Das: has watched the NBA for about 10 years (0.3). Cai: doesn't watch any NBA games (0.9). Boat: has watched the NBA for about 15 years (0.5). Wu: “Who is Yao Ming?” (0.005). We need to decide how to assign weights to these voters. Rank: Yao (0.9 + 0.005 = 0.905) > Iverson (0.5) > Kobe (0.3) > McGrady (0)
PageRank For Personal Rank Because I am very interested in sports web pages, I want to give high weights to them. • Find the web pages that contain the word “sports”. • Give high weights to the web pages that contain the word “sports”. Let v be the personal rank vector (non-negative entries summing to 1, concentrated on the sports pages) and use it in place of the uniform vector e/n. (One way) $P \leftarrow \alpha P + (1-\alpha)\frac{ee^T}{n}$ becomes $P \leftarrow \alpha P + (1-\alpha)\,ve^T$. Run the PageRank algorithm to calculate the results. (The other way) $P \leftarrow \alpha P + (1-\alpha)\,ev^T$
PageRank For Personal Rank Nodes 1, 4, and 7 are the “sports” web pages. (1) $P \leftarrow \alpha P + (1-\alpha)\,ve^T$: the voters are the “sports” web pages. (2) $P \leftarrow \alpha P + (1-\alpha)\,ev^T$: the voters are the other web pages, but they vote for the “sports” web pages. In my understanding, (1) is the correct way to do it.
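Option (1) can be sketched with the column-vector iteration $V_i = P V_{i-1}$: every random jump then lands on a page drawn from v instead of a uniform page. The 8-page graph, α = 0.85, and the choice to let dangling pages also jump according to v are assumptions for illustration; pages 1, 4, and 7 play the sports pages.

```python
import numpy as np

def personalized_pagerank(n, edges, v, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power method with G = alpha*P + (1 - alpha) * v e^T (column convention).

    v is the personal rank vector: non-negative and summing to 1.
    """
    v = np.asarray(v, dtype=float)
    P = np.zeros((n, n))
    for src, dst in edges:
        P[dst, src] = 1.0
    out_deg = P.sum(axis=0)
    for j in range(n):
        # Dangling pages (no out-links) are assumed to jump according to v.
        P[:, j] = P[:, j] / out_deg[j] if out_deg[j] > 0 else v
    G = alpha * P + (1 - alpha) * np.outer(v, np.ones(n))   # the v e^T term
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x_next = G @ x
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    return x

# Hypothetical 8-page graph; pages 1, 4 and 7 are the "sports" pages.
edges = [(0, 1), (0, 4), (1, 2), (2, 3), (2, 6), (3, 4), (4, 5), (5, 6), (6, 7), (7, 0)]
sports = [1, 4, 7]
v_sports = np.zeros(8)
v_sports[sports] = 1 / 3
uniform = personalized_pagerank(8, edges, np.full(8, 1 / 8))   # ordinary PageRank
personal = personalized_pagerank(8, edges, v_sports)
print(uniform[sports].sum(), personal[sports].sum())   # roughly 0.386 vs 0.438
```

With a uniform v this is ordinary PageRank; concentrating v on the sports pages shifts score toward them and toward the pages they link to.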
How to use this global rank algorithm? Rank the basketball players during 1990–1995. Rank the basketball players on the Los Angeles Lakers. Rank the basketball players among Cai's favorite players. Characteristics of these questions: 1) a player scope; 2) a global rank.
How to use the PageRank algorithm? Improve the accuracy of search engines.
How to use the PageRank algorithm? • Calculate the similarity score between the query and each web page, usually with the vector space model. (Scope) • Calculate the PageRank score over the web page network. (Global Rank) • Combine these two ranked lists into one ranked list for each query. Lawrence Page et al. say this is difficult and do not give an equation to combine them. (Combination)
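The three steps can be sketched as follows. The vector-space similarity is a plain cosine over term counts; since Page et al. give no combination equation, the linear blend and its weight `lam` are purely hypothetical, as are the pages, texts, and PageRank values.

```python
import math
from collections import Counter

def cosine_similarity(query, text):
    """Vector space model: cosine between term-frequency vectors."""
    q, d = Counter(query.lower().split()), Counter(text.lower().split())
    dot = sum(count * d[term] for term, count in q.items())
    norm = math.sqrt(sum(c * c for c in q.values())) * math.sqrt(sum(c * c for c in d.values()))
    return dot / norm if norm else 0.0

def rank_pages(query, pages, lam):
    """Hypothetical combination: lam * similarity + (1 - lam) * PageRank."""
    score = {name: lam * cosine_similarity(query, text) + (1 - lam) * pr
             for name, (text, pr) in pages.items()}
    return sorted(score, key=score.get, reverse=True)

# Made-up pages: (text, precomputed PageRank score)
pages = {
    "p1": ("nba basketball scores today", 0.05),
    "p2": ("cooking recipes", 0.60),
    "p3": ("nba players ranking", 0.20),
}
print(rank_pages("nba players", pages, lam=0.8))   # ['p3', 'p1', 'p2']
print(rank_pages("nba players", pages, lam=0.5))   # ['p3', 'p2', 'p1']
```

With lam = 0.5 the popular but irrelevant cooking page overtakes a relevant NBA page, which illustrates why choosing the combination is hard.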
Discussion About PageRank For details, please read this survey paper: A. Langville and C. Meyer. Deeper inside PageRank. Tech. rep., North Carolina State University, 2003. • Convergence proof. • How to speed up the PageRank algorithm? • Decrease the number of iterations. • Decrease the time per iteration. • How to store or calculate the PageRank score at large scale? ……
HITS algorithm • HITS stands for Hyperlink-Induced Topic Search. • HITS produces two rankings of the expanded set of pages: an authority ranking and a hub ranking.
HITS algorithm Authority: An authority is a page with many in-links. • The idea is that the page may have good or authoritative content on some topic, and • thus many people trust it and link to it. Hub: A hub is a page with many out-links. • The page serves as an organizer of the information on a particular topic, and • points to many good authority pages on the topic.
HITS algorithm • A good hub points to many good authorities, and • A good authority is pointed to by many good hubs. • Authorities and hubs have a mutual reinforcement relationship. Fig. 8 shows some densely linked authorities and hubs (a bipartite sub-graph).
HITS Algorithm • HITS works on the pages in S, and assigns every page in S an authority score and a hub score. • Let the number of pages in S be n. • We again use G = (V, E) to denote the hyperlink graph of S. • We use L to denote the adjacency matrix of the graph.
HITS Algorithm • Let the authority score of page i be a(i) and the hub score of page i be h(i). • The mutual reinforcement relationship of the two scores is represented as follows: $a(i) = \sum_{(j,i) \in E} h(j)$ (31) $h(i) = \sum_{(i,j) \in E} a(j)$ (32)
HITS in matrix form • We use a to denote the column vector of all the authority scores, a = (a(1), a(2), …, a(n))^T, and • use h to denote the column vector of all the hub scores, h = (h(1), h(2), …, h(n))^T. • Then $a = L^T h$ (1) and $h = L a$ (2). Substituting one into the other gives $a = L^T L a$ and $h = L L^T h$, which hold up to normalization, so HITS computes a and h by power iteration, rescaling them after each step.
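The iteration can be sketched as alternating the two updates and rescaling. This assumes L[i, j] = 1 when page i links to page j (so h = La matches equation (32)), unit L2-norm rescaling, and a hypothetical 4-page graph with at least one link.

```python
import numpy as np

def hits(L, max_iter=100, tol=1e-10):
    """HITS by power iteration; L[i, j] = 1 if page i links to page j."""
    L = np.asarray(L, dtype=float)
    n = L.shape[0]
    a, h = np.ones(n), np.ones(n)
    for _ in range(max_iter):
        a_new = L.T @ h                      # (1) a = L^T h
        a_new /= np.linalg.norm(a_new)       # rescale so scores stay bounded
        h_new = L @ a_new                    # (2) h = L a
        h_new /= np.linalg.norm(h_new)
        converged = np.abs(a_new - a).sum() + np.abs(h_new - h).sum() < tol
        a, h = a_new, h_new
        if converged:
            break
    return a, h

# Hypothetical graph: pages 0 and 1 both link to pages 2 and 3; page 3 links to 2.
L = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 0],
              [0, 0, 1, 0]])
a, h = hits(L)
print(a.round(3), h.round(3))
```

Page 2 comes out as the best authority, and pages 0 and 1 tie as the best hubs because they point to both authorities.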
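Motivation Behind HITS [Figure: an example graph G with nodes a, b, c, d, a derived graph G′, and the matrices L, L^T, and LL^T.] Hub score: entry (i, j) of LL^T counts the pages that both i and j link to, i.e. the paths that go forward along one link and backward along another (i → k ← j).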
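The two products can be checked on a small graph. A sketch on a hypothetical 4-page graph, again assuming L[i, j] = 1 when page i links to page j; the hub scores are the principal eigenvector of LL^T and the authority scores that of L^T L.

```python
import numpy as np

# Hypothetical graph, L[i, j] = 1 if page i links to page j.
L = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 0],
              [0, 0, 1, 0]])

# (L L^T)[i, j] = number of pages k with i -> k and j -> k (common out-links);
# the diagonal is each page's out-degree.
hub_matrix = L @ L.T
# (L^T L)[i, j] = number of pages k with k -> i and k -> j (common in-links);
# the diagonal is each page's in-degree.
authority_matrix = L.T @ L
print(hub_matrix)
print(authority_matrix)
```

Pages 0 and 1 share two out-links, so hub_matrix[0, 1] is 2; page 2 has three in-links, so authority_matrix[2, 2] is 3.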
Learning to Rank: Study a Model to Rank the Object. • I also want to rank the players, but I do not trust your ranking results, so I ask real experts to rank these players: Kobe (4), Iverson (3), McGrady (1). What about Yao?
Learning to Rank: Study a Model to Rank the Object. How do the experts rank the players? They analyze features of the players. We learn a function from these features that predicts the score. Rank score (Yao) = F(x1, x2, ...) = 3.1 Kobe (4) > Yao (3.1) > Iverson (3) > McGrady (1)
Learning to Rank: Study a Model to Rank the Object. • Extract features from the players. • Learn a ranking model from these features. • When a new player comes, rank the players with this model.
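The three steps can be sketched with a linear model fitted by least squares. Everything here is hypothetical: players A, B, C with expert scores 4, 3, 1, two made-up features, and the choice of a linear scorer; a real system would use richer features and a stronger model.

```python
import numpy as np

def fit_linear_scorer(X, y):
    """Least-squares fit of score = w . x + b; returns [w..., b]."""
    X = np.asarray(X, dtype=float)
    X1 = np.column_stack([X, np.ones(len(X))])     # bias column
    coef, *_ = np.linalg.lstsq(X1, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict(coef, X):
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return np.column_stack([X, np.ones(len(X))]) @ coef

# Hypothetical players A, B, C with two made-up features and expert scores.
train_X = [[3.0, 1.0], [2.0, 2.0], [1.0, 0.0]]
train_y = [4.0, 3.0, 1.0]
coef = fit_linear_scorer(train_X, train_y)

# A new player D arrives: score it with the learned model, then rank everyone.
scores = dict(zip("ABC", train_y))
scores["D"] = float(predict(coef, [2.5, 1.0])[0])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, round(scores["D"], 3))   # ['A', 'D', 'B', 'C'] 3.333
```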
Learning to Rank: Study a Model to Rank the Object. Retrieval Model of Learning to Rank
Web Page Features • PageRank score • Similarity between the Query and the Web Page • HITS score • The Length of the Web Page • The Number of URLs in the Web Page ……
Learning Model qw_i is the i-th (query, web page) pair. Example: qw1 < qw2 < qw3 (qw3 is the most relevant). h1 is PageLength(). h2 is QueryPageSim(); for example, QueryPageSim(qw2) is 0.7. h3 is PageRankScore(). h4 is #url().
Learning Model Example: qw1 < qw2 < qw3. If qwx < qwy but h(qwx) > h(qwy), the pair is a wrong rank. Function h1(qwi): 2 wrong ranks: h1(qw1) < h1(qw2), h1(qw3) < h1(qw1), h1(qw3) < h1(qw2). Function h2(qwi): 1 wrong rank: h2(qw1) < h2(qw2), h2(qw1) < h2(qw3), h2(qw3) < h2(qw2). Function h3(qwi): 0 wrong ranks: h3(qw1) < h3(qw2), h3(qw1) < h3(qw3), h3(qw2) < h3(qw3). Ordered by fewer wrong ranks: h1 < h2 < h3.
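Counting wrong ranks is a loop over all pairs. A sketch, assuming the scores h1 = (0.7, 0.9, 0.2) and h2 = (10, 49, 20) for (qw1, qw2, qw3), which are read off the R(h1) and R(h2) calculations in this deck; ties are not counted as wrong, following the strict ">" in the definition.

```python
from itertools import combinations

def wrong_ranks(scores):
    """Count pairs that h ranks against the true order.

    scores[k] is h(qw_{k+1}) and the true order is qw1 < qw2 < ..., so a pair
    (i, j) with i < j is a wrong rank when scores[i] > scores[j].
    """
    return sum(1 for i, j in combinations(range(len(scores)), 2)
               if scores[i] > scores[j])

h1 = [0.7, 0.9, 0.2]   # h1(qw1), h1(qw2), h1(qw3)
h2 = [10, 49, 20]      # h2(qw1), h2(qw2), h2(qw3)
print(wrong_ranks(h1), wrong_ranks(h2))   # 2 1
```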
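Learning to Rank algorithm Objective function: in the true ranking, $x_i$ is preferred over $y_i$ ($x_i > y_i$), and $R(h) = \frac{1}{2}\sum_i \left(\max\{0,\, h(y_i) - h(x_i) + r\}\right)^2$. With a small margin r (the values below use r = 0): $R(h_1) = \frac{1}{2}\left(\max\{0, 0.7-0.9+r\}^2 + \max\{0, 0.7-0.2+r\}^2 + \max\{0, 0.9-0.2+r\}^2\right) \approx 0.37$, $R(h_2) = \frac{1}{2}\left(\max\{0, 10-49+r\}^2 + \max\{0, 10-20+r\}^2 + \max\{0, 49-20+r\}^2\right) \approx 420.5$, $R(h_3) = 0$.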
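The objective can be computed pair by pair. A sketch, assuming the same score lists for (qw1, qw2, qw3) that appear inside the R(h1) and R(h2) sums; r = 0 reproduces those numbers.

```python
from itertools import combinations

def pairwise_risk(scores, r=0.0):
    """R(h) = 1/2 * sum of max(0, h(y) - h(x) + r)^2 over preference pairs.

    scores[k] is h(qw_{k+1}); the true order is qw1 < qw2 < ..., so for i < j
    the pair has x = qw_j (preferred) and y = qw_i.
    """
    total = 0.0
    for i, j in combinations(range(len(scores)), 2):
        h_x, h_y = scores[j], scores[i]
        total += max(0.0, h_y - h_x + r) ** 2
    return 0.5 * total

print(round(pairwise_risk([0.7, 0.9, 0.2]), 4))   # 0.37
print(pairwise_risk([10, 49, 20]))                # 420.5
```

R(h1) is smaller than R(h2) even though h1 makes more wrong ranks: the squared hinge loss depends on the scale of the scores, not only on the number of misordered pairs. A positive r also penalizes correctly ordered pairs that are closer than r.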
Gradient Descent Method Regression problem: minimize the objective function $\sum_i \psi(Y(x_i) - F(x_i))$, where ψ is the loss function. We first use a simple method to learn a function f1(x). f1(x) cannot fit part of the data distribution, so we consider only the remaining errors and learn a second function f2(x) from them. Then $F(x) = f_1(x) + \rho_1 f_2(x)$, where $\rho_1 = \arg\min_{\rho} \sum_i \psi(Y(x_i) - f_1(x_i) - \rho f_2(x_i))$. In the end, $F(x) = f_1(x) + \rho_1 f_2(x) + \rho_2 f_3(x) + \cdots$. What if every $\rho_i$ is set to the same small value η?
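[Figure: the objective function, showing the step from f1(x) to f1(x) + η f2(x).] Each added term η f2(x) is one step, and η is the step length, so building F this way is a gradient descent method (a greedy method).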
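The additive scheme F = f1 + η f2 + η f3 + … can be sketched for squared loss, where each new term is fitted to the current residuals. Assumptions: f1 is the constant mean, each later term is a one-split regression stump on a single feature (a simplified stand-in for a decision tree), and η = 0.1 replaces the line-searched ρ_i; the data are made up.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump on a 1-D feature by least squares.

    Needs at least two distinct x values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:                 # candidate split points
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def boost(xs, ys, eta=0.1, rounds=200):
    """Build F(x) = f1(x) + eta*f2(x) + eta*f3(x) + ... under squared loss."""
    base = sum(ys) / len(ys)        # f1: the simplest model, a constant
    terms = []

    def F(x):
        return base + eta * sum(f(x) for f in terms)

    for _ in range(rounds):
        # For squared loss, the negative gradient is just the residual Y(x) - F(x).
        residuals = [y - F(x) for x, y in zip(xs, ys)]
        terms.append(fit_stump(xs, residuals))
    return F

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
F = boost(xs, ys)
print(round(F(2), 4), round(F(5), 4))   # 1.0 5.0
```

Each round shrinks the remaining error by a factor of (1 - η) on this data, so a small η trades more rounds for smaller, safer steps.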
Learning to Rank algorithm Algorithm GBrank: Start with an initial guess h0. For k = 1, 2, …: 1) Using h_{k-1} as the current approximation of h, separate S into two disjoint sets, S+ = {⟨xi, yi⟩ ∈ S | h_{k-1}(xi) ≥ h_{k-1}(yi) + r} and S− = {⟨xi, yi⟩ ∈ S | h_{k-1}(xi) < h_{k-1}(yi) + r}. 2) Fit a regression function g_k(x) using a decision tree and the following training data: {(xi, h_{k-1}(yi) + r), (yi, h_{k-1}(xi) − r) | ⟨xi, yi⟩ ∈ S−}. 3) Form the new ranking function h_k by combining h_{k-1} with g_k, where γ is a shrinkage factor. Margin: pairs in S+ are already ordered correctly by at least the margin r, so they are ignored; only the error pairs in S− (misordered, or correctly ordered by less than r) are used.
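Steps 1 and 2 can be sketched directly; fitting the tree and step 3 are omitted. Assumptions: the current scores h_{k-1} are a dict from item to score, each preference pair is written (x, y) with x preferred, and the margin r = 1.0 is a hypothetical value; the scores 10, 49, 20 are the WL values of the worked example in this deck.

```python
def split_pairs(h, pairs, r):
    """Step 1: split preference pairs (x preferred over y) by the margin r."""
    s_plus = [(x, y) for x, y in pairs if h[x] >= h[y] + r]
    s_minus = [(x, y) for x, y in pairs if h[x] < h[y] + r]
    return s_plus, s_minus

def regression_targets(h, s_minus, r):
    """Step 2: training data for g_k that pushes each error pair apart."""
    data = []
    for x, y in s_minus:
        data.append((x, h[y] + r))   # target for x: at least r above y's score
        data.append((y, h[x] - r))   # target for y: at least r below x's score
    return data

h0 = {"qw1": 10, "qw2": 49, "qw3": 20}
pairs = [("qw2", "qw1"), ("qw3", "qw1"), ("qw3", "qw2")]   # qw1 < qw2 < qw3
s_plus, s_minus = split_pairs(h0, pairs, r=1.0)
print(s_minus)                                   # [('qw3', 'qw2')]
print(regression_targets(h0, s_minus, r=1.0))    # [('qw3', 50.0), ('qw2', 19.0)]
```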
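Learning to Rank Algorithm Example: qw1 < qw2 < qw3. First tree: h0(x) = WL (10, 49, 20 for qw1, qw2, qw3). The only pair in the S− set is ⟨qw3, qw2⟩, which gives the training data (qw3, 49 + r) and (qw2, 20 − r). Second tree: g(x) = #URL (1, 2, 4 for qw1, qw2, qw3), with η = 0.1. $R(h_0 + \eta g) = \frac{1}{2}\left(\max\{0, 10-49+r+0.1(1-2)\}^2 + \max\{0, 10-20+r+0.1(1-4)\}^2 + \max\{0, 49-20+r+0.1(2-4)\}^2\right) \approx 414.72$, which is lower than $R(h_0) \approx 420.5$.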
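The arithmetic of this example can be checked in a few lines. The sketch assumes r = 0 and, as in the slide's example, uses the #URL values directly as g(x) rather than fitting a tree to the S− targets.

```python
from itertools import combinations

def risk(scores, r=0.0):
    """R(h) = 1/2 * sum of max(0, h(qw_i) - h(qw_j) + r)^2 for i < j (qw_j should win)."""
    total = 0.0
    for i, j in combinations(range(len(scores)), 2):
        total += max(0.0, scores[i] - scores[j] + r) ** 2
    return 0.5 * total

wl = [10, 49, 20]      # h0 = WL for qw1, qw2, qw3
n_url = [1, 2, 4]      # g = #URL for qw1, qw2, qw3
eta = 0.1
h1 = [h + eta * g for h, g in zip(wl, n_url)]
print(risk(wl))            # 420.5
print(round(risk(h1), 2))  # 414.72
```

Only the violated pair (qw2 scored above qw3) contributes, and adding η·g narrows its gap from 29 to 28.8, which is where the drop in R comes from.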
Conclusion • Discussed the motivation behind ranking problems. • Vote scores for ranking problems. • Learning models for ranking problems.