Learning to Rank: A Machine Learning Approach to Static Ranking
049011 - Algorithms for Large Data Sets, Student Symposium
Presented by Li-Tal Mashiach
References
• C. Burges et al., "Learning to Rank Using Gradient Descent," ICML 2005
• M. Richardson, A. Prakash, and E. Brill, "Beyond PageRank: Machine Learning for Static Ranking," WWW 2006
Today’s topics
• Motivation & Introduction
• RankNet
• fRank
• Discussion
• Future work suggestion: Predict Popularity Rank (PP-Rank)
Motivation
• The Web is growing exponentially in size
• The number of incorrect, spamming, and malicious sites is also growing
• A good static ranking is therefore crucially important
• Recent work has shown that PageRank may perform no better than other simple measures on certain tasks
Motivation – Cont.
• A combination of many features is more accurate than any single feature
• PageRank is a link-structure feature only
• With a machine learning approach, it is harder for malicious users to manipulate the ranking
Introduction
• Neural networks
• Training
• Cost function
• Gradient descent
Neural networks
Like the brain, a neural network is a massively parallel collection of small, simple processing units, where the interconnections form a large part of the network's intelligence.
Training a neural network
The task is similar to teaching a student:
• First, show him some examples
• Then, ask him to solve some problems
• Finally, correct him, and start the whole process again
Hopefully, he will get it right after a couple of rounds.
Training a neural network – cont.
• Cost function – the error function to minimize
  • Sum-squared error
  • Cross entropy
• Gradient descent
  • Take the derivative of the cost function with respect to the network parameters
  • Change those parameters in a gradient-related direction
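A minimal sketch of the training loop just described, assuming a single linear neuron with a sum-squared-error cost (plain NumPy, not taken from the papers): compute the cost's derivative with respect to the parameters and step in the gradient-related direction.

```python
import numpy as np

def train_linear_neuron(X, y, lr=0.005, epochs=200):
    """Gradient descent on sum-squared error for y ~ X @ w + b."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        pred = X @ w + b
        err = pred - y              # derivative of 0.5 * sum(err**2) w.r.t. pred
        grad_w = X.T @ err          # chain rule: d cost / d w
        grad_b = err.sum()          # d cost / d b
        w -= lr * grad_w            # move in the gradient-related direction
        b -= lr * grad_b
    return w, b

# Toy usage: recover a known linear relation from noisy examples.
X = np.random.default_rng(1).normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * np.random.default_rng(2).normal(size=100)
w, b = train_linear_neuron(X, y)
```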
Static ranking as a classification problem
• xi represents a set of features of Web page i
• yi is its rank
• The classification problem: learn the function that maps each page's features to its rank
• But all we really care about is the order of the pages
RankNet
• Optimizes the order of objects rather than the values assigned to them
• RankNet is given:
  • A collection of pairs of items Z = {<xi, xj>}
  • Target probabilities that Web page i is to be ranked higher than page j
• RankNet learns the order of the items
• Training uses a probabilistic cost function (cross entropy)
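The probabilistic cost can be made concrete with a small sketch of RankNet's pairwise cross-entropy, following the formulation in Burges et al. (2005): the model's probability that page i outranks page j is a sigmoid of the score difference, and the cost is the cross entropy against the target probability. The `pairwise_cost` helper below is illustrative, not the paper's code; `score_i` and `score_j` stand in for the network's outputs f(xi) and f(xj).

```python
import numpy as np

def pairwise_cost(score_i, score_j, target_prob):
    """Cross entropy between the target probability that item i should be
    ranked above item j and the model's probability of the same event."""
    diff = score_i - score_j
    model_prob = 1.0 / (1.0 + np.exp(-diff))   # P_ij = sigmoid(s_i - s_j)
    return -(target_prob * np.log(model_prob)
             + (1.0 - target_prob) * np.log(1.0 - model_prob))

# If the target says page i should outrank page j (target_prob = 1.0),
# the cost shrinks as score_i pulls ahead of score_j.
print(pairwise_cost(2.0, 0.5, 1.0))   # small cost: order agrees with the target
print(pairwise_cost(0.5, 2.0, 1.0))   # larger cost: order contradicts the target
```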
fRank
• Uses RankNet to learn the static ranking function
• Trained on human judgments:
  • For each query, a rating is manually assigned to a number of results
  • The rating measures how relevant the result is to the query
fRank – Cont.
• Uses a set of features from each page:
  • PageRank
  • Popularity – number of visits
  • Anchor text and inlinks – total amount of text in links, number of unique words, etc.
  • Page – number of words, frequency of the most common term, etc.
  • Domain – various averages across all pages in the domain: PageRank, number of outlinks, etc.
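To make the feature groups concrete, here is an illustrative sketch of flattening one page's features into the vector fed to RankNet. The individual field names and values are assumptions for the example; the slides specify only the groups listed above, not these exact features.

```python
def page_feature_vector(page):
    """Flatten one page's feature groups into a single input vector."""
    return [
        page["pagerank"],               # PageRank group
        page["visit_count"],            # Popularity group
        page["anchor_text_words"],      # Anchor text / inlinks group
        page["unique_anchor_words"],
        page["page_word_count"],        # Page group
        page["top_term_frequency"],
        page["domain_avg_pagerank"],    # Domain group (averages over the domain)
        page["domain_avg_outlinks"],
    ]

# Hypothetical example values, for illustration only.
example_page = {
    "pagerank": 0.0004, "visit_count": 1200,
    "anchor_text_words": 340, "unique_anchor_words": 85,
    "page_word_count": 950, "top_term_frequency": 0.03,
    "domain_avg_pagerank": 0.0002, "domain_avg_outlinks": 14.5,
}
x = page_feature_vector(example_page)
```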
fRank results
• fRank performs significantly better than PageRank
• The Page and Popularity feature sets were the most significant contributors
• As more popularity data is collected, fRank's performance continues to improve
Discussion
• The training for static ranking cannot depend on queries
• Using human judgments for static ranking (?)
• PageRank's advantages:
  • Protection against spam
  • fRank is not useful for directing the crawl
Future work – PP-Rank
• Train the machine to predict the popularity of a Web page
• Use popularity data for training:
  • Number of visits
  • How long users stay on the page
  • Whether they leave by clicking back
  • …
• The data should be normalized to each user's browsing pattern
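As a hedged sketch of the per-user normalization suggested above (not from the slides), each visit's dwell time could be divided by that user's own average dwell time before being aggregated per page, so heavy and light browsers contribute comparably. The field names and the aggregation rule are assumptions for illustration.

```python
from collections import defaultdict

def normalized_dwell_scores(visits):
    """visits: iterable of (user_id, page_id, dwell_seconds) tuples."""
    per_user = defaultdict(list)
    for user, _, dwell in visits:
        per_user[user].append(dwell)
    user_mean = {u: sum(d) / len(d) for u, d in per_user.items()}

    scores = defaultdict(float)
    for user, page, dwell in visits:
        scores[page] += dwell / user_mean[user]   # dwell relative to the user's own pattern
    return dict(scores)

# Both users spend ~10x longer on page "b" than on "a", so "b" scores higher
# even though their absolute dwell times differ by an order of magnitude.
visits = [("u1", "a", 30), ("u1", "b", 300), ("u2", "a", 5), ("u2", "b", 50)]
print(normalized_dwell_scores(visits))
```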
PP-Rank - Advantages
• Can predict the popularity of pages that were just created (no page points to them yet)
• Can serve as a measure for directing the crawler
• The rank reflects not what webmasters find interesting (PageRank), but what users find interesting
Summary
• Ranking is key to a search engine
• Learning-based approaches to static ranking are a promising new field:
  • RankNet
  • fRank
  • PP-Rank
ANY QUESTIONS? ©Li-Tal Mashiach, Technion, 2006