EigenRank: A Ranking-Oriented Approach to Collaborative Filtering. Nathan N. Liu & Qiang Yang, SIGIR 2008. IDS Lab. Seminar, Spring 2009, May 21st, 2009. Presenter: 강민석 (Minsuk Kang), Center for E-Business Technology, Seoul National University, Seoul, Korea. minsuk@europa.snu.ac.kr
Contents • Introduction • Related Work • Rating Oriented Collaborative Filtering • Ranking Oriented Collaborative Filtering • Experiments • Conclusions
Introduction • Recommender Systems • Content-based filtering • Analyzes content information associated with items and users, e.g. product descriptions, user profiles • Represents users and items using a set of features • Collaborative filtering • Does NOT require content information about items • Assumes that a user is interested in items preferred by other, similar users (Figure: content-based filtering vs. collaborative filtering)
Introduction • Collaborative Filtering Application Scenarios • Rating prediction • Predict a rating for one individual item at a time • Top-N recommendation • Present an ordered list of the top-N recommended items (Figure: rating prediction in MovieLens, top-N list in Amazon)
Introduction • Motivation • Most CF methods adopt a rating-oriented approach: predict potential ratings first, then rank items by the predicted ratings • Higher accuracy in rating prediction does NOT necessarily lead to better ranking effectiveness • Example: two prediction algorithms have the same error, but the ranking implied by "predicted 2" is incorrect • Most existing methods predict ratings without considering the user's preferences over pairs of items
Introduction • Overview • A ranking-oriented approach to CF • Directly addresses the item ranking problem, without the intermediate step of rating prediction • Contributions • A similarity measure over two users' rankings: the Kendall rank correlation coefficient • Methods for producing item rankings: the greedy order algorithm and a random walk model (Figure: rating prediction vs. ranking items directly)
Contents • Introduction • Related Work • Neighborhood-based Approach • Model-based Approach • Rating Oriented Collaborative Filtering • Ranking Oriented Collaborative Filtering • Experiments • Conclusions
Neighborhood-based Approach • User-based Model • Estimate unknown ratings of a target user based on the ratings of neighboring users, using user-user similarity • Difficulties in the User-based Model • Raw ratings may contain biases • E.g. some users tend to give high ratings • Remedy: use user-specific means • The user-item rating data is sparse • Remedies: dimensionality reduction, data-smoothing methods
Neighborhood-based Approach • Item-based Model • Similar, but uses item-item similarity • Less sensitive to the sparsity problem, since # of items < # of users • Higher accuracy while allowing more efficient computation (Sarwar et al., 2001) (Figure: item-based recommendations on Amazon)
Model-based Approach • Model-based Approach • Uses observed user-item ratings to train a compact model • Rating prediction via the model instead of directly manipulating the data • Algorithms: clustering methods, aspect models, Bayesian networks • Learning to Rank • Ranks items represented in some feature space • Methods try to learn an item scoring function, or to learn a classifier for classifying item pairs
Contents • Introduction • Related Work • Rating Oriented Collaborative Filtering • Similarity Measure • Rating Prediction • Ranking Oriented Collaborative Filtering • Experiments • Conclusions
Rating-based Similarity Measures • Pearson Correlation Coefficient (PCC) • Measures the similarity between two users • Normalizes ratings using each user's average rating • Vector Similarity (VS) • Another form of user-user similarity • Views each user as a vector of ratings and takes the cosine of the angle between the two vectors • Item-Item Similarity • Adjusted cosine similarity is the most effective
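A minimal Python sketch of the Pearson correlation idea above (not code from the paper or slides): ratings are assumed to be stored as dicts mapping item to rating, and the means are taken over the commonly rated items, which is one of several common conventions.

import math

def pearson_similarity(ratings_u, ratings_v):
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0
    # Normalize by each user's mean over the common items to remove rating bias.
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)

# Example: close agreement in relative ratings gives similarity near 1.
print(pearson_similarity({"A": 5, "B": 3, "C": 4}, {"A": 4, "B": 2, "C": 3}))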
Rating Prediction • User-based Model • Select the set of k most similar users • Compute a similarity-weighted average of their ratings • Item-based Model • Analogous to the user-based model, but uses the set of k items most similar to item i
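A minimal sketch of user-based prediction (assumed data layout and helper names, not the paper's code): the predicted rating is the target user's mean plus a similarity-weighted average of the neighbors' mean-centered ratings for the item.

def predict_rating(target, item, all_ratings, similarity, k=20):
    """all_ratings: dict user -> {item: rating}; similarity: f(ratings_u, ratings_v)."""
    target_ratings = all_ratings[target]
    target_mean = sum(target_ratings.values()) / len(target_ratings)

    # Score every other user who has rated the item, keep the k most similar.
    candidates = []
    for user, ratings in all_ratings.items():
        if user == target or item not in ratings:
            continue
        sim = similarity(target_ratings, ratings)
        if sim > 0:
            candidates.append((sim, user, ratings))
    neighbors = sorted(candidates, key=lambda t: t[0], reverse=True)[:k]

    if not neighbors:
        return target_mean  # fall back to the user's own average
    num = sum(sim * (ratings[item] - sum(ratings.values()) / len(ratings))
              for sim, _, ratings in neighbors)
    den = sum(abs(sim) for sim, _, _ in neighbors)
    return target_mean + num / den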
Contents • Introduction • Related Work • Rating Oriented Collaborative Filtering • Ranking Oriented Collaborative Filtering • Similarity Measure – Kendall Rank Correlation Coefficient • Preference Functions – Greedy Order & Random Walk Model • Experiments • Conclusions
Similarity Measure • Motivation • PCC and VS are rating-based measures • In the ranking-based view, similarity is determined by users' preferences over pairs of items • E.g. for users 1 and 2 the rating values are different, but the preferences are very close • Kendall Rank Correlation Coefficient • Counts the item pairs on which the two users agree versus disagree (Figure: 2 pairs with different preference, 2 pairs with the same preference)
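A minimal sketch of the Kendall-style similarity described above (my own illustration, not the paper's code): it compares every pair of commonly rated items and counts concordant versus discordant pairs; tied pairs are simply skipped here.

from itertools import combinations

def kendall_similarity(ratings_u, ratings_v):
    common = sorted(set(ratings_u) & set(ratings_v))
    concordant = discordant = 0
    for i, j in combinations(common, 2):
        du = ratings_u[i] - ratings_u[j]
        dv = ratings_v[i] - ratings_v[j]
        if du * dv > 0:
            concordant += 1      # the two users order i and j the same way
        elif du * dv < 0:
            discordant += 1      # the two users disagree on the pair
    total = concordant + discordant
    return 0.0 if total == 0 else (concordant - discordant) / total

# Users with different rating scales but the same preference order: similarity 1.
print(kendall_similarity({"A": 5, "B": 3, "C": 4}, {"A": 3, "B": 1, "C": 2}))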
Preference Functions • Modeling a user's preference function Ψ • Given two items i and j, which item is more preferable, and by how much? • Ψ(i, j) > 0 means item i is more preferable than j • The magnitude of Ψ(i, j) indicates the strength of the preference • Characteristics • For the same item: Ψ(i, i) = 0 • Anti-symmetric: Ψ(i, j) = -Ψ(j, i) • NOT transitive: Ψ(i, j) > 0 and Ψ(j, k) > 0 do not imply Ψ(i, k) > 0
Preference Functions • Deriving the Preference Function • The key challenge is to obtain preferences over items that the target user has NOT rated • Use the same idea as neighborhood-based CF: find the set of neighbors of the target user who have rated both items, and combine their rating differences weighted by user similarity
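A minimal sketch of this neighborhood-based preference function (a plausible form under the description above; parameter names and data layout are my own): rating differences of neighbors who rated both items are averaged with similarity weights.

def preference(target, i, j, all_ratings, similarities):
    """similarities: dict (target, v) -> precomputed user-user similarity."""
    num = den = 0.0
    for v, ratings in all_ratings.items():
        if v == target or i not in ratings or j not in ratings:
            continue
        sim = similarities.get((target, v), 0.0)
        if sim <= 0:
            continue
        num += sim * (ratings[i] - ratings[j])   # positive if neighbor v prefers i over j
        den += sim
    return 0.0 if den == 0 else num / den        # psi > 0 means i is preferred to j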
Preference Functions • Producing a Ranking • Given the preference function, we want a ranking of the items that agrees with the pairwise preferences as much as possible • Ranking • ρ: a ranking of the items in item set I • ρ(i) > ρ(j): item i is ranked higher than item j • Value function • Measures how consistent ρ is with the preference function Ψ • Our goal is to find the ranking that maximizes the value function • Finding the optimal solution is NP-complete, so a greedy algorithm is used
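A minimal sketch of the value function as described above (my own illustration): a ranking, listed best item first, is scored by summing Ψ(i, j) over every pair it places i above j.

def ranking_value(ranking, psi):
    """ranking: list of items, best first; psi: f(i, j) -> preference strength."""
    value = 0.0
    for pos, i in enumerate(ranking):
        for j in ranking[pos + 1:]:
            value += psi(i, j)   # reward when the ranking agrees with the preference
    return value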
Greedy Order Algorithm • Motivation • Find an approximately optimal ranking • Algorithm • Input: item set I, preference function Ψ • Output: a ranking • Each item's potential is higher when more of the remaining items are less preferred than it; find the highest-ranked (highest-potential) item, remove it, then iterate • Complexity is O(n²), and the resulting value is more than half of the optimal value
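A minimal sketch of a greedy ordering of this kind (following the description above; not the authors' code): each item's potential is how strongly it is preferred over the remaining items minus how strongly they are preferred over it.

def greedy_order(items, psi):
    remaining = set(items)
    potential = {i: sum(psi(i, j) - psi(j, i) for j in remaining if j != i)
                 for i in remaining}
    ranking = []
    while remaining:
        best = max(remaining, key=lambda i: potential[i])  # next highest-ranked item
        ranking.append(best)
        remaining.remove(best)
        for i in remaining:                                # drop pairs involving `best`
            potential[i] -= psi(i, best) - psi(best, i)
    return ranking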
Random Walk Model for Item Ranking • Random Walk based on User Preferences • Motivation • Some users rated i > j, others rated j > k, but only a few rated all three of i, j, k • We want to infer the preference between i and k (implicit relationships) • Use multi-step random walks over a Markov chain model • Analogy: Google PageRank • A random walk on Web pages based on hyperlinks: the surfer randomly picks a hyperlink, and the stationary distribution is used as the PageRank • Model for item ranking • Similarly, there are implicit links between items: a less preferred item j links to a more preferred item i, with a transition probability • The stationary distribution is used for item ranking • Markov chain: at each step the system may change its state from the current state to another state according to a probability distribution; the changes of state are called transitions (Wikipedia) • Correspondence: hyperlink ↔ preference, page ↔ item
Random Walk Model for Item Ranking • Random Walk based on User Preferences • Transition probability • The probability of switching from the current item i to another item j • Higher for items j that are more preferred than i • Depends on the user's preference function • Why the exp function? It keeps the values non-negative
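A minimal sketch of such a preference-based transition distribution (an illustration consistent with the slide, not the paper's exact formula): a softmax over exp(Ψ(j, i)), so items preferred over the current item i receive more probability mass.

import math

def transition_row(i, items, psi):
    weights = {j: math.exp(psi(j, i)) for j in items}   # exp keeps every weight non-negative
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}   # probabilities out of item i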
Random Walk Model for Item Ranking • Computing the Item Rankings • Analogous to the PageRank algorithm • In matrix notation • P: the transition matrix, whose entries are the transition probabilities • The probability of being at item i after t walking steps • These probabilities are obtained with the power iteration method for solving the dominant eigenvector • The stationary probabilities are used to rank the items • Does it work? Existence and uniqueness of the stationary distribution are guaranteed iff P is irreducible • The entries of P are all non-negative
Random Walk Model for Item Ranking • Personalization Vector (teleport) • Used to avoid reducibility of the stochastic matrix (Brin and Page, 1998), via a revised transition matrix • PageRank • The Web surfer sometimes "teleports" to other pages, according to the probability distribution defined by the personalization vector v • ε controls how often the surfer teleports rather than following hyperlinks • Our model • Uses a similar idea to define the personalization vector • Teleports to items with high ratings more often; unrated items have equal probabilities
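A minimal sketch of power iteration on the teleport-adjusted chain (parameter names and defaults are my own assumptions): with probability ε the walker jumps according to the personalization vector v, otherwise it follows the preference-based transitions; the stationary mass is then used to rank the items.

def stationary_distribution(items, transition, v, epsilon=0.15, iters=100):
    """transition: dict i -> {j: p(j|i)}; v: dict item -> teleport probability (sums to 1)."""
    pi = {i: 1.0 / len(items) for i in items}          # start from the uniform distribution
    for _ in range(iters):
        new_pi = {j: epsilon * v[j] for j in items}    # teleport part
        for i in items:
            for j, p in transition[i].items():
                new_pi[j] += (1 - epsilon) * pi[i] * p # follow a preference link
        pi = new_pi
    return pi                                          # higher mass = higher-ranked item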
Contents • Introduction • Related Work • Rating Oriented Collaborative Filtering • Ranking Oriented Collaborative Filtering • Experiments • Conclusions
Experiments • Issues • 1. Is the ranking-oriented approach better than the rating-oriented one? • 2. Which is better, the greedy order algorithm or the random walk model? • 3. Is the ranking-oriented similarity measure (Kendall's) more effective?
Experiments • Data Sets • Two movie rating data sets: EachMovie and Netflix • Users who rated more than 40 different movies • 10,000 users for training, 100 for parameter tuning, 500 for testing • Evaluation Protocol • For each user in the test set, 50% of the ratings are used for model construction and 50% are held out for evaluation
Evaluation Metric • Which metric to use? • Rating-oriented CF • MAE (Mean Absolute Error) and RMSE (Root Mean Square Error) • Focus on the difference between the true rating and the predicted rating • Ranking-oriented CF • Our emphasis is on improving item rankings • NDCG (Normalized Discounted Cumulative Gain) • Evaluated over the top-k items of the ranked list • The discounting factor increases with the position in the ranking, so mistakes near the top cost more
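A minimal sketch of NDCG@k in one common variant (gain 2^r - 1 and a log-based position discount; the exact discount base is an assumption): the DCG of the predicted order is normalized by the DCG of the ideal order.

import math

def ndcg_at_k(ranked_ratings, k):
    """ranked_ratings: true ratings of the items in the predicted order, best first."""
    def dcg(ratings):
        return sum((2 ** r - 1) / math.log2(pos + 2)       # position 0 gets discount log2(2) = 1
                   for pos, r in enumerate(ratings[:k]))
    ideal = dcg(sorted(ranked_ratings, reverse=True))
    return 0.0 if ideal == 0 else dcg(ranked_ratings) / ideal

print(ndcg_at_k([5, 3, 4, 2], k=3))   # slightly below 1: the perfect order would be [5, 4, 3]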
Impact of Parameters • Impact of Neighborhood Size • The size of the neighborhood affects performance • Result • As the neighborhood size increases, NDCG increases up to about 100 neighbors, because more neighbors make the preference function more accurate • Beyond 100, performance starts to decrease, due to the inclusion of many non-similar users
Impact of Parameters • Impact of ε • How does the frequency of the "teleport" operation affect performance? • Result • As ε increases, NDCG increases • But ε should not be too large (around 0.8~0.9)
Comparisons with Other Algorithms • Issues • Is the ranking-oriented approach better than the rating-oriented one? • Which is better, the greedy order algorithm or the random walk model? • Is the ranking-oriented similarity measure (Kendall's) more effective? • Comparison • 4 rating-oriented settings, 6 ranking-oriented settings
Comparisons with Other Algorithms • Results • The ranking-oriented approach is better than the rating-oriented one by about 8.8% in NDCG@1 • The random walk model outperformed all the rating-oriented settings • The random walk model is slightly better than the greedy order algorithm • The Kendall rank correlation coefficient is more effective for the ranking-oriented approach
Conclusion • A Ranking-oriented Framework for CF • Ranks items without rating prediction as an intermediate step • Extends neighborhood-based CF by identifying pairwise preferences • Two methods for computing the item ranking • Greedy order algorithm • Random walk model (Figure: similarity measure (Kendall rank correlation coefficient) → preference function → greedy order / random walk model)
Thank you~