A Review of Information Filtering Part II: Collaborative Filtering Chengxiang Zhai Language Technologies Institute School of Computer Science Carnegie Mellon University
Outline • A Conceptual Framework for Collaborative Filtering (CF) • Rating-based Methods (Breese et al. 98) • Memory-based methods • Model-based methods • Preference-based Methods (Cohen et al. 99 & Freund et al. 98) • Summary & Research Directions
What is Collaborative Filtering (CF)? • Making filtering decisions for an individual user based on the judgments of other users • Inferring an individual's interests/preferences from those of other, similar users • General idea • Given a user u, find similar users {u1, …, um} • Predict u's preferences based on the preferences of u1, …, um
CF: Applications • Recommender Systems: books, CDs, Videos, Movies, potentially anything! • Can be combined with content-based filtering • Example (commercial) systems • GroupLens (Resnick et al. 94): usenet news rating • Amazon: book recommendation • Firefly (purchased by Microsoft?): music recommendation • Alexa: web page recommendation
CF: Assumptions • Users with a common interest will have similar preferences • Users with similar preferences probably share the same interest • Examples • "interest is IR" => "read SIGIR papers" • "read SIGIR papers" => "interest is IR" • A sufficiently large number of user preferences is available
CF: Intuitions • User similarity • If Jamie liked the paper, I’ll like the paper • ? If Jamie liked the movie, I’ll like the movie • Suppose Jamie and I viewed similar movies in the past six months … • Item similarity • Since 90% of those who liked Star Wars also liked Independence Day, and, you liked Star Wars • You may also like Independence Day
Collaborative Filtering vs. Content-based Filtering • Basic filtering question: Will user U like item X? • Two different ways of answering it • Look at what U likes => characterize X => content-based filtering • Look at who likes X => characterize U => collaborative filtering • The two can be combined
Rating-based vs. Preference-based • Rating-based: the user's preferences are encoded as numerical ratings on items • Complete ordering • Absolute values can be meaningful • But values must be normalized before they can be combined • Preference-based: the user's preferences are represented as a partial ordering of items • Partial ordering • Easier to exploit implicit preferences
A Formal Framework for Rating • Users: U = {u1, u2, …, ui, …, um}; Objects: O = {o1, o2, …, oj, …, on} • Ratings Xij = f(ui, oj) for an unknown function f: U x O -> R, forming a partially observed user-object matrix (some cells hold ratings such as 3, 1.5, 2; the others are unknown) • The task • Assume f values are known for some (u,o) pairs • Predict f values for the other (u,o) pairs • Essentially function approximation, like other learning problems
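To make the matrix view concrete, here is a minimal Python sketch of the setting; the numeric values are only illustrative (loosely echoing the numbers on the original slide), and np.nan marks the entries the filter must predict.

    import numpy as np

    # Partially observed user x object rating matrix X_ij = f(u_i, o_j).
    ratings = np.array([
        [3.0,    1.5, np.nan,    2.0],   # user u1
        [2.0, np.nan,    1.0,    3.0],   # user u2
        [np.nan, 2.0,    3.0, np.nan],   # user u3
    ])

    # The CF task: approximate the unknown function f from the observed
    # entries and fill in the np.nan cells -- a function-approximation
    # (learning) problem over the user x object grid.
    print(np.isnan(ratings))   # which (user, object) pairs still need predictions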
Where are the intuitions? • Similar users have similar preferences • If u ≈ u', then for all o's, f(u,o) ≈ f(u',o) • Similar objects have similar user preferences • If o ≈ o', then for all u's, f(u,o) ≈ f(u,o') • In general, f is "locally constant" • If u ≈ u' and o ≈ o', then f(u,o) ≈ f(u',o') • "Local smoothness" makes it possible to predict unknown values by interpolation or extrapolation • What does "local" mean?
Two Groups of Approaches • Memory-based approaches • f(u,o) = g(u)(o) ≈ g(u')(o) if u ≈ u' • Find "neighbors" of u and combine their g(u')(o)'s • Model-based approaches • Assume structures/models: object clusters, user clusters, f' defined on clusters • f(u,o) = f'(cu, co) • Estimation & probabilistic inference
Memory-based Approaches (Breese et al. 98) • General ideas: • Xij: rating of object j by user i • ni: average rating of all objects by user i • Normalized ratings: Vij = Xij - ni • Memory-based prediction: predict user a's rating of object j as a's average plus a weighted combination of the other users' normalized ratings, Xaj ≈ na + κ Σi w(a,i) Vij, where κ is a normalizing constant (see the sketch below) • Specific approaches differ in w(a,i) -- the distance/similarity between user a and user i
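A minimal Python sketch of this weighted-neighbor prediction; the data layout (a dict of observed ratings) and the fallback to the user's own mean are assumptions for illustration, not part of the original formulation.

    def predict_rating(a, j, ratings, w):
        """Predict user a's rating of item j from other users' ratings.
        ratings: dict {(user, item): rating}; w(a, i): user-user similarity."""
        def mean(u):
            vals = [r for (uu, _), r in ratings.items() if uu == u]
            return sum(vals) / len(vals)          # n_u: user u's average rating
        num, denom = 0.0, 0.0
        for (i, jj), x_ij in ratings.items():
            if jj != j or i == a:
                continue
            v_ij = x_ij - mean(i)                 # normalized rating V_ij = X_ij - n_i
            num += w(a, i) * v_ij
            denom += abs(w(a, i))                 # kappa = 1 / sum_i |w(a,i)|
        if denom == 0:
            return mean(a)                        # no neighbor rated j: fall back to n_a
        return mean(a) + num / denom              # X_aj = n_a + kappa * sum_i w(a,i) V_ij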
User Similarity Measures • Pearson correlation coefficient (sum over commonly rated items) • Cosine measure • Many other possibilities!
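For reference, these two measures as they are usually written (following Breese et al. 98), with the Pearson sums running over the items j that both users a and i have rated:

    w_{\mathrm{pearson}}(a,i) = \frac{\sum_{j}(x_{a,j}-\bar{x}_a)(x_{i,j}-\bar{x}_i)}{\sqrt{\sum_{j}(x_{a,j}-\bar{x}_a)^2 \;\sum_{j}(x_{i,j}-\bar{x}_i)^2}}
    \qquad
    w_{\mathrm{cosine}}(a,i) = \frac{\sum_{j} x_{a,j}\, x_{i,j}}{\sqrt{\sum_{k} x_{a,k}^2}\;\sqrt{\sum_{k} x_{i,k}^2}}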
Improving User Similarity Measures (Breese et al. 98) • Dealing with missing values: default ratings • Inverse User Frequency (IUF): similar to IDF • Case amplification: use w(a,i)^p, e.g., p = 2.5
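A small Python sketch of two of these extensions; the function names and the exact form of the IUF weight (a log, IDF-style weight applied inside the similarity sums) are illustrative assumptions in the spirit of Breese et al. 98.

    import math

    def case_amplify(w, p=2.5):
        # Raise the similarity to the power p while keeping its sign:
        # strong agreements are emphasized, weak ones are damped.
        return w * (abs(w) ** (p - 1))

    def inverse_user_frequency(n_users, n_users_rating_j):
        # IDF-like weight for item j: items rated by almost everyone say
        # little about how similar two particular users are.
        return math.log(n_users / n_users_rating_j)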
Model-based Approaches (Breese et al. 98) • General ideas • Assume that the data/ratings are explained by a probabilistic model with parameters θ • Estimate/learn the model parameters θ from the data • Predict an unknown rating using E[xk+1 | x1, …, xk], computed under the estimated model • Specific methods differ in the model used and in how the model is estimated
Probabilistic Clustering • Clustering users based on their ratings • Assume ratings are observations of a multinomial mixture model with parameters p(C), p(xi|C) • Model estimated using standard EM • Predict ratings using E[xk+1 | x1, …, xk]
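Below is a minimal NumPy sketch of EM for such a multinomial mixture over users; it assumes fully observed integer ratings and a fixed number of clusters purely for brevity (handling missing ratings and computing E[xk+1 | x1, …, xk] from the fitted model would be straightforward extensions).

    import numpy as np

    def em_multinomial_mixture(R, n_clusters=4, n_values=5, n_iters=50, seed=0):
        """R: (n_users, n_items) integer ratings in {0, ..., n_values-1}.
        Returns mixture weights p(C), per-cluster multinomials p(x_j=v|C),
        and each user's posterior cluster responsibilities."""
        rng = np.random.default_rng(seed)
        n_users, n_items = R.shape
        pi = np.full(n_clusters, 1.0 / n_clusters)                            # p(C)
        theta = rng.dirichlet(np.ones(n_values), size=(n_clusters, n_items))  # p(x_j=v|C)
        for _ in range(n_iters):
            # E-step: posterior p(C | user's ratings), computed in log space
            log_resp = np.tile(np.log(pi), (n_users, 1))
            for j in range(n_items):
                log_resp += np.log(theta[:, j, :][:, R[:, j]]).T
            log_resp -= log_resp.max(axis=1, keepdims=True)
            resp = np.exp(log_resp)
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: re-estimate mixture weights and per-cluster multinomials
            pi = resp.mean(axis=0)
            for j in range(n_items):
                for v in range(n_values):
                    theta[:, j, v] = resp[R[:, j] == v].sum(axis=0) + 1e-3    # smoothing
                theta[:, j, :] /= theta[:, j, :].sum(axis=1, keepdims=True)
        return pi, theta, resp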
Bayesian Network • Use a BN to capture object/item dependencies • Each item/object is a node • The (dependency) structure is learned from all the data • Model parameters: p(xk+1 | pa(xk+1)), where pa(xk+1) denotes the parents/predictors of xk+1 (represented as a decision tree) • Predict ratings using E[xk+1 | x1, …, xk]
Three-way Aspect Model(Popescul et al. 2001) • CF + content-based • Generative model • (u,d,w) as observations • z as hidden variable • Standard EM • Essentially clustering the joint data • Evaluation on ResearchIndex data • Found it’s better to treat (u,w) as observations
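For context, the three-way aspect model of Popescul et al. 2001 factors the joint probability of a user u, a document d, and a content word w through a latent class z, roughly as

    P(u, d, w) = \sum_{z} P(z)\, P(u \mid z)\, P(d \mid z)\, P(w \mid z)

with the parameters fit by standard EM over the observed (u, d, w) co-occurrences.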
Evaluation Criteria (Breese et al. 98) • Rating accuracy • Average absolute deviation over Pa, the set of items whose ratings are predicted • Ranking accuracy • Expected utility of a ranked list • Exponentially decaying viewing probability • α (half-life) = the rank at which the viewing probability drops to 0.5 • d = neutral rating
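Written out (following Breese et al. 98, with p_{a,j} the predicted and v_{a,j} the true rating of item j for user a), the two criteria are roughly

    S_a = \frac{1}{|P_a|}\sum_{j \in P_a} |p_{a,j} - v_{a,j}|
    \qquad
    R_a = \sum_{j} \frac{\max(v_{a,j} - d,\, 0)}{2^{(j-1)/(\alpha-1)}}

where in R_a the items are taken in the predicted ranking order, d is the neutral rating, and α is the half-life rank.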
Results • BN & CR+ are generally better than VSIM & BC (CR = correlation, VSIM = vector similarity/cosine, BC = Bayesian clustering, BN = Bayesian network) • BN is best with more training data • VSIM is better with little training data • Inverse User Frequency is effective • Case amplification is mostly effective
Summary of Rating-based Methods • Effectiveness • Both memory-based and model-based methods can be effective • The correlation method appears to be robust • Bayesian network works well with plenty of training data, but not very well with little training data • The cosine similarity method works well with little training data
Summary of Rating-based Methods (cont.) • Efficiency • Memory-based methods are slower than model-based methods at prediction time • Learning can be extremely slow for model-based methods
Preference-based Methods (Cohen et al. 99, Freund et al. 98) • Motivation • Explicit ratings are not always available, but implicit orderings/preferences might be • Only relative ratings are meaningful, even when ratings are available • Combining preferences has other applications, e.g., • Merging results from different search engines
A Formal Model of Preferences • Instances: O = {o1, …, on} • Ranking function: R: (U x) O x O -> [0,1] • R(u,v) = 1 means u is strongly preferred to v • R(u,v) = 0 means v is strongly preferred to u • R(u,v) = 0.5 means no preference • Feedback: F = {(u,v)}, meaning u is preferred to v • Goal: find R in a restricted hypothesis space that minimizes the loss on F
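Following Cohen et al. 99, the loss of a preference function R with respect to the feedback F can be written as the average disagreement with the fed-back pairs:

    \mathrm{Loss}(R, F) = \frac{1}{|F|} \sum_{(u,v) \in F} \bigl(1 - R(u, v)\bigr)

so a pair the function gets exactly right (R(u,v) = 1) contributes nothing, and a pair it gets exactly backwards contributes 1.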
The Hypothesis Space H • Without constraints on H, the loss is minimized by any R that agrees with F • An appropriate constraint for collaborative filtering: restrict H to weighted (convex) combinations of the individual experts'/users' preference functions, R(u,v) = Σi wi Ri(u,v) • Compare this with the weighted combination of neighbors' ratings in memory-based rating prediction
The Hedge Algorithm for Combining Preferences • Iterative updating of the weights w1, w2, …, wn • Initialization: the wi are uniform • Updating: wi <- wi · β^Loss(Ri, F), with β in [0,1] • Loss = 0 => weight stays the same • Loss is large => weight is decreased
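A minimal Python sketch of this multiplicative update and of the resulting combined preference function; the function names and the choice β = 0.8 are illustrative assumptions.

    def hedge_update(weights, losses, beta=0.8):
        # Experts with larger loss on the feedback are down-weighted geometrically.
        new_w = [w * (beta ** loss) for w, loss in zip(weights, losses)]
        total = sum(new_w)
        return [w / total for w in new_w]

    def combined_preference(weights, expert_prefs, u, v):
        # Combined preference R(u, v) = sum_i w_i * R_i(u, v)
        return sum(w * r(u, v) for w, r in zip(weights, expert_prefs))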
Some Theoretical Results • The cumulative loss of the combined preference function RA will not be much worse than that of the best ranking expert/feature • The preference function RA must then be turned into an ordering ρ, with L(ρ, F) <= DISAGREE(ρ, RA)/|F| + L(RA, F) • So we need to find the ordering ρ that minimizes the disagreement with RA • General case: NP-complete
A Greedy Ordering Algorithm • Use a weighted graph to represent the preferences R • For each node, compute the potential value, i.e., total outgoing weight minus total incoming weight • Rank the node with the highest potential value above all others • Remove this node and its edges, and repeat • At least half of the optimal agreement is guaranteed
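A minimal Python sketch of this greedy procedure; the graph representation (a dict of directed edge weights) is an assumption for illustration.

    def greedy_order(nodes, pref):
        """pref[(u, v)]: weight of edge u -> v, i.e., how strongly u is
        preferred to v. Returns a total ordering, best first."""
        remaining = set(nodes)
        order = []
        while remaining:
            def potential(v):
                out_w = sum(pref.get((v, x), 0.0) for x in remaining if x != v)
                in_w = sum(pref.get((x, v), 0.0) for x in remaining if x != v)
                return out_w - in_w
            best = max(remaining, key=potential)   # highest potential goes next
            order.append(best)
            remaining.remove(best)                 # drop its edges and repeat
        return order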
Improvement • Identify all the strongly connected components • Rank the components consistently with the edges between them • Rank the nodes within a component using the basic greedy algorithm
Evaluation of Ordering Algorithms • Measure: “weight coverage” • Datasets = randomly generated small graphs • Observations • The basic greedy algorithm works better than a random permutation baseline • Improved version is generally better, but the improvement is insignificant for large graphs
Metasearch Experiments • Task: known-item search • Search for an ML researcher's homepage • Search for a university homepage • Search expert = a variant of the query • Learn to merge the results of all search experts • Feedback • Complete: the known item is preferred to all others • Click data: the known item is preferred to all items ranked above it • Leave-one-out testing
Metasearch Results • Measures: compare the combined preferences with the individual ranking functions • Sign test: to see which system tends to rank the known relevant item higher • Number of queries with the known relevant item ranked above rank k • Average rank of the known relevant item • The learned system is better than any individual expert by all measures (not surprising, why?)
Direct Learning of an Ordering Function • Each expert is treated as a ranking feature fi: O -> R ∪ {⊥}, where ⊥ marks objects the feature leaves unranked (allowing partial rankings) • Given preference feedback Φ: X x X -> R • Goal: learn a hypothesis H that minimizes the ranking loss • D(x0,x1): a distribution over X x X (actually a uniform distribution over the pairs ordered by the feedback), D(x0,x1) = c · max{0, Φ(x0,x1)}
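In Freund et al.'s formulation the quantity being minimized is, roughly, the D-weighted fraction of feedback pairs that the learned scoring function H orders incorrectly:

    \mathrm{rloss}_D(H) = \sum_{x_0, x_1} D(x_0, x_1)\, [\![\, H(x_1) \le H(x_0) \,]\!]

where a pair (x0, x1) with positive feedback means x1 should be ranked above x0.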
The RankBoost Algorithm • Iterative updating of the pair distribution D(x0,x1) • Initialization: D1 = D • For t = 1, …, T: • Train a weak learner using Dt • Get a weak hypothesis ht: X -> R • Choose αt > 0 • Update: Dt+1(x0,x1) = Dt(x0,x1) exp(αt(ht(x0) - ht(x1))) / Zt • Final hypothesis: H(x) = Σt αt ht(x)
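Here is a compact Python sketch of the loop; it makes several simplifying assumptions not in the original (the weak rankers are just the given features, assumed to output values in [0,1], and αt is set by the closed-form choice αt = 0.5 ln((1+r)/(1-r)) discussed on the next slide).

    import math

    def rankboost(pairs, features, n_rounds=20):
        """pairs: list of (x0, x1), meaning x1 should be ranked above x0.
        features: weak rankers h(x) -> value in [0, 1]."""
        D = {p: 1.0 / len(pairs) for p in pairs}       # initial pair distribution
        alphas, chosen = [], []
        for _ in range(n_rounds):
            # Pick the weak ranker maximizing |r|, where r = E_D[h(x1) - h(x0)]
            best_h, best_r = None, 0.0
            for h in features:
                r = sum(w * (h(x1) - h(x0)) for (x0, x1), w in D.items())
                if abs(r) > abs(best_r):
                    best_h, best_r = h, r
            if best_h is None or abs(best_r) >= 1.0:
                break
            alpha = 0.5 * math.log((1.0 + best_r) / (1.0 - best_r))
            alphas.append(alpha)
            chosen.append(best_h)
            # Reweight: pairs the chosen ranker orders wrongly gain weight
            D = {(x0, x1): w * math.exp(alpha * (best_h(x0) - best_h(x1)))
                 for (x0, x1), w in D.items()}
            z = sum(D.values())
            D = {p: w / z for p, w in D.items()}
        return lambda x: sum(a * h(x) for a, h in zip(alphas, chosen))  # final H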
How to Choose αt and Design ht? • There is a bound on the ranking loss of the final hypothesis in terms of the per-round normalizers (roughly, rloss(H) <= Πt Zt) • Thus, we should choose αt to minimize the bound • Three approaches: • Numerical search • Special case: h takes values only in {0, 1} • Approximate Z, then find an analytic solution
Efficient RankBoost for Bipartite Feedback • Bipartite feedback: every item in one set X1 is preferred to every item in the other set X0, which is essentially binary classification • Complexity at each round drops from O(|X0||X1|) to O(|X0| + |X1|)
Evaluation of RankBoost • Meta-search: Same as in (Cohen et al 99) • Perfect feedback • 4-fold cross validation
EachMovie Evaluation • Experiments on the EachMovie data, reported in terms of the number of users, the number of movies rated per user, and the number of feedback movies (results table omitted)
Summary • CF is "easy" • The user's expectation is low • Any recommendation is better than none • CF is "hard": making it practically useful • Data sparseness • Scalability • Domain dependence
Summary (cont.) • CF as a Learning Task • Rating-based formulation • Learn f: U x O -> R • Algorithms • Instance-based/memory-based (k-nearest neighbors) • Model-based (probabilistic clustering) • Preference-based formulation • Learn PREF: U x O x O -> R • Algorithms • General preference combination (Hedge), greedy ordering • Efficient restricted preference combination (RankBoost)
Summary (cont.) • Evaluation • Rating-based methods • Simple methods seem to be reasonably effective • Advantage of sophisticated methods seems to be limited • Preference-based methods • More effective than rating-based methods according to one evaluation • Evaluation on meta-search is weak
Research Directions • Exploiting complete information • CF + content-based filtering + domain knowledge + user model … • More "localized" kernels for instance-based methods • Predicting movies needs different "neighbor users" than predicting books • One suggestion: use items similar to the target item as features when finding neighbors
Research Directions (cont.) • Modeling time • There might be sequential patterns in the items a user purchases (e.g., bread machine -> bread machine mix) • Probabilistic models of preferences • Making the preference function a probability, e.g., P(A > B | U) • Clustering items and users • Minimizing preference disagreements
References • Cohen, W. W., Schapire, R. E., and Singer, Y. (1999). "Learning to Order Things." Journal of AI Research, Volume 10, pages 243-270. • Freund, Y., Iyer, R., Schapire, R. E., and Singer, Y. (1999). "An Efficient Boosting Algorithm for Combining Preferences." Machine Learning Journal. • Breese, J. S., Heckerman, D., and Kadie, C. (1998). "Empirical Analysis of Predictive Algorithms for Collaborative Filtering." In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43-52. • Popescul, A. and Ungar, L. H. (2001). "Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments." UAI 2001. • Good, N., Schafer, J. B., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., and Riedl, J. (1999). "Combining Collaborative Filtering with Personal Agents for Better Recommendations." Proceedings of AAAI-99, pp. 439-446.