Top-N Recommendation Algorithm Based on Item-Graph
Allen, Zhenjiang LIN
CSE, CUHK
June 7, 2007
Outline
• 1. Top-N Recommendation Problem
• 2. Top-N Recommendation Algorithm
• 3. Item-Graph Model and GCP-based Method
  • Item-Graph Model
  • Generalized Conditional Probability (GCP)-based Recommendation Algorithm
• 4. Preliminary Experimental Results
• 5. Conclusion and Future Work
1. Top-N Recommendation Problem
• The Top-N Recommendation Problem
  • Given the preference information of users, recommend to a given user a set of N items that he or she might be interested in, based on the items the user has already selected.
• E-commerce system example: Amazon.com, customers vs. products.
[Figure: User-Item matrix and the active user's basket]
[Figure: Amazon.com example showing the active user's basket and the resulting recommendations]
1. Top-N Recommendation Problem
• Challenges in E-commerce Systems
  • Huge amounts of data: millions of users and/or items;
  • The result set must be returned in real time;
  • Limited preference information for new users;
  • Volatile user preference information.
• Contributions
  • Propose the Item-Graph model.
    • simple & incremental
    • reflects the relationships among items
  • Develop the Generalized Conditional Probability-based top-N recommendation algorithm.
    • item-centric
    • based on the Item-Graph model
2. Top-N Recommendation Algorithm
• Two main paradigms
  • Content-based: recommend items based on the content (textual information) of items.
    • Fab system [Balabanovic97], Syskill & Webert system [Pazzani97].
  • Collaborative Filtering (CF): recommend items by collecting taste information from other users.
    • Collaboration between users (link information).
    • More popular than content-based recommendation, since in many domains (such as music, restaurants) it is hard to extract useful features from items.
    • Tapestry system [Goldberg92], Video Recommender [Hill95], Ringo [Shardanand95], GroupLens [Konstan97], Jester system [Goldberg01], Amazon [Linden03].
2. Top-N Recommendation Algorithm
• CF algorithms classified by strategy of using data
  • Memory-based: make recommendations based on the entire collection of user preferences.
    • No pre-computing is needed, but these methods suffer from serious scalability problems.
    • E.g., Correlation-based [Resnick94], Cosine-based [Breese98].
  • Model-based: use the collection of user preferences to learn a model, which is then used to make recommendations.
    • The model is built off-line, so these methods are more scalable.
    • E.g., Cluster models [Ungar98], Bayesian network model [Breese98], Association Rule Mining approach [Lin00].
2. Top-N Recommendation Algorithm
• CF algorithms classified by strategy of using objects
  • User-centric: look for similar (like-minded) users first and then make recommendations.
    • Similarity between users is relatively dynamic.
    • Pre-computing user neighborhoods may lead to poor predictions.
  • Item-centric: look for similar (or related) items first and then make recommendations.
    • Similarity between items is relatively static.
    • Enables pre-computing of item-item similarity.
    • Therefore, more scalable.
• The aim of our work
  • Model-based, item-centric CF top-N recommendation algorithm.
2. Top-N Recommendation Algorithm
• Notations
  • Item set $I = \{I_1, I_2, \dots, I_m\}$.
  • User set $U = \{U_1, U_2, \dots, U_n\}$.
  • User-Item matrix $D = (D_{n,m})$.
  • Basket of the active user $B \subseteq I$.
  • Similarity score of x and y: $\mathrm{sim}(x, y)$.
• Formal definition of the top-N recommendation problem
  • Given a user-item matrix D and a set of items B that have been purchased by the active user, identify an ordered set of items X such that $|X| \le N$ and $X \cap B = \emptyset$.
2. Top-N Recommendation Algorithm
• Two classical item-item similarity measures
  • Cosine-based (symmetric):
    $\mathrm{sim}(I_i, I_j) = \cos(D_{*,i}, D_{*,j})$  (1)
  • Conditional Probability (CP)-based (asymmetric):
    $\mathrm{sim}(I_i, I_j) = P(I_j \mid I_i) \approx \mathrm{Freq}(I_i I_j) / \mathrm{Freq}(I_i)$  (2)
    Freq(X): the number of customers who have purchased the item set X.
• The ranking score for item x:
    $RS(x) = \sum_{b \in B} \mathrm{sim}(b, x)$  (3)
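As a minimal sketch of Eqs. (1) to (3), the following Python computes both similarity measures and the ranking score on a toy dataset. The `baskets` data, the function names, and the alphabetical tie-breaking are assumptions made for illustration and do not come from the slides; each basket plays the role of one row of the binary user-item matrix D.

```python
from math import sqrt

# Hypothetical toy data: each user's purchases as a set of item ids.
baskets = [
    {"a", "b"},
    {"a", "b", "c"},
    {"b", "c"},
    {"a", "d"},
]

def freq(items, baskets):
    """Freq(X): number of users whose basket contains every item in X."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)

def cosine_sim(i, j, baskets):
    """Eq. (1): cosine between the binary columns of items i and j."""
    fi, fj = freq({i}, baskets), freq({j}, baskets)
    if fi == 0 or fj == 0:
        return 0.0
    # For 0/1 columns the dot product is the co-purchase count
    # and each squared norm is the item's own frequency.
    return freq({i, j}, baskets) / sqrt(fi * fj)

def cp_sim(i, j, baskets):
    """Eq. (2): P(j | i) estimated as Freq(i j) / Freq(i)."""
    fi = freq({i}, baskets)
    return freq({i, j}, baskets) / fi if fi else 0.0

def top_n(basket, baskets, sim, n):
    """Eq. (3): rank items outside the basket by RS(x) = sum of sim(b, x)."""
    all_items = set().union(*baskets)
    scores = {
        x: sum(sim(b, x, baskets) for b in basket)
        for x in all_items - set(basket)
    }
    # Ties are broken alphabetically (an assumption; the slides do not say).
    return sorted(scores, key=lambda x: (-scores[x], x))[:n]
```

Calling `top_n({"a"}, baskets, cp_sim, 2)` ranks the items outside the basket by their CP-based score. Note that `cp_sim("b", "c", baskets)` and `cp_sim("c", "b", baskets)` differ, which is the asymmetry mentioned above.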
3. Item-Graph Model & GCP-based Method
• Intuitions behind the Item-Graph
  • The similarity between two items is proportional to the number of times they have been co-purchased.
  • The similarity of item pairs is transitive.
  • [Figure: example graph on items a, b, c with edge weights 1 and 2]
• Definition of the Item-Graph
  • Given a dataset $D = (D_{n,m})$, the Item-Graph is defined as a weighted, undirected graph G(V, E, W), where
    • V is the item set I.
    • An edge (x, y) ∈ E if and only if items x and y have been co-purchased.
    • The weight of edge (x, y) is the number of times items x and y have been co-purchased.
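The definition can be sketched in Python as a batch construction from a binary user-item matrix. The nested-dictionary adjacency representation, the function name `build_item_graph`, and the 3-by-3 matrix are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations

def build_item_graph(D):
    """Build the Item-Graph as graph[x][y] = number of users who bought both x and y.

    D is a list of rows (one per user); a nonzero entry in column i
    means the user purchased item i.
    """
    graph = {}
    for row in D:
        purchased = [item for item, bought in enumerate(row) if bought]
        for x, y in combinations(purchased, 2):
            # The graph is undirected, so store the weight in both directions.
            graph.setdefault(x, {}).setdefault(y, 0)
            graph[x][y] += 1
            graph.setdefault(y, {}).setdefault(x, 0)
            graph[y][x] += 1
    return graph

# Hypothetical matrix: rows are users, columns are items 0, 1, 2.
D = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
graph = build_item_graph(D)
```

An edge exists exactly when two items share at least one buyer, so items that were never co-purchased simply have no entry in each other's adjacency dictionary.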
3. Item-Graph Model & GCP-based Method
• Updating the Item-Graph is easy
  • Adding a new user's preference information T to the graph takes $O(|T|^2)$ operations, which consist of adding edges and/or increasing edge weights.
  • [Figure: the graph on items a, b, c before and after adding the basket (a, b, c); existing edge weights 1 and 2 become 2 and 3, and a new edge of weight 1 appears]
• Potential direct applications of the Item-Graph
  • Clustering the items.
  • Measuring item-item similarity.
  • Measuring the importance of items.
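The incremental update can be sketched as follows. The `ItemGraph` class and its method names are hypothetical, and the assignment of weights to particular edges in the example (a-b = 2, b-c = 1 before the update) is an assumption, since the original figure is not available here.

```python
from collections import defaultdict
from itertools import combinations

class ItemGraph:
    """Weighted, undirected co-purchase graph with incremental updates."""

    def __init__(self):
        # Keys are item pairs stored in sorted order, e.g. ("a", "b").
        self.weight = defaultdict(int)

    def add_basket(self, basket):
        """Add one user's basket T: touches |T| * (|T| - 1) / 2 edges."""
        for x, y in combinations(sorted(set(basket)), 2):
            self.weight[(x, y)] += 1  # creates the edge if it is new

    def w(self, x, y):
        """Weight of edge (x, y), or 0 if the items were never co-purchased."""
        return self.weight.get(tuple(sorted((x, y))), 0)

g = ItemGraph()
for basket in [{"a", "b"}, {"a", "b"}, {"b", "c"}]:
    g.add_basket(basket)
before = (g.w("a", "b"), g.w("b", "c"), g.w("a", "c"))

g.add_basket({"a", "b", "c"})  # new user's preference information T
after = (g.w("a", "b"), g.w("b", "c"), g.w("a", "c"))
```

Because each update only touches the pairs inside the new basket, the graph never needs to be rebuilt from the full user-item matrix.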
3. Item-Graph Model & GCP-based Method
• Ideas in the Generalized Conditional Probability-based method
  • According to the definition of the top-N recommendation problem, for any x in I−B, we just need to compute the "basket-based" conditional probability $P(x \mid B) = \mathrm{Freq}(xB) / \mathrm{Freq}(B)$, where xB denotes the item set $\{x\} \cup B$. However,
    • Freq(xB) or Freq(B) may not exist, or
    • Freq(xB) or Freq(B) may be too small to be meaningful.
  • The CP-based method considers the sum of "1-item"-based conditional probabilities $P(x \mid y)$ instead, where x ∈ I−B, y ∈ B.
  • However, the "multi-item"-based conditional probabilities may also contribute to the recommendation.
    • E.g., suppose the ranking scores of x and y computed by the CP-based method are equal, and we also know $P(x \mid B) > P(y \mid B)$. Which one should be ranked higher, x or y?
3. Item-Graph Model & GCP-based Method
• The Generalized Conditional Probability (GCP)-based recommendation algorithm
  • The ranking score of item x is defined as the sum of all possible "multi-item"-based conditional probabilities, that is,
    $\mathrm{GCP}(x \mid B) = \sum_{S \subseteq B} P(x \mid S) \approx \sum_{S \subseteq B} \mathrm{Freq}(xS) / \mathrm{Freq}(S)$  (4)
  • However, the number of subsets of B is $2^{|B|}$.
  • Use $\mathrm{GCP}_d(x \mid B)$ instead (d = 2 in the following experiments):
    $\mathrm{GCP}_d(x \mid B) = \sum_{S \subseteq B,\ |S| \le d} P(x \mid S)$  (5)
  • Freq(xS) and Freq(S) can be extracted approximately from the Item-Graph.
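A minimal sketch of Eq. (5) follows. It makes two assumptions that the slides do not state: the sum runs over non-empty subsets only (so that d = 1 reduces to the CP-based score of Eq. (3)), and subsets with Freq(S) = 0 are skipped. For simplicity it counts frequencies exactly from a hypothetical list of baskets rather than from the Item-Graph.

```python
from itertools import combinations

def freq(items, baskets):
    """Exact Freq(X): number of baskets that contain every item in X."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)

def gcp_d(x, basket, baskets, d=2):
    """GCP_d(x | B): sum of P(x | S) over non-empty S within B with |S| <= d."""
    score = 0.0
    for size in range(1, d + 1):
        for subset in combinations(sorted(basket), size):
            denom = freq(subset, baskets)
            if denom:  # skip subsets nobody has purchased (assumption)
                score += freq(set(subset) | {x}, baskets) / denom
    return score

# Hypothetical purchase history and active basket B = {a, b}.
baskets = [{"a", "b", "x"}, {"a", "b"}, {"a", "x"}, {"b", "y"}]
B = {"a", "b"}
score_x_d1 = gcp_d("x", B, baskets, d=1)  # P(x|a) + P(x|b)
score_x_d2 = gcp_d("x", B, baskets, d=2)  # adds P(x|ab)
score_y_d2 = gcp_d("y", B, baskets, d=2)
```

Bounding |S| by d keeps the number of terms polynomial in |B| instead of exponential, which is the reason for introducing GCP_d.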
3. Item-Graph Model & GCP-based Method
• Extracting Freq(A) from the Item-Graph approximately
  • For an item set A, obtaining the exact Freq(A) from the Item-Graph may not be possible.
  • Extract an approximate Freq(A) from the Item-Graph instead.
    • Find the complete sub-graph of A (denoted by CSG(A)) in the Item-Graph; running time $O(|A|^2)$.
    • Freq(A) ≈ minimal weight of the edges in CSG(A).
  • E.g., [Figure: graph on items a, b, c with edge weights 3, 2 and 1]
    • for A = {a,b}, Freq(A) ≈ 3.
    • for B = {a,b,c}, Freq(B) ≈ 1.
    • $P(c \mid ab) \approx \mathrm{Freq}(abc) / \mathrm{Freq}(ab) \approx 1 / 3$.
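The approximation can be sketched in Python as below. The edge a-b = 3 matches the example above; assigning weight 2 to b-c and weight 1 to a-c is an assumption, as is returning 0 when some edge of the complete sub-graph is missing. The function name `approx_freq` is hypothetical.

```python
from itertools import combinations

def approx_freq(items, weight):
    """Freq(A) ~ minimum edge weight in the complete sub-graph on A.

    `weight` maps frozenset({x, y}) to the co-purchase count of x and y.
    Needs at least two items, because the Item-Graph stores no node weights.
    """
    items = sorted(set(items))
    if len(items) < 2:
        raise ValueError("approx_freq needs at least two items")
    # Inspect all |A| * (|A| - 1) / 2 pairs; a missing edge counts as 0.
    return min(weight.get(frozenset(pair), 0) for pair in combinations(items, 2))

# Edge weights for the three-item example (b-c and a-c assignments assumed).
weight = {
    frozenset({"a", "b"}): 3,
    frozenset({"b", "c"}): 2,
    frozenset({"a", "c"}): 1,
}

freq_ab = approx_freq({"a", "b"}, weight)
freq_abc = approx_freq({"a", "b", "c"}, weight)
p_c_given_ab = freq_abc / freq_ab
```

The minimum edge weight is an upper bound on the true frequency: every user who bought all of A contributes to every edge in CSG(A), so no edge can be lighter than the exact Freq(A).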
4. Preliminary Experimental Results
• Dataset
  • MovieLens (http://www.grouplens.org/data)
    • A web-based movie recommender system;
    • Contains multi-valued ratings that indicate how much each user liked a particular movie;
    • Each user has rated at least 20 movies.
  • We treat each rating as an indication of whether the user has seen the movie (nonzero) or not (zero).
Table 1: The characteristics of the MovieLens dataset
Note 1. Density: the percentage of nonzero entries in the user-item matrix.
4. Preliminary Experimental Results-1
• Evaluation Design
  • Split the dataset into a training set and a test set by
    • randomly selecting one rated movie of each user to be part of the test set,
    • using the remaining rated movies for training.
  • Cosine (COS)-based, CP-based, and GCP-based methods; average over 10 runs.
• Evaluation Metrics
  • Hit-Rate (HR):
    $\mathrm{HR} = (\text{\# of hits}) / n$  (6)
  • Average Reciprocal Hit-Rate (ARHR):
    $\mathrm{ARHR} = \frac{1}{n} \sum_{i=1}^{h} \frac{1}{p_i}$  (7)
  • # of hits: the number of items in the test set that were also in the top-N lists. h is the number of hits, which occurred at positions $p_1, p_2, \dots, p_h$ within the top-N lists (i.e., $1 \le p_i \le N$).
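Both metrics can be computed in one pass, as in this sketch. It assumes n is the number of test users (one held-out item each, as in the evaluation design above); the function name and the toy recommendation lists are hypothetical.

```python
def hit_rate_and_arhr(top_n_lists, held_out):
    """Return (HR, ARHR) as in Eqs. (6) and (7).

    top_n_lists: dict mapping user -> ordered list of recommended items.
    held_out:    dict mapping user -> the single test item for that user.
    """
    n = len(held_out)
    hits = 0
    reciprocal_sum = 0.0
    for user, target in held_out.items():
        recs = top_n_lists.get(user, [])
        if target in recs:
            hits += 1
            position = recs.index(target) + 1  # positions are 1-based
            reciprocal_sum += 1.0 / position
    return hits / n, reciprocal_sum / n

# Hypothetical top-4 lists for three users.
top_n_lists = {
    "u1": ["m1", "m2", "m3", "m4"],  # hit at position 1
    "u2": ["m5", "m6", "m7", "m8"],  # hit at position 4
    "u3": ["m1", "m5", "m6", "m7"],  # miss
}
held_out = {"u1": "m1", "u2": "m8", "u3": "m9"}
hr, arhr = hit_rate_and_arhr(top_n_lists, held_out)
```

HR treats all hits equally, while ARHR rewards hits near the top of the list: the hit at position 4 contributes only 1/4 to the sum.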
4. Preliminary Experimental Results-1
• Performance of Top-N Recommendation Algorithms
[Figure: HR (left) and ARHR (right). x-axis: top-N items; y-axis: hit-rate (left) and average reciprocal hit-rate (right) of all users. For the GCP-based method, d = 2.]
4. Preliminary Experimental Results-2
• Testing the Parameter d in the GCP Method
  • Testing the effect of d (d = 1, 2, 3).
• Evaluation: Online Shopping Simulation
  • Randomly select part of the user records as the training set;
  • Use the remaining user records for testing.
  • STEP 0: Construct the item-graph from the training set;
  • STEP 1: For each user in the test set,
    • randomly remove one item from the user's basket and make recommendations based on the remaining items in the basket;
    • compute the rank of this item in the recommendation list;
    • update the item-graph.
  • STEP 2: Compute the HR and ARHR metrics.
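The simulation loop can be sketched as follows, assuming the simulated shoppers are users held out of the initial graph. The scorer in `recommend` is a deliberately simple stand-in (sum of edge weights to the basket items), not the GCP score; all function names, the seed handling, and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations

def add_basket(weight, basket):
    """Add one basket to the item-graph (edge keys are sorted item pairs)."""
    for x, y in combinations(sorted(basket), 2):
        weight[(x, y)] += 1

def recommend(weight, basket, n):
    """Stand-in scorer: rank outside items by total edge weight to the basket."""
    scores = defaultdict(int)
    for (x, y), w in weight.items():
        if x in basket and y not in basket:
            scores[y] += w
        elif y in basket and x not in basket:
            scores[x] += w
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]

def simulate(train, test, n, seed=0):
    """Run STEP 0 to STEP 2 and return (HR, ARHR, final edge weights)."""
    rng = random.Random(seed)
    weight = defaultdict(int)
    for basket in train:  # STEP 0: build the initial item-graph
        add_basket(weight, basket)
    hits, reciprocal_sum = 0, 0.0
    for basket in test:  # STEP 1: replay each held-out user
        held_out = rng.choice(sorted(basket))
        remaining = set(basket) - {held_out}
        recs = recommend(weight, remaining, n)
        if held_out in recs:
            hits += 1
            reciprocal_sum += 1.0 / (recs.index(held_out) + 1)
        add_basket(weight, basket)  # the graph learns from this user
    # STEP 2: aggregate the metrics over all simulated users
    return hits / len(test), reciprocal_sum / len(test), weight

# Hypothetical data: three training users and one simulated shopper.
train = [{"a", "b", "c"}] * 3
test = [{"a", "b", "c"}]
hr, arhr, final_weight = simulate(train, test, n=5)
```

Swapping `recommend` for a GCP-based scorer with different values of d would give the comparison described in this slide, while the surrounding loop stays the same.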
4. Preliminary Experimental Results-2
• Performance of Top-N Recommendation Algorithms
[Figure: HR (left) and ARHR (right). x-axis: top-N items; y-axis: hit-rate (left) and average reciprocal hit-rate (right) of all users.]
5. Conclusion and Future Work
• Conclusion
  • Top-N recommendation problem and item-centric algorithms
    • Cosine-based, conditional probability-based
  • Item-Graph model
    • Visualizes the relationships among items.
    • Easy to update.
  • Generalized Conditional Probability-based top-N recommendation algorithm
    • Item-centric & based on the Item-Graph model
• Future Work
  • Clustering items and measuring item-item similarities based on the Item-Graph model
  • Speeding up the GCP method.
References
• [Balabanovic97] M. Balabanovic and Y. Shoham. Fab: Content-based, Collaborative Recommendation. Commun. ACM, 40(3):66-72, 1997.
• [Breese98] J. S. Breese, D. Heckerman, and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43-52, San Francisco, 1998.
• [Deshpande04] M. Deshpande and G. Karypis. Item-based Top-N Recommendation Algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.
• [Lin00] W. Lin. Association Rule Mining for Collaborative Recommender Systems. Thesis submitted for the Degree of M.S. in Computer Science.
• [Linden03] G. Linden, B. Smith, and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, 2003.
• [Resnick94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proc. Computer Supported Cooperative Work Conf., pages 175-186, 1994.