Modeling User Rating Profiles For Collaborative Filtering
Benjamin M. Marlin (marlin@cs.toronto.edu)
University of Toronto, Department of Computer Science, Toronto, Ontario, Canada
AP 08
1. Abstract
• We present a new latent variable model for rating-based collaborative filtering called the User Rating Profile model (URP). URP has complete generative semantics at the user and rating profile levels.
• URP is related to several models including a multinomial mixture model, the aspect model, and latent Dirichlet allocation, but has advantages over each.
• A variational Expectation Maximization procedure is used to fit the URP model. Rating prediction makes use of a well defined variational inference procedure.
• Empirical results on two rating prediction tasks using the EachMovie and MovieLens data sets show that URP attains lower error rates than the multinomial mixture model, the aspect model, and neighborhood-based techniques.

2. Introduction
Collaborative Filtering Formulations
Preference Indicators:
• Co-occurrence Pair (u, y): u is a user index and y is an item index.
• Count Vector (n_1^u, n_2^u, …, n_M^u): n_y^u is the number of times (u, y) is observed.
• Rating Triplet (u, y, r): u is a user index, y is an item index, r is a rating value.
• Rating Vector (r_1^u, r_2^u, …, r_M^u): r_y^u is the rating assigned to item y by user u.
Additional Features:
• In a pure formulation no additional features are used.
• A hybrid formulation incorporates additional content-based item and user features.
Preference Dynamics:
• In a sequential formulation the rating process is modeled as a time series.
• In a non-sequential formulation preferences are assumed to be static.
The Pure, Non-Sequential, Rating-Based Formulation
Formal Description:
• Items: y = 1, …, M
• Users: u = 1, …, N
• Ratings: r = 1, …, V
• Additional Features: None
• Preference Dynamics: Non-sequential
• Preference Indicators: Ordinal rating vectors
Tasks: The two main tasks under this formulation are recommendation and rating prediction. Rating prediction is the task of estimating all unknown ratings for the active user. The focus of research is developing highly accurate methods for rating prediction.
Figure 1: Given a rating prediction method, a recommendation method is easily obtained: predict, then sort. (Diagram labels: Rating Database, Active User Ratings, Rating Prediction, Predicted Ratings, Sort, Recommendation, Item List.)
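The "predict, then sort" idea of Figure 1 can be sketched in a few lines. This is a minimal illustration, not the author's code: the function name `recommend`, the `predict` callback, and the `top_n` cutoff are hypothetical, and any rating prediction method could stand behind `predict`.

```python
from typing import Callable, Dict, List


def recommend(predict: Callable[[int], float],
              rated: Dict[int, int],
              num_items: int,
              top_n: int = 10) -> List[int]:
    """Rank the active user's unrated items by predicted rating.

    predict   -- hypothetical callback: item index y -> predicted rating
    rated     -- the active user's observed ratings, {item index: rating}
    num_items -- M, with items indexed 1..M as in the formal description
    """
    unrated = [y for y in range(1, num_items + 1) if y not in rated]
    scored = [(predict(y), y) for y in unrated]
    # Highest predicted rating first; ties broken by item index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [y for _, y in scored[:top_n]]
```

Because recommendation reduces to a sort, any improvement in rating prediction carries over directly to the recommendation list.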
3. Related Work
Neighborhood Methods:
• Introduced by Resnick et al. (GroupLens) and Shardanand and Maes (Ringo).
• All variants can be seen as modifications of the K-Nearest Neighbor classifier.
Rating Prediction:
1. Compute similarity measure between active user and all users in database.
2. Compute predicted rating for each item.
Multinomial Mixture Model:
• A simple mixture model with fast, reliable learning by EM, and low prediction time.
• Simple but correct generative semantics. Each profile is generated by 1 of K types.
Learning and Rating Prediction: [E-Step, M-Step, and rating prediction equations are not preserved in this transcript.]
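For orientation, the textbook EM updates for a multinomial mixture over rating profiles are sketched below. This is a hedged reconstruction from the model description, not a copy of the poster's equations. The symbols are assumptions: \theta_z = P(Z = z), \beta_{vyz} = P(R_y = v \mid Z = z), O_u is the set of items rated by user u, q_u(z) is the posterior over types for user u, \delta is the indicator function, and r^a is the active user's observed profile.

```latex
\begin{align*}
\text{E-step:}\quad
  q_u(z) &= \frac{\theta_z \prod_{y \in O_u} \beta_{r_y^u\, y z}}
                 {\sum_{z'=1}^{K} \theta_{z'} \prod_{y \in O_u} \beta_{r_y^u\, y z'}} \\
\text{M-step:}\quad
  \theta_z &= \frac{1}{N} \sum_{u=1}^{N} q_u(z), \qquad
  \beta_{vyz} = \frac{\sum_{u : y \in O_u} q_u(z)\, \delta(r_y^u, v)}
                     {\sum_{u : y \in O_u} q_u(z)} \\
\text{Prediction:}\quad
  P(R_y = v \mid r^a) &= \sum_{z=1}^{K} q_a(z)\, \beta_{vyz}
\end{align*}
```

Prediction needs only one E-step for the active user followed by a weighted sum, which is consistent with the low prediction time noted above.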
The Aspect Model:
• Many versions proposed by Hofmann. Of main interest are dyadic, triadic, and new vector version proposed by Marlin.
• All have incomplete generative semantics.
Learning (Vector) and Rating Prediction (Vector): [E-Step, M-Step, and rating prediction equations are not preserved in this transcript.]
Latent Dirichlet Allocation:
• Proposed by Blei et al. for text modeling.
• Can be used in a co-occurrence based CF formulation. Cannot model ratings.
• A correct generative version of the dyadic aspect model. User's distribution over types is a random variable with a Dirichlet prior.
Learning:
• Model learned using variational EM or Minka's Expectation Propagation.
• Exact inference not possible.
Prediction:
• Needs approximate inference. Variational methods result in an iterative algorithm.
Graphical Models:
Figure 2: Dyadic Aspect Model
• Variable U: User index
• Variable Z: Attitude index
• Variable Y: Item index
• Parameter θ: P(Z|U=u)
• Parameter β: P(Y|Z=z)
Figure 3: Triadic Aspect Model
• Variable U: User index
• Variable Z: Attitude index
• Variable Y: Item index
• Variable R: Rating value
• Parameter θ: P(Z|U=u)
• Parameter β: P(R|Z=z,Y=y)
Figure 4: Vector Aspect Model
• Variable U: User index
• Variable Zy: Attitude index
• Variable Ry: Rating value
• Variable Y: Item index
• Parameter θ: P(Z|U=u)
• Parameter β: P(R|Z=z,Y=y)
Figure 5: LDA Model
• Variable θ: P(Z|U=u)
• Variable Z: Attitude index
• Variable Y: Item index
• Parameter α: Dirichlet prior
• Parameter β: P(Y|Z=z)
Figure 6: URP Model
• Variable θ: P(Z|U=u)
• Variable Zy: Attitude index
• Variable Ry: Rating value
• Variable Y: Item index
• Parameter α: Dirichlet prior
• Parameter β: P(Ry|Z=z)
Relationships between the models:
• Dyadic Aspect Model to Triadic Aspect Model: co-occurrence to ratings.
• Triadic Aspect Model to Vector Aspect Model: ratings to rating profiles.
• Dyadic Aspect Model to LDA: generative.
• Vector Aspect Model to URP: generative.
• LDA to URP: co-occurrence to rating profile.
4. The URP Model
Model Specification:
• Unlike a simple mixture model, each user has a unique distribution over the attitudes Z.
• Unlike the aspect model family, there are proper generative semantics on the user's distribution θ.
• Unlike LDA, URP generates a set of complete user rating profiles.
Description:
• The latent space description of a user is a Dirichlet random variable θ that encodes a multinomial distribution over user types.
• Each setting of the multinomial variables Zy is an index into K user types or user attitudes.
• Each user attitude is represented by a multinomial distribution over ratings for each item, encoded by β.
• The multinomial variables Ry give the ratings for each item y. Possible values are from 1 to V.
Generative Process:
1. For each user u = 1 to N
2.   Sample θ ~ Dirichlet(α)
3.   For each item y = 1 to M
4.     Sample z ~ Multinomial(θ)
5.     Sample r ~ Multinomial(β_yz)
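The generative process above can be simulated directly. The following is a minimal sketch using only the Python standard library. The function names, the 0-based array layout `beta[y][z][v]`, and the `seed` argument are assumptions made for illustration; α denotes the Dirichlet prior and β the per-item, per-attitude rating distributions.

```python
import random
from typing import List, Sequence


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs: Sequence[float], rng: random.Random) -> int:
    """Draw a 0-based index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_profiles(alpha: Sequence[float],
                    beta: Sequence[Sequence[Sequence[float]]],
                    num_users: int,
                    seed: int = 0) -> List[List[int]]:
    """Sample complete rating profiles from the generative process.

    alpha -- Dirichlet prior over K attitudes (all entries > 0)
    beta  -- beta[y][z][v] = P(R_y = v + 1 | Z_y = z), for M items,
             K attitudes, and V rating values (hypothetical layout)
    """
    rng = random.Random(seed)
    profiles = []
    for _ in range(num_users):                      # step 1: each user u
        theta = sample_dirichlet(alpha, rng)        # step 2: theta ~ Dirichlet(alpha)
        profile = []
        for item_params in beta:                    # step 3: each item y
            z = sample_categorical(theta, rng)      # step 4: z ~ Multinomial(theta)
            r = sample_categorical(item_params[z], rng) + 1  # step 5: rating in 1..V
            profile.append(r)
        profiles.append(profile)
    return profiles
```

Note that a fresh attitude z is drawn for every item, so a single user can mix several attitudes within one profile, which is what separates this process from a simple mixture model.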
Learning
Variational Approximation:
• Exact inference is intractable with URP. We define a fully factorized approximate q-distribution with variational multinomial parameters φ_u and variational Dirichlet parameters γ_u.
Variational Inference and Parameter Estimation: [Update equations are not preserved in this transcript.]
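As a hedged sketch of what such updates look like, the block below applies the standard mean-field derivation used for LDA (Blei et al.) to rating profiles. It is not claimed to match the poster's equations term for term. The symbols are assumptions: \phi^u_{zy} is the variational multinomial parameter for item y of user u, \gamma^u_z the variational Dirichlet parameter, \beta_{vyz} = P(R_y = v \mid Z_y = z), O_u the set of items rated by user u, \Psi the digamma function, and \delta the indicator function.

```latex
\begin{align*}
\phi^u_{zy} &\propto \beta_{r_y^u\, y z}\, \exp\!\big(\Psi(\gamma^u_z)\big)
  && \text{for each rated item } y \in O_u,\ \text{normalized so } \textstyle\sum_{z} \phi^u_{zy} = 1 \\
\gamma^u_z &= \alpha_z + \sum_{y \in O_u} \phi^u_{zy} \\
\beta_{vyz} &\propto \sum_{u : y \in O_u} \phi^u_{zy}\, \delta(r_y^u, v)
  && \text{normalized so } \textstyle\sum_{v} \beta_{vyz} = 1
\end{align*}
```

The first two updates are iterated to convergence for each user (variational inference), and the third re-estimates \beta from all users. In the LDA setting the Dirichlet parameter \alpha has no closed-form update and is found numerically; the same is assumed here.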
Rating Prediction:
• Once rating distributions are estimated, any number of prediction techniques can be used. The prediction technique should match the error measure used.

5. Experimentation
Strong Generalization Experiment:
• Users split into training set and testing set. Ratings for test users split into observed and unobserved sets. Trained on training users, tested on test users.
• Repeated on 3 random splits of data.
Weak Generalization Experiment:
• Available ratings for each user split into observed and unobserved sets. Trained on the observed ratings, tested on the unobserved ratings.
• Repeated on 3 random splits of data.
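The two protocols above differ only in what is held out. The sketch below makes that concrete. The function names, the `holdout` count, and the `test_fraction` value are hypothetical, since the poster does not state how many ratings or users were held out.

```python
import random
from typing import Dict, Tuple

Profile = Dict[int, int]        # item index -> rating
Ratings = Dict[int, Profile]    # user index -> profile


def split_profile(profile: Profile, holdout: int,
                  rng: random.Random) -> Tuple[Profile, Profile]:
    """Split one user's ratings into observed and unobserved parts."""
    held = set(rng.sample(sorted(profile), holdout))
    observed = {y: r for y, r in profile.items() if y not in held}
    unobserved = {y: r for y, r in profile.items() if y in held}
    return observed, unobserved


def weak_split(ratings: Ratings, holdout: int = 1,
               seed: int = 0) -> Tuple[Ratings, Ratings]:
    """Weak generalization: every user is trained on, with some ratings hidden."""
    rng = random.Random(seed)
    observed, unobserved = {}, {}
    for u in sorted(ratings):
        observed[u], unobserved[u] = split_profile(ratings[u], holdout, rng)
    return observed, unobserved


def strong_split(ratings: Ratings, test_fraction: float = 0.2, holdout: int = 1,
                 seed: int = 0) -> Tuple[Ratings, Ratings, Ratings]:
    """Strong generalization: test users are never seen during training."""
    rng = random.Random(seed)
    users = sorted(ratings)
    rng.shuffle(users)
    n_test = max(1, int(len(users) * test_fraction))
    test_users, train_users = users[:n_test], users[n_test:]
    train = {u: dict(ratings[u]) for u in train_users}
    test_observed, test_unobserved = {}, {}
    for u in sorted(test_users):
        test_observed[u], test_unobserved[u] = split_profile(ratings[u], holdout, rng)
    return train, test_observed, test_unobserved
```

Calling either function with three different `seed` values gives three random splits, in the spirit of the repetition described above.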
Error Measure:
Normalized Mean Absolute Error:
• Average over all users of the absolute difference between predicted and actual ratings.
• Normalized by expectation of the difference between predicted and actual ratings under empirical rating distribution of the base data set.
Data Sets:
EachMovie (Compaq Systems Research Center):
• Users: 72,916
• Items: 1,628
• Ratings: 2,811,983
• Rating Values: 6
• Sparsity: 97.6%
• Filtering: 20 ratings
MovieLens (GroupLens Research Center):
• Users: 6,040
• Items: 3,900
• Ratings: 1,000,209
• Rating Values: 5
• Sparsity: 95.7%
• Filtering: 20 ratings
Figure 7: Distribution of ratings in weak and strong filtered data sets compared to base data sets.
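The normalized mean absolute error described above can be sketched as follows. This is one reading of the poster's description, stated as an assumption: the normalizer is taken to be the expected absolute difference between two ratings drawn independently from the empirical rating distribution of the base data set. All function names are hypothetical. Since the measure is an absolute error, the sketch also includes a median-based prediction rule, because the median of a predictive distribution minimizes expected absolute error.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def expected_abs_difference(base_ratings: Sequence[int]) -> float:
    """E|a - b| for a, b drawn independently from the empirical distribution."""
    counts = Counter(base_ratings)
    n = len(base_ratings)
    total = sum(ca * cb * abs(a - b)
                for a, ca in counts.items()
                for b, cb in counts.items())
    return total / (n * n)


def nmae(pairs_by_user: Dict[int, List[Tuple[float, int]]],
         base_ratings: Sequence[int]) -> float:
    """Average per-user mean absolute error, divided by the normalizer.

    pairs_by_user -- user -> list of (predicted rating, actual rating)
    """
    per_user = [sum(abs(p - a) for p, a in pairs) / len(pairs)
                for pairs in pairs_by_user.values()]
    mae = sum(per_user) / len(per_user)
    return mae / expected_abs_difference(base_ratings)


def median_rating(dist: Sequence[float]) -> int:
    """Median of a rating distribution, where dist[v - 1] = P(R = v)."""
    cumulative = 0.0
    for v, p in enumerate(dist, start=1):
        cumulative += p
        if cumulative >= 0.5:
            return v
    return len(dist)
```

Normalizing this way makes error rates comparable across data sets with different rating scales, such as the 6-value EachMovie scale and the 5-value MovieLens scale.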
6. Results
Figure 8: MovieLens Weak Generalization Results
Figure 9: MovieLens Strong Generalization Results
• URP and the aspect model attain the same minimum weak generalization error rate, but URP does so using far fewer model parameters.
Figure 10: EachMovie Weak Generalization Results
Figure 11: EachMovie Strong Generalization Results
• On the more difficult EachMovie data set, URP clearly performs better than the other rating prediction methods considered.
7. Conclusions and Future Work
Conclusions:
• We have introduced URP, a new generative model specifically designed for pure, non-sequential, rating-based collaborative filtering. URP has consistent generative semantics at both the user level and the rating profile level.
• Empirical results show that URP outperforms other popular rating prediction methods while using fewer model parameters.
Future Work:
• Models with more intuitive generative semantics. Currently under study is a promising family of product models.
• Models that integrate additional features, sequential dynamics, or both.
8. References
1. D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, January 2003.
2. John S. Breese, David Heckerman, and Carl Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence, pages 43-52, July 1998.
3. Thomas Hofmann. Learning What People (Don't) Want. In Proceedings of the European Conference on Machine Learning (ECML), 2001.
5. Thomas Minka and John Lafferty. Expectation-Propagation for the Generative Aspect Model. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, 2002.
6. R. M. Neal and G. E. Hinton. A new view of the EM algorithm that justifies incremental, sparse and other variants. In M. I. Jordan, editor, Learning in Graphical Models, pages 355-368. Kluwer Academic Publishers, 1998.
7. P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In Proceedings of ACM 1994 Conference on Computer Supported Cooperative Work, pages 175-186, Chapel Hill, North Carolina, 1994. ACM.
8. Upendra Shardanand and Patti Maes. Social information filtering: Algorithms for automating "word of mouth". In Proceedings of ACM CHI'95, volume 1, pages 210-217, 1995.