Combining Predictions for Accurate Recommender Systems M. Jahrer1, A. Töscher1, R. Legenstein2 1Commendo Research & Consulting 2Institute for Theoretical Computer Science, Graz University of Technology KDD ‘10 2010. 11. 26. Summarized and Presented by Sang-il Song, IDS Lab., Seoul National University
Contents • The Netflix Prize • Netflix Dataset • Challenges of Recommendation • Review: Collaborative Filtering Techniques • Motivation • Blending Techniques • Linear Regression • Binned Linear Regression • Neural Network • Bagged Gradient Boosted Decision Tree • Kernel Ridge Regression • K-Nearest Neighbor Blending • Results • Conclusion
The Netflix Prize • An open competition for the best collaborative filtering algorithm • The objective is to improve the performance of Netflix's own recommendation algorithm by 10%
Netflix Dataset • 480,189 users • 17,770 movies • 100,480,507 ratings (training data) • Each rating is a tuple <user, movie, date of grade, grade>
Measure of CF algorithm error • Root Mean Square Error (RMSE): RMSE = sqrt( (1/N) Σ_(u,i) ( r̂_ui − r_ui )² ) • r̂_ui is the rating estimated by the algorithm, r_ui is the true rating • N is the size of the test dataset • The original Netflix algorithm, called "Cinematch", achieved an RMSE of about 0.95
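A minimal sketch of the RMSE computation (NumPy, with made-up example values):

```python
import numpy as np

def rmse(predicted, actual):
    """Root Mean Square Error over a test set of N ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# hypothetical predictions vs. true grades on a tiny test set
print(rmse([3.8, 2.1, 4.5], [4, 2, 5]))   # ≈ 0.32
```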
Challenges of Recommender Systems (Reference: R. Bell – Lessons from the Netflix Prize) • Size of data • Places a premium on efficient algorithms • Stretched the memory limits of standard PCs • 99% of the data are missing • Eliminates many standard prediction methods • Certainly not missing at random • Countless factors may affect ratings • Large imbalance in the training data • The number of ratings per user or movie varies by several orders of magnitude • The information available to estimate individual parameters varies widely
Collaborative Filtering Techniques • Memory based Approach • KNN user-user • KNN item-item • Model based Approach • Singular Value Decomposition (SVD) • Asymmetric Factor Model (AFM) • Restricted Boltzmann Machine (RBM) • Global Effect (GE) • Combination: Residual Training
KNN user-user • Traditional approach for collaborative filtering • Method • Find the k users most similar to user u • Aggregate their ratings for item i (see the sketch below)
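A minimal dense-matrix sketch of the user-user KNN idea; the function name, the cosine similarity, k=20, and the 0-for-missing toy representation are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def predict_user_knn(R, u, i, k=20):
    """Predict the rating of user u for item i from the k most similar users.
    R: dense user x item rating matrix with 0 for missing entries (toy setting)."""
    raters = np.where(R[:, i] > 0)[0]            # users who rated item i
    raters = raters[raters != u]
    if raters.size == 0:
        return R[R > 0].mean()                   # fall back to the global mean
    # cosine similarity between user u and every candidate rater
    sims = np.array([
        np.dot(R[u], R[v]) / (np.linalg.norm(R[u]) * np.linalg.norm(R[v]) + 1e-9)
        for v in raters
    ])
    top = np.argsort(-sims)[:k]                  # indices of the k nearest users
    neigh, w = raters[top], sims[top]
    # similarity-weighted average of the neighbours' ratings for item i
    return float(np.dot(w, R[neigh, i]) / (np.abs(w).sum() + 1e-9))
```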
KNN item-item • Symmetric approach to KNN user-user • Just flip the user and item sides • Method • Find the k items most similar to item i • Aggregate user u's ratings of those items
SVD (Matrix Factorization) • Singular Value Decomposition • A dimension reduction technique based on matrix factorization • Captures latent semantics
SVD Example • The rating matrix R is factorized into a product of low-rank matrices, R ≈ U × Σ × Vᵀ (see the sketch below)
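A toy sketch of a rank-k truncated SVD with NumPy; real CF variants (including those in the paper) fit the factors on observed ratings only rather than treating missing entries as zeros:

```python
import numpy as np

# Toy rating matrix (0 = unobserved); only for illustrating the factorization.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

k = 2                                            # number of latent features
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # rank-k reconstruction

print(R_hat[0, 2])   # predicted rating of user 0 for item 2
```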
Asymmetric Factor Model (AFM) • An extension of SVD • An item is represented by a feature vector (same as SVD) • A user is represented by the items he or she has rated (different from SVD); see the sketch below
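A rough sketch of the asymmetric idea under assumed notation (item factor matrix Y, global mean mu, random toy factors): the user vector is built from the factors of the items the user rated.

```python
import numpy as np

def afm_predict(q_i, Y, rated_items, mu=3.6):
    """Asymmetric-factor-style prediction (sketch, names and mu are assumptions).
    q_i:         latent feature vector of the target item
    Y:           item feature matrix used to build the *user* representation
    rated_items: indices of the items the user has rated"""
    # the user is represented through the items he or she rated
    p_u = Y[rated_items].sum(axis=0) / np.sqrt(len(rated_items))
    return mu + q_i @ p_u

# hypothetical factors with 3 latent features
rng = np.random.default_rng(0)
Y = rng.normal(scale=0.1, size=(10, 3))
q = rng.normal(scale=0.1, size=3)
print(afm_predict(q, Y, [0, 2, 5]))
```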
Restricted Boltzmann Machine (RBM) • A neural network with one visible layer and one hidden layer • Handles the sparsity of the data very well
Global Effects (GE) • Motivated by data normalization • Based on user and item features • support (number of votes) • mean rating • mean standard deviation • Effective when applied to the residuals of other algorithms
Residual Training • A popular method to combine CF algorithms • Several models (Model 1, Model 2, Model 3, …) are trained sequentially, each on the residuals of the previous one (see the sketch below)
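A minimal sketch of residual training with two stand-in regressors; the features, targets, and model choices are hypothetical, not the CF models used in the paper:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

# Toy stand-ins for CF models: any regressors with fit/predict work here.
X = np.random.rand(1000, 5)                  # hypothetical per-rating features
y = 1 + 4 * np.random.rand(1000)             # hypothetical ratings in [1, 5]

models = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=4)]

residual = y.copy()
preds = np.zeros_like(y)
for m in models:
    m.fit(X, residual)                       # each model fits what is left over
    p = m.predict(X)
    preds += p                               # final prediction = sum of stages
    residual -= p                            # the next model sees the new residuals
```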
Motivation • Combinations of different kinds of collaborative filtering algorithms • lead to significant performance improvements over individual algorithms
Rookies “Thanks to Paul Harrison's collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75”
Arek Paterek “My approach is to combine the results of many methods (also two-way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post processed with kernel ridge regression” http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf
U of Toronto “When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix’s own system.” http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf
Gravity home.mit.bme.hu/~gtakacs/download/gravity.pdf
When Gravity and Dinosaurs Unite “Our common team blends the result of team Gravity and team Dinosaur Planet.” Might have guessed from the name…
BellKor / KorBell And, yes, the top team, which is from AT&T… "Our final solution (RMSE=0.8712) consists of blending 107 individual results."
Blending Methods Linear Regression (baseline) Binned Linear Regression Neural Network Bagged Gradient Boosted Decision Tree Kernel Ridge Regression K-Nearest Neighbor Blending
Linear Regression • Baseline blending method • Assume a quadratic error function • Find the optimal linear combination weights w of the individual predictors by solving the least squares problem min_w ||Xw − y||², where the columns of X hold the predictions of the individual algorithms • w can be calculated with ridge regression: w = (XᵀX + λI)⁻¹Xᵀy (see the sketch below)
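A minimal sketch of ridge-regression blending; the data X, y and the regularization value lam are hypothetical:

```python
import numpy as np

def blend_weights(X, y, lam=1e-3):
    """Ridge-regression blending weights (sketch; lam is an assumed value).
    X: (N, F) matrix, one column per CF algorithm's predictions
    y: (N,)   true ratings"""
    F = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(F), X.T @ y)

# hypothetical predictions of 3 algorithms on 5 ratings
X = np.array([[3.9, 4.1, 3.7],
              [2.2, 2.0, 2.4],
              [4.8, 4.6, 5.0],
              [1.1, 1.3, 0.9],
              [3.0, 3.2, 2.8]])
y = np.array([4.0, 2.0, 5.0, 1.0, 3.0])
w = blend_weights(X, y)
blended = X @ w          # blended prediction for each rating
```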
Binned Linear Regression • A simple extension of linear regression • The training dataset is divided into B disjoint subsets • The training dataset may be very large • Each subset is used to learn a different weight vector w_b • The training set can be split using the following criteria: • Support (number of votes) • Time • Frequency (number of ratings from a user on day t) • See the sketch below
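A sketch of per-bin blending weights, assuming the bins are defined by support; the helper name, bin edges, and toy data are illustrative:

```python
import numpy as np

def binned_blend_weights(X, y, support, bin_edges, lam=1e-3):
    """Learn a separate ridge-blending weight vector per support bin (sketch).
    support: per-sample number of votes, used only to assign the bin."""
    bins = np.digitize(support, bin_edges)          # bin index for each sample
    weights = {}
    for b in np.unique(bins):
        mask = bins == b
        Xb, yb = X[mask], y[mask]
        F = Xb.shape[1]
        weights[b] = np.linalg.solve(Xb.T @ Xb + lam * np.eye(F), Xb.T @ yb)
    return weights, bins

# hypothetical usage: 3 algorithms, support split at 10 and 100 votes
rng = np.random.default_rng(0)
y = rng.uniform(1, 5, 500)
X = y[:, None] + rng.normal(scale=0.9, size=(500, 3))
support = rng.integers(1, 1000, 500)
weights, bins = binned_blend_weights(X, y, support, bin_edges=[10, 100])
```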
Neural Network (NN) • The predictions of the individual algorithms (Alg 1 … Alg 4 in the original figure) are the inputs of a small neural network, whose output is the blended rating • Efficient for huge data sets (see the sketch below)
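A sketch of neural-network blending with scikit-learn's MLPRegressor on synthetic blending data; the hidden layer size and iteration count are assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical blending data: predictions of F=4 algorithms for N ratings.
rng = np.random.default_rng(0)
y = rng.uniform(1, 5, size=2000)                          # true ratings
X = y[:, None] + rng.normal(scale=0.9, size=(2000, 4))    # noisy per-algorithm predictions

# One small hidden layer maps the individual predictions to a blended rating.
nn = MLPRegressor(hidden_layer_sizes=(30,), max_iter=1000, random_state=0)
nn.fit(X, y)
blended = nn.predict(X)
```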
Bagged Gradient Boosted Decision Tree (BGBDT) • Single decision tree • Discretized output => limits its ability to model smooth functions • The number of possible outputs corresponds to the number of leaves • A single tree is trained recursively by always splitting the leaf that provides the output value for the largest number of training samples • Bagging • Training Nbag copies of the model on slightly different training sets • (Stochastic gradient) boosting • Each model learns only a fraction of the desired function Ω • See the sketch below
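An off-the-shelf approximation of the BGBDT idea using scikit-learn's GradientBoostingRegressor wrapped in a BaggingRegressor; the tree, boosting, and Nbag settings are assumptions (the bagging parameter is named base_estimator in older scikit-learn versions):

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor

# Hypothetical blending data (same synthetic setup as the NN sketch above).
rng = np.random.default_rng(0)
y = rng.uniform(1, 5, size=2000)
X = y[:, None] + rng.normal(scale=0.9, size=(2000, 4))

# subsample < 1.0 gives the stochastic-gradient-boosting behaviour;
# max_leaf_nodes bounds the number of distinct outputs of a single tree.
gbdt = GradientBoostingRegressor(n_estimators=100, max_leaf_nodes=8, subsample=0.5)
bgbdt = BaggingRegressor(estimator=gbdt, n_estimators=10)   # Nbag = 10 (assumed)
bgbdt.fit(X, y)
blended = bgbdt.predict(X)
```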
Kernel Ridge Regression Blending (KRR) • Kernel ridge regression • A regularized least squares method for classification and regression • Similar to an SVM • But it also puts emphasis on points that are not close to the decision boundary • Suitable for a small number of features; because the training complexity is O(n³) and the space requirement is O(n²), only a subset of the training samples can be used • See the sketch below
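A sketch of KRR blending on a training subsample using scikit-learn's KernelRidge; the kernel choice, hyperparameters, and subsample size are assumptions:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Hypothetical blending data; KRR is trained on a subsample because of its
# O(n^3) training cost.
rng = np.random.default_rng(0)
y = rng.uniform(1, 5, size=5000)
X = y[:, None] + rng.normal(scale=0.9, size=(5000, 4))

sub = rng.choice(len(y), size=1000, replace=False)   # subsample for training
krr = KernelRidge(alpha=1.0, kernel="rbf", gamma=0.5)
krr.fit(X[sub], y[sub])
blended = krr.predict(X)
```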
K-Nearest Neighbor Blending (KNN) • For a test sample <user, item>, find the k most similar training data samples in the space of the individual predictions • Aggregate their target values (see the sketch below)
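A sketch of KNN blending with scikit-learn's KNeighborsRegressor; k=50 and the distance weighting are assumptions:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical blending data: each sample is the vector of the individual
# algorithms' predictions for one <user, item> pair.
rng = np.random.default_rng(0)
y = rng.uniform(1, 5, size=2000)
X = y[:, None] + rng.normal(scale=0.9, size=(2000, 4))

knn = KNeighborsRegressor(n_neighbors=50, weights="distance")
knn.fit(X, y)
blended = knn.predict(X)     # weighted average of the 50 nearest training targets
```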
Experimental Setup • 18 CF algorithms • 4 versions of AFM • 4 versions of GE • 4 versions of KNN-item • 2 versions of RBM • 4 versions of SVD • 1,400,000 samples • Running on a 3.8 GHz CPU with 12 GB main memory
Conclusions • Combinations of collaborative filtering algorithms outperform single collaborative filtering algorithms