350 likes | 660 Views
Netflix Prize Solution: A Matrix Factorization Approach. By Atul S. Kulkarni kulka053@d.umn.edu Graduate student University of Minnesota Duluth. Agenda. Problem Description Netflix Data Why is it a tough nut to crack? Overview of methods already applied to this problem
E N D
Netflix Prize Solution: A Matrix Factorization Approach By Atul S. Kulkarni kulka053@d.umn.edu Graduate student University of Minnesota Duluth
Agenda • Problem Description • Netflix Data • Why is it a tough nut to crack? • Overview of methods already applied to this problem • Overview of the Paper • Details of the method • How does this method works for the Netflix problem • My implementation • Results • Q and A?
Netflix Prize Problem • Given a set of users with their previous ratings for a set of movies, can we predict the rating they will assign to a movie they have not previously rated? • Defined at http://www.netflixprize.com//index • Seeks to improve the Cinematch’s (Netflix’s existing movie recommender system) prediction performance by 10%. • How is the performance measured? • Root Mean Square Error (RMSE) • Winner gets a prize of 1 Million USD.
Problem Description • Recommender Systems • Use the knowledge about preference of a group of users about a certain items and help predict the interest level for other users from same community. [1] • Collaborative filtering • Widely used method for recommender systems • Tries to find traits of shared interest among users in a group to help predict the likes and dislikes of the other users within the group. [1]
Why is this problem interesting? • Used by almost every recommender system today • Amazon • Yahoo • Google • Netflix • …
Netflix Data • Netflix released data for this competition • Contains nearly 100 Million ratings • Number of users (Anonymous) = 480,189 • Number of movies rated by them = 17,770 • Training Data is provided per movie • To verify the model developed without submitting the predictions to Netflix “probe.txt” is provided • To submit the predictions for competition “qualifying.txt” is used
Netflix Data in Pictures • These pictures are taken as is from [5]
Netflix Data • Data in the training file is per movie • It looks like this Movie# Customer#,Rating,Date of Rating Customer#,Rating,Date of Rating Customer#,Rating,Date of Rating • Example 4: 1065039,3,2005-09-06 1544320,1,2004-06-28 410199,5,2004-10-16
Netflix Data Data points in the “probe.txt” looks like this (Have answers) Data in the qualifying.txt looks like this (No answers) Movie# Customer#, DateofRating Customer#, DateofRating 1: 1046323,2005-12-19 1080030,2005-12-23 1830096,2005-03-14 Movie# Customer# Customer# 1: 30878 2647871 1283744
Hard Nut to Crack? • Why is this problem such a difficult one? • Total ratings possible = 480,189 (user) * 17,770 (movies) = 8532958530 (8.5 Billion) • Total available = 100 Million • The User x Movies matrix has 8.4 Billionentries missing • Consider the problem as Least Square problem • We can consider this problem by representing it as system of equation in a matrix
Technically tough as well • Huge memory requirements • High time requirements • Because we are using only ~100 Million of possible 8.5 Billion ratings the predictors have some error in their weights (small training data)
Various Methods Employed for Netflix Prize Problem • Nearest Neighbor methods • k-NN with variations • Matrix factorization • Probabilistic Latent Semantic Analysis • Probabilistic Matrix Factorization • Expectation Maximization for Matrix Factorization • Singular Value Decomposition • Regularized Matrix Factorization [2]
The Paper • Title: “Improving regularized singular value decomposition for collaborative filtering” - ArkadiuszPaterek, Proceedings of KDD Cup and Workshop, 2007. [3] • Uses Algorithm described by Simon Funk (Brandyn Webb) in [4]. • The algorithm revolves around regularized Singular Value Decomposition (SVD) described in [4] and suggests some interesting use of biases to it to improve performance. • It also proposes some methods for post processing of the features extracted from the SVD. • It compares the various combinations of methods suggested in the paper for the Netflix Data.
Singular Value Decomposition • Consider the given problem as a Matrix of Users x Movies A or • Movies x Users • Show are the two examples • What do we do with this representation?
Singular Value Decomposition • Method of Matrix Factorization • Applicable to rectangular matrices and square alike • Decomposes the matrix in to 3 component matrices whose product approximates the original matrix • E.g. • D $d [1] 13.218989 4.887761 1.538870 • U $u [,1] [,2] [,3] [1,] -0.5606779 0.8192382 -0.1203705 [2,] -0.5529369 -0.4786352 -0.6820331 [3,] -0.6163612 -0.3158436 0.7213472 • V $v [,1] [,2] [,3] [1,] -0.17808307 0.20598164 0.78106201 [2,] -0.16965834 0.67044040 -0.31288023 [3,] -0.52406769 0.28579770 0.15429276 [4,] -0.65435261 0.02532797 -0.26336364 [5,] -0.04182898 -0.09792523 -0.44320373 [6,] -0.48469427 -0.64511243 0.04951659
Can we recover original Matrix? • Yes. (Well almost!) Here is how. • We will Multiply the 3 Matrices U*D*VT • We get – A* ~= A. • [,1] [,2] [,3] [,4] [,5] [,6] [1,] 2.000000e+00 4.000000e+00 5 5 -1.557185e-17 1 [2,] -8.564655e-16 -1.221706e-15 3 5 1.000000e+00 5 [3,] 2.000000e+00 -1.231356e-15 4 5 1.757492e-16 5 • We can see this is an Approximation of the original matrix.
How do we use SVD? • We use the 2 matrices U and V to estimate the original matrix A. • So what happened to the diagonal matrix D? • We train our method on the given training set and learn by rolling the diagonal matrix in the two matrices. • We do U * VT and obtain A’. • Error = ∀i∀jAij’ – Aij.
Algorithm variations covered in this paper • Simple Predictors • Regularized SVD • Improved Regularized SVD (with Biases) • Post processing SVD with KNN • Post processing SVD with kernel ridge regression • K-means • Linear model for each item • Decreasing the number of Parameters
The SVD Algorithm from paper [3,4,6] • Initialize 2 arrays movieFeatures (U) and customerFeatures (V) to very small value 0.1 • For every feature# in features Until minimum iterations are done or RMSE is not improving more than minimum improvement For every data point in training set //data point has custID and movieID prating = customerFeatures[feature#][custID] * movieFeatures [feature#][movieID] //Predict the rating error = originalrating - prating //Find the error squareerrsum += error * error //Sum the squared error for RMSE. cf = customerFeatures[feature#][custID] //locally copy current feature value mf = movieFeatures [feature#][movieID] //locally copy current feature value Contd.
Algorithm contd. customerFeatures[feature#][custID] += learningrate *(error * mf – regularizationfactor * cf) //Rolling the ERROR in to the features movieFeatures [feature#][movieID] += learningrate *(error * cf – regularizationfactor * mf) //Rolling the ERROR in to the feature RMSE = (squareerrsum / total number of data points) // Calculate RMSE • Now we do the testing • For every test point with custID and movieID For every feature# in Features predictedrating += customerFeatures[feature#][custID] * movieFeatures [feature#][movieID] • Caveats – clip the ratings in the range (1, 5) predicted rating might go out of bounds • “Regularization factor” is introduced by Brandyn Webb in [4] to reduce the over fitting
Variation: Improved Regularized SVD • That was regularized SVD • Improved Regularized SVD with Biases • Predict the rating with 2 added biases Ci per customer and Dj per movie • Rating = Ci + Dj + coustomerFeatures[featue#][i] * movieFeatures[Feature#][j] • During training update the biases as • Ci += learningrate * (err – regularization(Ci + Dj – global_mean)) • Dj += learningrate * (err – regularization(Ci + Dj – global_mean)) • Learningrate = .001, regularization = 0.05, global_mean = 3.6033
Variation: KNN for Movies • Post processing with KNN • On the Regularized SVD movieFeature matrix we run cosine similarity between 2 vectors similarity = movieFeature[movieID1]T * movieFeature[movieID2] ||movieFeature[movieID1]||*||movieFeature[movieID2]|| • Using this similarity measure we build a neighborhood of 1 nearest movies and predict rating of the nearest movie as the predicted rating
Experimentation Strategy by author • Select 1.5% - 15% of the probe.txt as hold-out set or test set. • Train all models on rest of the ratings • All models predict the ratings • Merge the results using linear regression on the test set • Combining two methods for initial prediction & then performing linear regression
Results from the Paper[2] Replicated from the paper as is
My Experiments • I am trying out the regularized SVD method and Improved Regularized SVD method with qualifying.txt, probe.txt • Also, going to implement first 3 steps of the author’s experimentation strategy (in my case I will predict with regularized SVD and Improved regularized SVD) • If time permits might try SVD KNN method • I am also varying some parameters like learning rate, number of features, etc. to see its effect on the results. • I shall have all my results posted on the web site soon
References • Herlocker, J, Konstan, J., Terveen, L., and Riedl, J. Evaluating Collaborative Filtering Recommender Systems. ACM Transactions on Information Systems22 (2004), ACM Press, 5-53. • GáborTakács, IstvánPilászy, BottyánNémeth, DomonkosTikk Scalable Collaborative Filtering Approaches for Large Recommender Systems. JMLR Volume 10 :623--656, 2009. • ArkadiuszPaterek, Improving regularized singular value decomposition for collaborative filtering - Proceedings of KDD Cup and Workshop, 2007. • http://sifter.org/~simon/journal/20061211.html • http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/ • G. Gorrell and B. Webb. Generalized hebbian algorithm for incremental latent semantic analysis. Proceedings of Interspeech, 2006.
Thanks for your time! Atul S. Kulkarni kulka053@d.umn.edu