
Lessons from the Netflix Prize

The Netflix Prize offered a $1,000,000 grand prize for improving Netflix's movie recommendation technology, measured by the reduction in RMSE on test data. The main challenges were the size of the data, the fact that about 99% of the possible ratings are missing, and the wide range of factors that affect ratings. The talk contrasts content-based systems with collaborative filtering, whose two main tools are nearest neighbor methods and latent factor models such as SVD. The central problem was to estimate as much signal as possible where data are plentiful without overfitting where data are scarce, which made careful regularization of SVD essential for predictive accuracy on test data.



Presentation Transcript


  1. Lessons from the Netflix Prize • Robert Bell, AT&T Labs-Research • In collaboration with Chris Volinsky, AT&T Labs-Research, and Yehuda Koren, Yahoo! Research

  2. “We’re quite curious, really. To the tune of one million dollars.” – Netflix Prize rules • Goal: to improve on Netflix’s existing movie recommendation technology • Contest began October 2, 2006 • Prize • Based on reduction in root mean squared error (RMSE) on test data • $1,000,000 grand prize for 10% drop (19% for MSE) • Or, $50,000 progress prize for best result each year
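The link between the 10% RMSE target and the 19% MSE figure follows from squaring: 0.9² = 0.81. A minimal sketch, with an illustrative `rmse` helper that is not part of the original slides:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# A 10% drop in RMSE leaves 90% of the original RMSE.
rmse_ratio = 0.90
# MSE is RMSE squared, so the remaining MSE is 0.9**2 = 0.81 of the original.
mse_drop = 1.0 - rmse_ratio ** 2
```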

  3. Data Details • Training data • 100 million ratings (from 1 to 5 stars) • 6 years (2000-2005) • 480,000 users • 17,770 “movies” • Test data • Last few ratings of each user • Split as shown on next slide

  4. Test Data Split into Three Pieces • Probe • Ratings released • Allows participants to assess methods directly • Daily submissions allowed for combined Quiz/Test data • Identity of Quiz cases withheld • RMSE released for Quiz • Test RMSE withheld • Prizes based on Test RMSE

  5. Higher Mean Rating in Probe Data

  6. Something Happened in Early 2004

  7. Data about the Movies

  8. Most Active Users

  9. Major Challenges • Size of data • Places premium on efficient algorithms • Stretched memory limits of standard PCs • 99% of data are missing • Eliminates many standard prediction methods • Certainly not missing at random • Training and test data differ systematically • Test ratings are later • Test cases are spread uniformly across users

  10. Major Challenges (cont.) • Countless factors may affect ratings • Genre, movie/TV series/other • Style of action, dialogue, plot, music, etc. • Director, actors • Rater’s mood • Large imbalance in training data • Number of ratings per user or movie varies by several orders of magnitude • Information to estimate individual parameters varies widely

  11. Ratings per Movie in Training Data Avg #ratings/movie: 5627

  12. Ratings per User in Training Data Avg #ratings/user: 208
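The two averages above, and the "99% of data are missing" figure from the Major Challenges slide, can be checked roughly against the rounded counts on the Data Details slide. This is only a back-of-the-envelope sketch; the exact counts in the released data differ slightly from these rounded numbers.

```python
# Rounded figures from the "Data Details" slide.
n_ratings = 100_000_000
n_users = 480_000
n_movies = 17_770

ratings_per_movie = n_ratings / n_movies            # about 5,627
ratings_per_user = n_ratings / n_users              # about 208

# Share of the user-by-movie matrix that has an observed rating.
fraction_observed = n_ratings / (n_users * n_movies)
fraction_missing = 1.0 - fraction_observed          # a little under 99%
```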

  13. The Fundamental Challenge • How can we estimate as much signal as possible where there are sufficient data, without overfitting where data are scarce?

  14. Recommender Systems • Personalized recommendations of items (e.g., movies) to users • Increasingly common • To deal with explosive number of choices on the internet • Netflix • Amazon • Many others

  15. Content Based Systems • A pre-specified list of attributes • Score each item on all attributes • User interest obtained for the same attributes • Direct solicitation, or • Estimated based on user ratings, purchases, or other behavior

  16. Pandora • Music recommendation system • Songs rated on 400+ attributes • Music Genome Project • Roots, instrumentation, lyrics, vocals • Two types of user feedback • Seed songs • Thumbs up/down for recommended songs

  17. Collaborative Filtering (CF) • Avoids need for: • Determining “proper” content • Collecting information about items or users • Infers user-item relationships from purchases or ratings • Used by Amazon and Netflix • Two main CF tools • Nearest neighbors • Latent factor models

  18. Nearest Neighbor Methods • Most common CF tool at the beginning of the contest • Predict rating for a specific user-item pair based on ratings of • Similar items • By the same user • Or vice versa • Pearson correlation or cosine similarity
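A minimal sketch of the item-based variant described above, using cosine similarity. The function names, the convention that 0 marks a missing rating, and the choice of `k` are assumptions made for illustration; they are not taken from the talk.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity over users who rated both items (0 = missing)."""
    both = (a > 0) & (b > 0)
    if not both.any():
        return 0.0
    denom = np.linalg.norm(a[both]) * np.linalg.norm(b[both])
    return float(a[both] @ b[both] / denom)

def predict_item_knn(R, user, item, k=2):
    """Predict R[user, item] as a similarity-weighted mean of the user's
    own ratings on the k rated items most similar to `item`."""
    rated = [j for j in range(R.shape[1]) if j != item and R[user, j] > 0]
    sims = [(cosine_similarity(R[:, item], R[:, j]), j) for j in rated]
    top = sorted(sims, reverse=True)[:k]
    weight = sum(s for s, _ in top)
    if weight <= 0:
        return float(R[R > 0].mean())  # no usable neighbor: global mean
    return float(sum(s * R[user, j] for s, j in top) / weight)

# Toy matrix: rows are users, columns are items, 0 means "not rated".
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
prediction = predict_item_knn(R, user=0, item=2)
```

In this toy matrix, item 2 is rated most like item 3, which user 0 rated low, so the prediction is pulled toward the low end of the scale. Swapping in Pearson correlation would only change the similarity function.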

  19. Merits of Nearest Neighbors • Few modeling assumptions • Few tuning parameters to learn • Easy to explain to users • Dear Amazon.com Customer, We've noticed that customers who have purchased or rated How Does the Show Go On: An Introduction to the Theater by Thomas Schumacher have also purchased Princess Protection Program #1: A Royal Makeover (Disney Early Readers).

  20. Latent Factor Models • Models with latent classes of items and users • Individual items and users are assigned to either a single class or a mixture of classes • Neural networks • Restricted Boltzmann machines • Singular Value Decomposition (SVD) • AKA matrix factorization • Items and users described by unobserved factors • Main method used by leaders of competition

  21. SVD • Dimension reduction technique for matrices • Each item summarized by a d-dimensional vector qi • Similarly, each user summarized by pu • Choose d much smaller than number of items or users • e.g., d = 50 << 18,000 or 480,000 • Predicted rating for Item i by User u • Inner product of qi and pu
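The prediction rule above is just an inner product, so a full table of predictions is one matrix product. A small sketch with toy sizes; the array names `P` and `Q` and the random initial values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 6, 8, 3   # toy sizes; the slide's example uses d = 50

P = rng.normal(scale=0.1, size=(n_users, d))  # row u holds the user vector p_u
Q = rng.normal(scale=0.1, size=(n_items, d))  # row i holds the item vector q_i

def predict(u, i):
    """Predicted rating of item i by user u: inner product of q_i and p_u."""
    return float(Q[i] @ P[u])

# All user-item predictions at once, shape (n_users, n_items).
all_predictions = P @ Q.T

# The model stores d numbers per user and per item instead of one per cell.
n_parameters = d * (n_users + n_items)
```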

  22. [Figure: movies placed on two latent factors. One axis runs from serious to escapist, the other from geared towards females to geared towards males. Movies shown: Braveheart, Amadeus, The Color Purple, Lethal Weapon, Sense and Sensibility, Ocean’s 11, The Lion King, Dumb and Dumber, The Princess Diaries, Independence Day]

  23. [Figure: the same two-factor movie map, with two users, Dave and Gus, added]

  24. Regularization for SVD • Want to minimize SSE for Test data • One idea: Minimize SSE for Training data • Want large d to capture all the signals • But, Test RMSE begins to rise for d > 2 • Regularization is needed • Allow rich model where there are sufficient data • Shrink aggressively where data are scarce • Minimize
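The expression that followed "Minimize" on this slide did not survive in the transcript. A commonly used form of the regularized objective for this model, given here as an assumption rather than as the slide's exact formula, uses the slide's symbols q_i and p_u, with r_ui the observed rating, K the set of observed (user, item) pairs, and λ a penalty weight:

```latex
\min_{\{p_u\},\,\{q_i\}} \;
\sum_{(u,i) \in \mathcal{K}} \left( r_{ui} - q_i^{\top} p_u \right)^2
\;+\; \lambda \left( \sum_{u} \lVert p_u \rVert^2 + \sum_{i} \lVert q_i \rVert^2 \right)
```

Under this form, the penalty on each p_u or q_i is fixed while its data term grows with the number of ratings, so factors for sparsely rated users and movies are shrunk hardest toward zero, which matches the two bullets before "Minimize".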

  25. [Figure: two-factor movie map (serious vs. escapist; geared towards females vs. geared towards males), with user Gus]

  26. [Figure: same movie map as slide 25, with user Gus]

  27. [Figure: same movie map as slide 25, with user Gus]

  28. [Figure: same movie map as slide 25, with user Gus]

  29. Estimation for SVD • Fit by gradient descent • Loop over observed ratings • Update each relevant parameter • Small step in each parameter, proportional to gradient • Repeat until convergence • Alternatively, fit by sequence of ridge regressions • Fix item factors • Loop over users, estimating user factors • Do same to estimate item factors • Repeat until convergence
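The gradient descent loop above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a squared-error loss with an L2 penalty `lam`, a fixed learning rate `lr`, and a fixed number of passes in place of a convergence test; all names and default values are hypothetical.

```python
import numpy as np

def fit_sgd(ratings, n_users, n_items, d=2, lr=0.02, lam=0.05,
            epochs=500, seed=0):
    """ratings: list of (user, item, rating) triples, observed cells only."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, d))  # user factors p_u
    Q = rng.normal(scale=0.1, size=(n_items, d))  # item factors q_i
    for _ in range(epochs):
        for u, i, r in ratings:                   # loop over observed ratings
            err = r - Q[i] @ P[u]                 # residual for this rating
            p_old = P[u].copy()                   # use old p_u in q_i's update
            # Small step for each relevant parameter, proportional to the
            # gradient of the penalized squared error.
            P[u] += lr * (err * Q[i] - lam * P[u])
            Q[i] += lr * (err * p_old - lam * Q[i])
    return P, Q

def train_rmse(ratings, P, Q):
    """RMSE of the fitted factors on the given rating triples."""
    errors = [(r - Q[i] @ P[u]) ** 2 for u, i, r in ratings]
    return float(np.sqrt(np.mean(errors)))
```

Only the observed ratings are visited, so the cost of one pass grows with the number of ratings rather than with the size of the full user-by-movie matrix.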

  30. Improvements to Collaborative Filtering • Fine tune existing methods • Incorporate alternative “effects” • Incorporate a variety of modeling methods • Careful regularization to avoid overfitting

  31. Localized SVD • SVD uses all of a user’s ratings to train the user’s factors • But what if the user is multiple people? • Different factor values may apply to movies rated by Mom vs. Dad vs. the Kids • This approach computes user factors, pu, specific to the movie being predicted • Given all the {qi}, pu is the solution of a ridge regression • Weighted ridge regressions with higher weights for movies similar to the target movie
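Given fixed item factors, the weighted ridge regression for one user's factors has a closed form. A sketch under stated assumptions: the function name, the penalty `lam`, and the way weights are supplied are hypothetical, and the slide does not specify how movie similarity is turned into weights.

```python
import numpy as np

def user_factors_ridge(Q_rated, r, lam=1.0, weights=None):
    """Solve min_p  sum_j w_j * (r_j - q_j . p)**2 + lam * ||p||**2.

    Q_rated: (n_rated, d) item factors of the movies this user rated.
    r:       (n_rated,) the user's ratings of those movies.
    weights: (n_rated,) nonnegative weights; None gives ordinary ridge.
    """
    Q_rated = np.asarray(Q_rated, dtype=float)
    r = np.asarray(r, dtype=float)
    w = np.ones(len(r)) if weights is None else np.asarray(weights, dtype=float)
    d = Q_rated.shape[1]
    # Normal equations of the weighted, penalized least-squares problem.
    A = Q_rated.T @ (w[:, None] * Q_rated) + lam * np.eye(d)
    b = Q_rated.T @ (w * r)
    return np.linalg.solve(A, b)
```

With `weights=None` this is the per-user step of the alternating ridge regressions; passing larger weights for movies similar to the target movie gives a user vector localized to that movie.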

  32. Improvement from Localized SVD

  33. Lesson #1: Data >> Models • Very limited feature set • User, movie, date • Places focus on models/algorithms • Major steps forward associated with incorporating new data features • What movies a user rated • Temporal effects

  34. You are What You Rate • What you rate (and don’t) provides information about your preferences • Paterek’s NSVD explicitly characterizes users by which movies they like • Incorporate what a user rated into the user factor • Substantially reduces RMSE

  35. Temporal Effects • User behavior may change over time • Ratings go up or down • Interests change • For example, with addition of a new rater • Allow user biases and/or factors to change over time • Model au(t) and pu(t) as linear, unrestricted, or a sum of both types
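One way to read "linear, unrestricted, or a sum of both types" for the user bias au(t) is a linear drift plus a free per-day offset. The sketch below is an interpretation under that assumption; the function and parameter names are hypothetical and the slide does not give the exact parameterization.

```python
def user_bias(t, base, slope=0.0, t_center=0.0, day_offsets=None):
    """a_u(t) = base + slope * (t - t_center) + day_offsets.get(t, 0).

    slope != 0 with no offsets   -> linear model
    slope == 0 with offsets only -> unrestricted (one free value per day)
    both                         -> sum of the two types
    """
    offset = 0.0 if day_offsets is None else day_offsets.get(t, 0.0)
    return base + slope * (t - t_center) + offset

# Linear drift only: the bias rises by 0.01 per day.
linear_only = user_bias(10, base=0.2, slope=0.01)
# Linear drift plus a one-day dip on day 10 (values are made up).
with_offset = user_bias(10, base=0.2, slope=0.01, day_offsets={10: -0.5})
```

The same pattern could be applied to each component of pu(t), at the cost of many more parameters per user.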

  36. [Figure: two-factor movie map (serious vs. escapist; geared towards females vs. geared towards males), with user Gus]

  37. [Figure: same movie map as slide 36, with user Gus]

  38. [Figure: same movie map as slide 36, with user Gus]

  39. [Figure: same movie map as slide 36, with the user marked "Gus +"]

  40. #2: The Power of Regularized SVD Fit by Gradient Descent • Allowed anyone to approach early leaders • Powerful predictor • Efficient • Easy to program • Flexibility to incorporate additional features • Implicit feedback • Temporal effects • Neighborhood effects • Accurate regularization is essential

  41. #3: The Wisdom of Crowds (of Models) • All models are wrong; some are useful – G. Box • Used linear blends of many prediction sets • 107 in Year 1 • Over 800 at the end • Difficult, or impossible, to build the grand unified model • Mega blends are not needed in practice • A handful of simple models achieves 80 percent of the improvement of the full blend
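A linear blend combines the columns of a prediction matrix with one weight per model. The slide does not say how the weights were fit; the sketch below assumes ordinary least squares on a set of cases with known ratings, and the function names are illustrative.

```python
import numpy as np

def fit_blend_weights(predictions, targets):
    """predictions: (n_cases, n_models), one column per prediction set.
    targets: (n_cases,) known ratings. Returns one weight per model."""
    weights, *_ = np.linalg.lstsq(predictions, targets, rcond=None)
    return weights

def blend(predictions, weights):
    """Blended prediction for each case: weighted sum of model outputs."""
    return predictions @ weights

def rmse(predicted, actual):
    """Root mean squared error between two arrays of ratings."""
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))
```

On the cases used to fit the weights, the least-squares blend can never have a higher RMSE than any single model, because using one model alone is a special case of the blend. Weights should be fit on held-out ratings so that this gain is not just overfitting.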

  42. #4: Find Good Teammates • Yehuda Koren • The engine of progress for the Netflix Prize • Implicit feedback • Temporal effects • Nearest neighbor modeling • Big Chaos: Michael Jahrer, Andreas Töscher (Year 2) • Optimization of tuning parameters • Blending methods • Pragmatic Theory: Martin Chabbert, Martin Piotte (Year 3) • Some movies age better than others • Link functions

  43. The Final Leaderboard

  44. Test Set Results • The Ensemble: 0.856714

  45. Test Set Results • The Ensemble: 0.856714 • BellKor’s Pragmatic Theory: 0.856704

  46. Test Set Results • The Ensemble: 0.856714 • BellKor’s Pragmatic Theory: 0.856704 • Both scores round to 0.8567

  47. Test Set Results • The Ensemble: 0.856714 • BellKor’s Pragmatic Theory: 0.856704 • Both scores round to 0.8567 • Tie breaker is submission date/time

  48. Final Test Set Leaderboard

  49. Who Got the Money? • AT&T donated its full share to organizations supporting science education • Young Science Achievers Program • New Jersey Institute of Technology pre-college and educational opportunity programs • North Jersey Regional Science Fair • Neighborhoods Focused on African American Youth
