Quest for $1,000,000: The Netflix Prize Bob Bell, AT&T Labs-Research July 15, 2009 Joint work with Chris Volinsky, AT&T Labs-Research, and Yehuda Koren, Yahoo! Research
Recommender Systems • Personalized recommendations of items (e.g., movies) to users • Increasingly common • To deal with the explosive number of choices on the internet • Netflix • Amazon • Many others
Content-Based Systems • A pre-specified list of attributes • Score each item on all attributes • User interests measured on the same attributes • Via direct solicitation, or • Estimated from user purchases or ratings
Pandora • Music recommendation system • Songs rated on 400+ attributes • Music Genome Project • Roots, instrumentation, lyrics, vocals • Two types of user feedback • Seed songs • Thumbs up/down for recommended songs
Drawbacks of Content-Based Systems • Effort to score all items on many attributes • Best attributes may be unknown • Some attributes may be unscorable • Need for direct solicitation of data from users in some systems
Collaborative Filtering (CF) • Does not require content information about items or solicitation of users • Infers user-item relationships from purchases or ratings • Used by Amazon and Netflix
“We’re quite curious, really. To the tune of one million dollars.” – Netflix Prize rules • Goal: improve on Netflix’s existing movie recommendation technology • Prize • Based on reduction in root mean squared error (RMSE) on test data • $1,000,000 grand prize for a 10% drop • Or $50,000 progress prize for the best result each year • Contest began October 2, 2006
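Since the prize turns entirely on RMSE, here is a minimal sketch of the metric (illustrative only, with made-up numbers; not Netflix's scoring code):

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Made-up example: three predictions against three withheld ratings
print(rmse([3.8, 2.1, 4.6], [4, 2, 5]))   # ~0.26
```

The grand prize required driving this number 10% below the RMSE achieved by Netflix's own system on the same test data.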
Data Details • Training data • 100 million ratings (from 1 to 5 stars) • 6 years (2000–2005) • 480,000 users • 17,770 “movies” • Test data • Last few ratings of each user • User, movie, and date given • Ratings withheld (for most of the test data) • Teams are allowed daily feedback on their RMSE
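Concretely, each training example is a (user, movie, date, rating) tuple. A minimal loading sketch, assuming the ratings are supplied as comma-separated "user,rating,date" lines grouped under a "movie_id:" header (hypothetical sample lines below, not actual data):

```python
from datetime import date

def load_ratings(lines):
    """Parse lines of the form 'movie_id:' or 'user_id,stars,YYYY-MM-DD'
    into (user, movie, date, rating) tuples."""
    ratings, movie = [], None
    for line in lines:
        line = line.strip()
        if line.endswith(":"):                 # start of a new movie's block
            movie = int(line[:-1])
        elif line:
            user, stars, day = line.split(",")
            ratings.append((int(user), movie, date.fromisoformat(day), int(stars)))
    return ratings

sample = ["1:", "42,4,2004-07-08", "7,3,2005-01-15"]
print(load_ratings(sample))
```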
Ratings per Movie in Training Data [figure] • Average ratings per movie: 5,627
Ratings per User in Training Data [figure] • Average ratings per user: 208
Nearest Neighbor (NN) Methods • Most common CF tool • Predict the rating for a specific user-item pair from ratings of • Similar items by the same user • Or similar users on the same item • Requires no “content” about items or users • Easy to apply • Easy to explain to users • But not as powerful as other methods
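A minimal sketch of item-based NN prediction: weight the user's own ratings of the most similar items (hypothetical helper names; Prize entries used more carefully learned similarities and interpolation weights):

```python
import numpy as np

def predict_item_nn(user_ratings, sim, target_item, k=20):
    """Predict one user's rating of target_item from the k most similar
    items that this user has already rated.

    user_ratings : dict {item_id: rating} for this user
    sim          : function (item_a, item_b) -> similarity score
    """
    neighbors = sorted(
        ((sim(target_item, j), r) for j, r in user_ratings.items()),
        reverse=True,
    )[:k]
    num = sum(s * r for s, r in neighbors if s > 0)
    den = sum(s for s, _ in neighbors if s > 0)
    # Fall back to the user's mean rating if no positively similar neighbors
    return num / den if den > 0 else float(np.mean(list(user_ratings.values())))

# Toy usage with a hand-made similarity table
toy_sim = {(0, 1): 0.9, (0, 2): 0.3}
print(predict_item_nn({1: 5, 2: 3}, lambda a, b: toy_sim.get((a, b), 0.0), 0))  # 4.5
```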
Latent Factor Models • Explain ratings by a set of latent factors (attributes) • Factors are learned from the data • No need for pre-specification • Neural networks • SVD (Singular Value Decomposition) • AKA matrix factorization • Dominant method used by leaders of the competition
Item Factors • Each item summarized by a d-dimensional vector qi • Potential factors • Comedy vs. drama • Amount of action • Depth of character development • Totally uninterpretable • Choose d much smaller than number of items or users • e.g., d = 50 << 18,000 or 480,000
User Factors • Similarly, each user summarized by pu • Same number of factors • User factors measure interest in corresponding item factors • Predicted rating for Item i by User u • Inner product of qi and pu
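In code, the prediction is just an inner product of the two learned vectors (a minimal sketch with random stand-in factors; real entries also add global, user, and item bias terms):

```python
import numpy as np

d = 50                                   # number of latent factors (d << #items, #users)
rng = np.random.default_rng(0)
q_i = rng.normal(scale=0.1, size=d)      # stand-in item factor vector
p_u = rng.normal(scale=0.1, size=d)      # stand-in user factor vector

predicted_rating = float(q_i @ p_u)      # inner product of q_i and p_u
print(predicted_rating)
```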
[Figure: movies placed on two latent factors, “geared towards females vs. males” and “serious vs. escapist” (e.g., The Color Purple, Amadeus, Braveheart, Sense and Sensibility, Lethal Weapon, Ocean’s 11, The Princess Diaries, The Lion King, Dumb and Dumber, Independence Day); a second view adds users Dave and Gus to the same map]
Challenges in Using SVD • Need lots of factors (large d) • Easy to overfit
The Fundamental Challenge • How can we estimate as much signal as possible where there are sufficient data, without overfitting where data are scarce?
[Figure: the same two-factor map shown several times, with user Gus placed at a different position in each view, illustrating how scarce data leave his placement uncertain]
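The standard remedy, in one form or another, is regularization: penalize large factor values so that users and items with few ratings are pulled toward zero rather than fit exactly. A minimal stochastic-gradient-descent sketch for a single observation (illustrative learning rate and penalty, not any team's exact update):

```python
import numpy as np

def sgd_step(p_u, q_i, rating, lr=0.005, reg=0.02):
    """One SGD update for one (user, item, rating) observation, minimizing
    (rating - p_u.q_i)^2 + reg * (||p_u||^2 + ||q_i||^2)."""
    err = rating - p_u @ q_i
    p_u_old = p_u.copy()
    p_u += lr * (err * q_i - reg * p_u)
    q_i += lr * (err * p_u_old - reg * q_i)
    return err

rng = np.random.default_rng(0)
p_u = rng.normal(scale=0.1, size=50)
q_i = rng.normal(scale=0.1, size=50)
for _ in range(1000):
    sgd_step(p_u, q_i, rating=4.0)
print(p_u @ q_i)    # approaches 4.0; the penalty keeps it slightly below
```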
Challenges in Using SVD • Need lots of factors (large d) • Easy to overfit • User behavior may change over time • Ratings go up or down • Interests may change • Composition of an account may change, for example, with the addition of a new rater
[Figure: the two-factor map again, with Gus’s position shifting from view to view, illustrating user behavior changing over time]
Challenges in Using SVD • Need lots of factors (large d) • Easy to overfit • User behavior may change over time • Misses some types of patterns
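One hedged sketch of handling drift, in the spirit of time-aware baseline models rather than any specific entry: let the user bias vary by coarse time bin, so a prediction looks like mu + b_item + b_user(t), plus the factor term.

```python
import numpy as np

class TimeAwareBaseline:
    """Toy baseline with a per-user bias that varies by coarse time bin."""
    def __init__(self, n_users, n_items, n_bins, mu):
        self.mu = mu                               # global mean rating
        self.b_item = np.zeros(n_items)            # static item biases
        self.b_user = np.zeros((n_users, n_bins))  # user bias per time bin

    def predict(self, user, item, time_bin):
        return self.mu + self.b_item[item] + self.b_user[user, time_bin]

# Toy usage with made-up sizes and a made-up global mean
model = TimeAwareBaseline(n_users=1000, n_items=500, n_bins=30, mu=3.6)
print(model.predict(user=7, item=12, time_bin=3))
```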
Neither SVD nor NN is Perfect • SVD is poorly suited to fully capture strong “local” relationships • e.g., among sequels • NN ignores the cumulative effect of many small signals • May be ineffective for items with no close neighbors • The two methods complement each other
The Wisdom of Crowds (of Models) • “All models are wrong; some are useful” – G. Box • Our best entry during Year 1 was a linear combination of 107 sets of predictions • Nearest neighbors, SVD, neural nets, and others • Many variations of model structure and parameter settings • Years 2 and 3 • Individual models are more comprehensive and much more accurate • Combining many models still helps • Five models suffice to beat the Year 1 score
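A minimal sketch of the idea of a linear combination: fit one weight per model's predictions on a held-out probe set by least squares (synthetic stand-in data and names; the actual blending was considerably more elaborate):

```python
import numpy as np

rng = np.random.default_rng(1)
n_probe, n_models = 1000, 5

# Synthetic stand-ins: one column per model's predictions on a probe set
model_preds = rng.uniform(1, 5, size=(n_probe, n_models))
probe_truth = rng.integers(1, 6, size=n_probe).astype(float)

# Least-squares weights for combining the models on the probe set
weights, *_ = np.linalg.lstsq(model_preds, probe_truth, rcond=None)

# Apply the same weights to the models' test-set predictions
test_preds = rng.uniform(1, 5, size=(200, n_models))
blended = test_preds @ weights
print(weights, blended[:3])
```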
Is This Any Way to Do Science? • Wide participation • Submissions from 5,000 teams • 8,300 posts on the Netflix Prize forum • Generation and dissemination of new methods • Presentations/workshops at academic conferences • Journal publications • Reasons for success • Well designed by Netflix • Industrial-strength data set • Opportunity to build on the work of others • Collegial spirit of competitors
Thank You! • rbell@research.att.com • www.netflixprize.com • …/leaderboard • …/community • Click BellKor on Leaderboard for details