Intro to RecSys and CCF Brian Ackerman
Roadmap • Introduction to Recommender Systems & Collaborative Filtering • Collaborative Competitive Filtering
Introduction to Recommender Systems & Collaborative Filtering
Motivation • Netflix has over 20,000 movies, but you may only be interested in a small number of them • Recommender systems can provide personalized suggestions from a large set of items such as movies • This can be done in a variety of ways; the most popular is collaborative filtering
Collaborative Filtering • If two users rate a subset of items similarly, then they might rate other items similarly as well
Roadmap (RS-CF) • Motivation • Problem • Main CF Types • Memory-based – User-based • Model-based – Regularized SVD
Problem Setting • Set of users, U • Set of items, I • Users can rate items, where rui is user u's rating on item i • Ratings are often stored in a rating matrix R of size |U| × |I|
Sample Rating Matrix • # is a user rating, - means a null entry (not rated)
Problem • Input • Rating matrix R (|U| × |I|) • Active user, a (the user interacting with the system) • Output • Predictions for all null entries of the active user
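A minimal sketch of how this input could be laid out, assuming NumPy and a made-up 4 × 4 rating matrix, with NaN standing in for null entries:

```python
import numpy as np

# Toy rating matrix R (|U| x |I|): rows are users, columns are items.
# np.nan marks a null entry (the user has not rated that item).
R = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])

active = 0                                     # index of the active user "a"
to_predict = np.where(np.isnan(R[active]))[0]  # null entries to predict
print("Items needing predictions for the active user:", to_predict)
```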
Roadmap (RS-CF) • Motivation • Problem • Main CF Types • Memory-based – User-based • Model-based – Regularized SVD
Main Types • Memory-based • User-based* [Resnick et al. 1994] • Item-based [Sarwar et al. 2001] • Similarity Fusion (User/Item-based) [Wang et al. 2006] • Model-based • SVD (Singular Value Decomposition) [Sarwar et al. 2000] • RSVD (Regularized SVD)* [Funk 2006]
User-based • Find similar users • KNN or threshold • Make prediction
User-based – Similar Users • Consider each user (row) to be a vector • Compare vectors to find the similarity between two users • Let a be the vector for the active user and u3 be the vector for user 3 • Cosine similarity can be used to compare vectors
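A minimal sketch of the similarity computation, reusing the toy NaN-masked matrix R from the earlier sketch; restricting the comparison to co-rated items is a common convention, not something the slide specifies:

```python
import numpy as np

def cosine_similarity(a, u):
    """Cosine similarity between two user rows of R.

    Only items rated by both users (non-NaN in both rows) are compared.
    """
    both = ~np.isnan(a) & ~np.isnan(u)
    if not both.any():
        return 0.0
    va, vu = a[both], u[both]
    denom = np.linalg.norm(va) * np.linalg.norm(vu)
    return float(va @ vu / denom) if denom > 0 else 0.0

# e.g. similarity between the active user (row 0) and user 3 (row 3):
# cosine_similarity(R[0], R[3])
```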
User-based – Similar Users • KNN (k-nearest neighbors or top-k) • Only use the k most similar users • Threshold • Find all users whose similarity to the active user is at least θ
User-based – Make Prediction • Weighted by similarity • Weight each similar user's rating based on similarity to the active user • [Slide figure: prediction for the active user on item i as a similarity-weighted combination of the similar users' ratings]
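One common form of that weighted prediction, sketched under the same toy setup (and reusing cosine_similarity from the previous sketch); the slide does not pin down the exact formula, so this weighted average over the k most similar raters is an assumption:

```python
import numpy as np

def predict(R, active, item, k=2):
    """Similarity-weighted average of the ratings given to `item`
    by the k most similar users who rated it."""
    sims = np.array([
        cosine_similarity(R[active], R[u]) if u != active else -np.inf
        for u in range(R.shape[0])
    ])
    rated = ~np.isnan(R[:, item])                 # neighbors must have rated the item
    candidates = np.where(rated & np.isfinite(sims) & (sims > 0))[0]
    if candidates.size == 0:
        return float(np.nanmean(R[:, item]))      # fallback: item mean
    neighbors = candidates[np.argsort(sims[candidates])[::-1][:k]]
    weights = sims[neighbors]
    return float(weights @ R[neighbors, item] / weights.sum())
```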
Main Types • Memory-based • User-based* [Resnick et al. 1994] • Item-based [Sarwar et al. 2001] • Similarity Fusion (User/Item-based) [Wang et al. 2006] • Model-based • SVD (Singular Value Decomposition) [Sarwar et al. 2000] • RSVD (Regularized SVD)* [Funk 2006]
Regularized SVD • Netflix data has 8.5 billion possible entries based on 17 thousand movies and 0.5 million users • Only 100 million ratings • 1.1% of all possible ratings • Why do we need to operate on such a large matrix?
Regularized SVD – Setup • Let each user and item be represented by a feature vector of length k • E.g. item A may be the vector A = [a1 a2 a3 … ak] • Imagine the features for items were fixed • E.g. items are movies and each feature is a genre such as comedy, drama, etc. • Each entry of the user vector is how much that user likes the corresponding feature
Regularized SVD – Setup • Consider the movie Die Hard • Its feature vector may be i = [1 0 0] if the features are action, comedy, and drama • Maybe the user has the feature vector u = [3.87 2.64 1.32] • We can try to predict the user's rating using the dot product of these two vectors • r'ui = u ∙ i = [3.87 2.64 1.32] ∙ [1 0 0] = 3.87
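A one-line check of that arithmetic (toy vectors taken straight from the slide):

```python
import numpy as np

item_die_hard = np.array([1.0, 0.0, 0.0])   # [action, comedy, drama]
user = np.array([3.87, 2.64, 1.32])         # user's affinity for each feature

print(float(user @ item_die_hard))          # dot product -> 3.87
```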
Regularized SVD – Goal • Try to find values for each item vector that work for all users • Try to find values for each user vector that reproduce the actual ratings when taking the dot product with the item vectors • Minimize the difference between the actual rating and the predicted (dot-product) rating
Regularized SVD – Setup • In reality, we cannot fix a set of named features and choose k large enough to cover them all • There are too many to consider (e.g. genre, actors, directors, etc.) • Usually k is only 25 to 50, which reduces the total size of the matrices to only roughly 25 million to 50 million entries (compared to 8.5 billion) • Because of the size of k, the values in the vectors are NOT directly tied to any feature
Regularized SVD – Goal • Let u be a user, i be an item, and rui be the rating by user u on item i, where R is the set of all ratings and φu, φi are the user and item feature vectors • At first thought, it seems simple to use the following optimization goal
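The goal itself appears on the slide only as an image; a standard reconstruction in the notation just defined (a sum of squared errors over the observed ratings) would be:

```latex
\min_{\{\varphi_u\},\{\varphi_i\}} \;
\sum_{(u,i) \in R} \bigl( r_{ui} - \varphi_u^{\top}\varphi_i \bigr)^2
```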
Regularized SVD – Overfitting • Problem is overfitting of the features • Solved by regularization
Regularized SVD – Regularization • Introduce a new optimization goal including a term for regularization • Minimize the magnitude of the feature vectors • Controlled by fixed parameters λu and λi
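Again the formula is an image on the slide; a common regularized form consistent with the bullets (penalizing the magnitudes of the feature vectors, weighted by λu and λi) would be:

```latex
\min_{\{\varphi_u\},\{\varphi_i\}} \;
\sum_{(u,i) \in R} \bigl( r_{ui} - \varphi_u^{\top}\varphi_i \bigr)^2
\;+\; \lambda_u \sum_{u} \lVert \varphi_u \rVert^2
\;+\; \lambda_i \sum_{i} \lVert \varphi_i \rVert^2
```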
Regularized SVD • Many extensions of the regularized optimization goal have been proposed • RSVD2/NSVD1/NSVD2 [Paterek 2007]: added terms for user bias and item bias, and reduced the number of parameters • Integrated Neighborhood SVD++ [Koren 2008]: combined a neighborhood-based approach with RSVD
Roadmap • Introduction to Recommender Systems & Collaborative Filtering • Collaborative Competitive Filtering
Collaborative Competitive Filtering: Learning Recommender Using Context of User Choice • Georgia Tech and Yahoo! Labs • Best Student Paper at SIGIR'11
Motivation • A user may be given 5 random movies and choose Die Hard • This tells us the user prefers action movies • A user may be given 5 action movies and choose Die Hard over Rocky and Terminator • This tells us the user prefers Bruce Willis
Roadmap (CCF) • Motivation • Problem Setting & Input • Techniques • Extensions
Problem Setting • Set of users, U • Set of items, I • Each user interaction has an offer set O and a decision set D • Each user interaction is stored as a tuple (u, O, D) where D is a subset of O
CCF Input • 1 means the user chose the item (a user interaction), - means the item was only in the offer set (not chosen)
Roadmap (CCF) • Motivation • Problem Setting & Input • Techniques • Extensions
Local Optimality of User Choice • Each item has a potential revenue to the user, rui • Users also consider the opportunity cost (OC) when weighing potential revenue • OC is what the user gives up by making a given decision • OC is cui = max{ rui' | i' ∈ O \ {i} } • Profit is πui = rui − cui
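A tiny sketch of the opportunity-cost and profit bookkeeping, using made-up revenue numbers (the item names and values are only for illustration):

```python
def opportunity_cost(revenue, offer_set, i):
    """c_ui: the best revenue among the other offered items."""
    return max(revenue[j] for j in offer_set if j != i)

def profit(revenue, offer_set, i):
    """pi_ui = r_ui - c_ui."""
    return revenue[i] - opportunity_cost(revenue, offer_set, i)

# Toy offer set with made-up revenues:
revenue = {"Die Hard": 4.2, "Rocky": 3.1, "Terminator": 3.8}
print(profit(revenue, set(revenue), "Die Hard"))   # 4.2 - 3.8 ≈ 0.4
```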
Local Optimality of User Choice • A user interaction is an opportunity give-and-take process • The user is given a set of opportunities • The user makes a decision to select one of the many opportunities • Each opportunity comes with some revenue (utility or relevance)
Collaborative Competitive Filtering • Local optimality constraint • Each item in the decision set has a revenue higher than those not in the decision set • The problem becomes intractable with only this constraint; there is no unique solution
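Written out, the local optimality constraint from the bullet above amounts to the following (a reconstruction in the section's notation):

```latex
r_{ui} \;\ge\; r_{ui'}
\qquad \forall\, i \in D,\; i' \in O \setminus D
```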
CCF – Hinge Model • Optimization goal • Minimize error (ξ, slack variable) & model complexity
CCF – Hinge Model • Find average potential utility • Average utility of non-chosen items • Constraints • Chosen items have a higher utility • eui is an error term
CCF – Hinge Model • Optimization goal • Assume ξ is 0 • [Slide figure: objective comparing each chosen item against the average relevance of non-chosen items]
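The objective itself is an image on the slide; a hedged reconstruction consistent with the bullets (minimize the slack ξ plus model complexity, subject to each chosen item's utility exceeding the average utility of the non-chosen items) might look like the following, where the unit margin and the single regularization weight λ are assumptions rather than details taken from the slide:

```latex
\min_{\{\varphi_u\},\{\varphi_i\},\;\xi \ge 0} \;
\sum_{(u,O,D)} \xi_{uO}
\;+\; \lambda \Bigl( \sum_u \lVert \varphi_u \rVert^2 + \sum_i \lVert \varphi_i \rVert^2 \Bigr)
\quad \text{s.t.} \quad
\varphi_u^{\top}\varphi_i \;\ge\;
\frac{1}{|O \setminus D|} \sum_{i' \in O \setminus D} \varphi_u^{\top}\varphi_{i'}
\;+\; 1 - \xi_{uO},
\qquad \forall\, i \in D
```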
CCF – How to Use Results • We can predict the relevance of all items from the user and item vectors • Can set a threshold if more than one item can be chosen (e.g. a score above θ = 0.9 implies the item would be acted on)
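A small sketch of that scoring step (the item vectors and the θ = 0.9 cutoff are illustrative, not from the slide):

```python
import numpy as np

def recommend(phi_u, item_vectors, theta=0.9):
    """Score each item by its dot product with the user vector and keep
    those whose score exceeds the threshold theta, best first."""
    scores = {name: float(phi_u @ phi_i) for name, phi_i in item_vectors.items()}
    return sorted((name for name, s in scores.items() if s > theta),
                  key=lambda name: -scores[name])
```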
Roadmap (CCF) • Motivation • Problem Setting & Input • Techniques • Extensions
Extensions • Sessions without a response • The user does not take any opportunity • Adding content features • Fixed content features for each item, rather than only learned parameters, to improve prediction accuracy for new items