Customizable Bayesian Collaborative Filtering Denver Dash Big Data Reading Group 11/19/2007
Why Work on Recommender Systems?
• Generally:
  • Widespread interest from different industries.
  • Marketers save time/money by advertising to the right people.
  • People get less spam and more useful suggestions.
• Big Data:
  • Computationally hard problem.
  • Large scale (many users) improves performance.
  • Model complexity means that more data keeps improving performance.
• “Everyday Sensing and Reasoning” (aka the Megabet):
  • Interesting future applications in collaborative sharing of information on mobile devices.
Crude Characterization of Recommender Systems
• Content-Based: Build a user model based on user preferences for genre, director, actors, etc., to predict the user's rating.
• Collaborative Filtering: Brute-force statistical analysis: clustering plus dimensionality reduction.
Motivating Application: Predicting User Movie Ratings
Netflix Prize data:
• >17.7K movies
• >480K users
• Raw training data: 1.4 GB uncompressed sparse text.
[Figure: sparse users × movies ratings matrix]
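At this scale the ratings matrix is overwhelmingly empty, so a sparse layout is the natural representation. A minimal sketch with toy data (the helper name and IDs are hypothetical, not from the Netflix format):

```python
# Sketch: storing ratings sparsely as user_id -> {movie_id: rating}.
# Real Netflix data is >480K users x >17.7K movies with only a small
# fraction of cells filled, so a dense array would waste memory.
from collections import defaultdict

ratings = defaultdict(dict)  # user_id -> {movie_id: rating}

def add_rating(user, movie, stars):
    """Record one (user, movie, stars) triple. Hypothetical helper."""
    ratings[user][movie] = stars

add_rating(6, 11, 4)
add_rating(6, 42, 2)
add_rating(7, 11, 5)

# Count stored ratings without materializing a dense matrix.
n_ratings = sum(len(movies) for movies in ratings.values())
```

For Netflix-scale work a compressed sparse format (e.g., COO/CSR triples on disk) would replace the dict-of-dicts, but the access pattern is the same.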
Typical Approaches to Recommender Systems
• Use linear models (e.g., take the average over similar users, or the average over similar movies).
• Use dimensionality reduction techniques: SVD, PCA, regularized regression, etc.
• Combine multiple approaches with more linear models.
• A lot of engineering for special cases and efficiency.
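The simplest of these linear models can be sketched as a global mean plus additive user and movie offsets; the data and names below are illustrative toys, not any of the actual Netflix systems:

```python
# Sketch of a linear baseline: predicted rating = global mean
# + user offset + movie offset. All data here is invented.
ratings = {  # (user, movie) -> stars
    ("alice", "m1"): 4, ("alice", "m2"): 2,
    ("bob",   "m1"): 5, ("bob",   "m3"): 3,
}

mu = sum(ratings.values()) / len(ratings)  # global mean rating

def offset(key_index, key):
    """Average deviation from the global mean for a user (0) or movie (1)."""
    vals = [r for k, r in ratings.items() if k[key_index] == key]
    return sum(vals) / len(vals) - mu if vals else 0.0

def predict(user, movie):
    return mu + offset(0, user) + offset(1, movie)
```

Here `predict("alice", "m3")` combines alice's below-average tendency with m3's below-average reception; real systems weight these offsets and shrink them toward zero when data is scarce.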
Current Netflix Leaders (at least a few who have published their approaches)
• #1 Bell, Koren and Volinsky, AT&T Labs: Combine linear CF methods with pseudo-content-based methods, i.e., use approximate SVD to “learn” hidden content.
• #7 Salakhutdinov, Mnih and Hinton, UToronto: Combine latent-variable graphical models with an approximate SVD.
• #8 Paterek, Warsaw University: Combines approximate regularized SVD with ridge regression, K-means and other models.
Desirable Properties
• Create models customized to users and/or movies.
• Avoid overfitting: 10^6–10^9 parameters and sparse data (use clustering and regularization).
• Use latent-variable models to cluster users and/or movies.
• Give higher weight to users/parameters with more data support.
• Take into account user bias relative to other users.
• When data is sparse or totally absent, estimated ratings should reduce to marginal estimates.
• Take into account dependencies between items.
• Take into account temporal trends.
Customized Bayesian Collaborative Filtering
• Incrementally expandable with new content-based features.
• Uses a principled framework with explicit assumptions.
• Exhibits most of the desirable properties:
  • Avoids overfitting: regularization built in.
  • Uses a latent-variable model.
  • Weighs parameters with more data support higher.
  • Takes into account user/movie bias relative to other users/movies.
  • When data is sparse or totally absent, estimated ratings reduce to marginal estimates.
  • Can take into account dependencies between items.
  • Can take into account temporal trends.
Customized Bayesian Collaborative Filtering
• r_j – rating of a particular user j.
• r_M – weighted average of user j’s ratings of similar movies (“How often in the past I agreed with my own ratings on similar movies.”)
• r_U – weighted average of similar users’ ratings of the target movie (“How often in the past I agreed with my neighbors’ ratings on the same movie.”)
• Parameters are learned from users’ data using weak Dirichlet priors based on marginal data over all users.
• Given a user j and a movie m, calculate r_U(j, m) and r_M(j, m):
  • Neighbors of user j determined previously by clustering.
  • Neighbors of movie m determined previously by clustering.
[Figure: graphical model with r_j as parent of r_M and r_U, in a plate over N]
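As a rough sketch of r_U (and, symmetrically, r_M), the quantity is a neighbor-weighted average. The weights and ratings below are made up; in the actual method the neighbor sets and weights come from the prior clustering step:

```python
# Sketch of r_U(j, m): weighted average of user j's neighbors'
# ratings of movie m. Data and weights are illustrative only.
ratings = {("u1", "m"): 4, ("u2", "m"): 2, ("u3", "m"): 5}
neighbors = {"j": {"u1": 0.6, "u2": 0.3, "u3": 0.1}}  # hypothetical cluster weights

def r_U(j, m):
    """Neighbor-weighted average rating; skips neighbors who never rated m."""
    pairs = [(w, ratings[(u, m)])
             for u, w in neighbors[j].items() if (u, m) in ratings]
    total_weight = sum(w for w, _ in pairs)
    return sum(w * r for w, r in pairs) / total_weight
```

r_M(j, m) would have the same shape, averaging user j's own ratings over the target movie's neighbor set instead.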
Nice Properties: Corner Cases
• Takes into account user bias (by calculating P(r_j) from user data).
• The expected rating of a new user is based on the expectation over the entire set of users (due to the Dirichlet priors).
• When a user has only a little data, his/her data is taken into account, but is smoothed by the remaining users.
• When a user has lots of data, it will overwhelm the Dirichlet priors and we can learn an accurate customized model for him/her.
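These corner cases can be illustrated with a toy Dirichlet-smoothed rating distribution. The pseudo-counts below are invented; in the method itself the priors are based on the marginal data over all users:

```python
# Sketch: per-user P(r) under a Dirichlet prior whose pseudo-counts
# reflect the population marginal. A user with no data falls back to
# the marginal; a user with lots of data overwhelms the prior.
STARS = [1, 2, 3, 4, 5]
prior = {1: 5, 2: 10, 3: 30, 4: 35, 5: 20}  # alpha_r: invented pseudo-counts

def p_rating(user_counts):
    """Posterior-mean estimate of P(r) for one user."""
    totals = {r: prior[r] + user_counts.get(r, 0) for r in STARS}
    z = sum(totals.values())
    return {r: totals[r] / z for r in STARS}

new_user = p_rating({})            # no data: exactly the normalized prior
heavy_user = p_rating({5: 1000})   # lots of data: dominated by the user
```

The smooth interpolation in between (a user with a handful of ratings) is exactly the "little data" corner case above.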
Nice Property: Incremental Expandability
• Content features (Year, Genre, Director, Actors) can be added to the model as additional children of r_j.
• New parameters: the weighted average rating of the target movie over the user’s anti-cluster U, and the weighted average rating over the movie’s anti-cluster M.
• Again, Dirichlet priors allow us to smooth these parameters to avoid over-fitting.
[Figure: expanded model adding Year, Genre, Director, and Actors nodes]
Nice Property: Customization Without Over-fitting
• Naïve models can be sensitive to many redundant or unimportant features.
• Different features may be more informative for different users.
• Individualized feature selection may not work well if the individual user has not rated many movies.
• Solution: Structural Bayesian Model Averaging over 2^N structures.
Efficient BMA
• Under certain assumptions, averaging over all 2^N feature sets can be performed in O(N) time for a naïve BN structure (Dash and Cooper, 2004).
• Re-parametrize the network: each new parameter combines the ML parameters, the marginal likelihood of the feature set, and the structure prior.
• Once parameters are calculated and cached, we can do BMA inference in O(N) time.
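The factorization behind the O(N) result can be sketched numerically: for a naïve structure, the sum over all 2^N feature subsets collapses into a product of per-feature (include + exclude) terms. The weights below are arbitrary toy numbers, not the actual Dash–Cooper parameterization:

```python
# Sketch: why BMA over 2^N naive-Bayes feature subsets costs only O(N).
# Each feature i contributes independently, so the exponential sum over
# subsets equals a product of per-feature terms (distributive law).
from itertools import combinations

# Per-feature toy values: (inclusion weight w_i, likelihood term l_i).
features = [(0.8, 1.2), (0.5, 0.9), (0.3, 1.5)]

def bma_brute_force():
    """Exponential version: enumerate all 2^N subsets explicitly."""
    total = 0.0
    n = len(features)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            term = 1.0
            for i, (w, lik) in enumerate(features):
                term *= w * lik if i in subset else (1 - w)
            total += term
    return total

def bma_linear():
    """O(N) version: one (include + exclude) factor per feature."""
    total = 1.0
    for w, lik in features:
        total *= w * lik + (1 - w)
    return total
```

Both functions return the same value, which is the point: the cached per-feature factors let inference average over every structure without ever enumerating them.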
Overview of the Method
• Original database D → Web/Wiki crawler → augmented database D’.
• D’ → EM/MinHash clustering of features → clustered database D’’.
• D’’ → calculate r_M and r_U → fully-specified database D’’’.
• D’’’ → MAP naïve Bayes learning → baseline model.
• D’’’ → BMA learning for all users → all user models.
[Figure: pipeline diagram; each model is a naïve Bayes net with r_j as parent of Year, Genre, Director, and Actors]