CS590I: Information Retrieval • Collaborative Filtering • Luo Si • Department of Computer Science • Purdue University
Abstract • Outline • Introduction to collaborative filtering • Main framework • Memory-based collaborative filtering approach • Model-based collaborative filtering approach • Aspect model & two-sided clustering model • Flexible mixture model • Decoupled model • Unified filtering by combining content and collaborative filtering • Active learning for collaborative filtering
What is Collaborative Filtering? • Collaborative Filtering (CF): making recommendation decisions for a specific user based on the judgments of users with similar tastes • Content-Based Filtering: recommends by analyzing the content information of items • Collaborative Filtering: recommends based on the judgments of similar users
Why Collaborative Filtering? • Advantages of Collaborative Filtering • Collaborative Filtering does not need the content information required by CBF • The contents of items may belong to a third party (not accessible or available) • The contents of items may be difficult to index or analyze (e.g., multimedia information) • Problems of Collaborative Filtering • Privacy issues: how can users share their interests without disclosing too much detailed information?
Why Collaborative Filtering? • Applications of Collaborative Filtering • E-commerce • Email ranking: borrow email rankings from your office mates (be careful…) • Web search? (e.g., local search)
Formal Framework for Collaborative Filtering [Figure: user-object rating matrix. Columns are objects O1 … Oj … OM; rows are training users U1 … Ui … UN plus a test user Ut; cells hold the observed ratings, and the goal is to predict the missing entry Rut(Oj).] • What we have: • Some ratings from a set of training users • The test user provides a small amount of additional rating data • What we do: • Predict the test user's unobserved ratings based on the training information
Memory-Based Approaches • Given a specific user u, find a set of similar users • Predict u's rating based on the ratings of the similar users • Issues • How to determine the similarity between users? • How to combine the ratings from similar users to make the predictions (how to weight different users)?
Memory-Based Approaches How to determine the similarity between users? • Measure the similarity in rating patterns between different users (standard forms; sums run over the items j rated by both users, and $\bar{R}_u$ is user u's average rating) • Pearson correlation coefficient similarity: $w(t,u) = \frac{\sum_j (R_t(j)-\bar{R}_t)(R_u(j)-\bar{R}_u)}{\sqrt{\sum_j (R_t(j)-\bar{R}_t)^2 \,\sum_j (R_u(j)-\bar{R}_u)^2}}$ • Vector space (cosine) similarity: $w(t,u) = \frac{\sum_j R_t(j)\,R_u(j)}{\sqrt{\sum_j R_t(j)^2}\,\sqrt{\sum_j R_u(j)^2}}$
Memory-Based Approaches How to combine the ratings from similar users for prediction? • Weight similar users by their similarity to the test user t, and combine their mean-offset ratings with these weights: $\hat{R}_t(j) = \bar{R}_t + \frac{\sum_u w(t,u)\,(R_u(j)-\bar{R}_u)}{\sum_u |w(t,u)|}$
Memory-Based Approaches: Worked Example [Slides walk through a numeric example: remove each user's rating bias by subtracting that user's average rating (normalize ratings), calculate the similarities between the training users and the test user (Wtrn1_test = 0.92; Wtrn2_test = -0.44), then combine the weighted, mean-offset ratings to make the prediction.]
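A minimal sketch of the memory-based approach described above, in Python. The toy rating matrix, the function names, and the choice of Pearson correlation as the similarity measure are illustrative assumptions; 0 marks an unrated item.

```python
import numpy as np

def pearson_sim(a, b):
    """Pearson correlation over the items rated by both users (0 = unrated)."""
    mask = (a > 0) & (b > 0)
    if mask.sum() < 2:
        return 0.0
    da = a[mask] - a[mask].mean()
    db = b[mask] - b[mask].mean()
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    return float(da @ db / denom) if denom > 0 else 0.0

def predict(R, test, item):
    """Predict the test user's rating for `item`: the test user's mean
    plus a similarity-weighted combination of the training users'
    mean-offset ratings (the standard memory-based formula)."""
    test_mean = test[test > 0].mean()
    num, den = 0.0, 0.0
    for row in R:
        if row[item] == 0:          # training user did not rate this item
            continue
        w = pearson_sim(test, row)
        num += w * (row[item] - row[row > 0].mean())
        den += abs(w)
    return test_mean + num / den if den > 0 else test_mean

# Toy rating matrix: rows = training users, columns = objects, 0 = unrated.
R = np.array([[3., 2., 4., 4., 0.],
              [1., 1., 0., 5., 2.],
              [2., 0., 2., 4., 1.]])
test_user = np.array([2., 3., 0., 4., 0.])
print(predict(R, test_user, 2))     # ≈ 3.083
```

Negative similarities (dissimilar users) still contribute information; dividing by the sum of absolute weights keeps the prediction on the rating scale.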
Memory-Based Approaches • Problems with memory-based approaches • Large online computation cost: prediction must scan all training users (is there a fast indexing approach?) • Heuristic methods for calculating user similarity and predicting user ratings • Possible solutions • Cluster users/items offline to save online computation cost • Propose more principled probabilistic modeling methods
Collaborative Filtering [Figure: aspect model as a graphical model; a latent variable Z generates user Ul, object Ol, and rating Rl for each of the L observations, with parameters P(Z), P(u|Z), P(o|Z), P(r|Z).] • Model-Based Approaches: • Aspect Model (Hofmann et al., 1999) • Models individual ratings as a convex combination of preference factors • Two-Sided Clustering Model (Hofmann et al., 1999) • Assumes each user and each item belongs to exactly one user group and one item group • Ix(l)v, Jy(l)u: indicator variables; Cvu: association parameter
Collaborative Filtering • Model-Based Approaches: • Aspect Model (Hofmann et al., 1999) • Models individual ratings as a convex combination of preference factors • Trained with the expectation-maximization (EM) algorithm
Collaborative Filtering • Thoughts: • Previous algorithms all cluster users and objects, either implicitly (memory-based) or explicitly (model-based) • The aspect model allows users and objects to belong to different classes, but clusters them jointly • The two-sided clustering model clusters users and objects separately, but only allows each to belong to a single class
Previous Work: Thoughts • Flexible Mixture Model (FMM): cluster users and objects separately AND allow them to belong to different classes
Collaborative Filtering [Figure: FMM graphical model; two latent variables Zu and Zo generate user Ul, object Ol, and rating Rl for each of the L observations, with parameters P(Zu), P(Zo), P(u|Zu), P(o|Zo), P(r|Zu,Zo).] • Flexible Mixture Model (FMM): cluster users and objects separately AND allow them to belong to different classes • Training Procedure: Annealed Expectation Maximization (AEM) algorithm E-Step: calculate posterior probabilities $P(z_u, z_o \mid u, o, r) \propto P(z_u)\,P(z_o)\,P(u \mid z_u)\,P(o \mid z_o)\,P(r \mid z_u, z_o)$
Collaborative Filtering M-Step: update the parameters from the expected counts of the E-step • Prediction Procedure: a fold-in process calculates the joint probabilities • The fold-in process is itself run with the EM algorithm • Calculate the expectation of the rating for prediction “Flexible Mixture Model for Collaborative Filtering”, ICML’03
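The E-step/M-step above can be sketched with plain (un-annealed) EM on toy data. Everything below is an illustrative assumption: the class counts, the toy rating triples, and the variable names; the paper's annealing temperature is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
U, O, Rv = 4, 5, 3          # numbers of users, objects, rating values (toy sizes)
Ku, Ko = 2, 2               # numbers of user classes and object classes
# Observed (user, object, rating) triples.
triples = [(0, 0, 2), (0, 1, 1), (1, 0, 2), (1, 2, 0),
           (2, 3, 1), (2, 4, 0), (3, 3, 1), (3, 4, 0)]

def norm(a, axis):          # normalize so the given axis sums to 1
    return a / a.sum(axis=axis, keepdims=True)

# Randomly initialized parameters.
p_zu = norm(rng.random(Ku), 0)            # P(z_u)
p_zo = norm(rng.random(Ko), 0)            # P(z_o)
p_u  = norm(rng.random((U, Ku)), 0)       # P(u|z_u): each column sums to 1
p_o  = norm(rng.random((O, Ko)), 0)       # P(o|z_o)
p_r  = norm(rng.random((Rv, Ku, Ko)), 0)  # P(r|z_u,z_o)

for _ in range(50):
    # E-step: posterior over the two latent classes for every triple.
    post = []
    for u, o, r in triples:
        q = (p_zu[:, None] * p_zo[None, :] *
             p_u[u][:, None] * p_o[o][None, :] * p_r[r])
        post.append(q / q.sum())
    post = np.array(post)                 # shape (L, Ku, Ko)
    # M-step: re-estimate every distribution from the expected counts.
    p_zu = norm(post.sum(axis=(0, 2)), 0)
    p_zo = norm(post.sum(axis=(0, 1)), 0)
    p_u = np.zeros((U, Ku)); p_o = np.zeros((O, Ko)); p_r = np.zeros((Rv, Ku, Ko))
    for (u, o, r), q in zip(triples, post):
        p_u[u] += q.sum(axis=1); p_o[o] += q.sum(axis=0); p_r[r] += q
    p_u, p_o, p_r = norm(p_u, 0), norm(p_o, 0), norm(p_r, 0)

def expected_rating(u, o):
    """Predict by the expectation of r under P(r|u,o) marginalized over classes."""
    pr = np.einsum('a,b,a,b,rab->r', p_zu, p_zo, p_u[u], p_o[o], p_r)
    pr /= pr.sum()
    return float(np.arange(Rv) @ pr)
```

For a genuinely new test user, the fold-in step re-runs EM holding all parameters fixed except that user's class distribution.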
Collaborative Filtering • Thoughts: • Previous algorithms address the problem that users with similar tastes may have different rating patterns only implicitly (by normalizing user ratings)
Previous Work: Thoughts • Explicitly decouple users' preference values from the rating values: Decoupled Model (DM) • Example: two users with the same preferences but different rating scales (user A: nice rating 5, mean rating 2; user B: nice rating 3, mean rating 1)
Decoupled Model (DM) [Figure: DM graphical model; latent variables Zu, Zo, ZPre, ZR generate user Ul, object Ol, and rating Rl for each of the L observations, with parameters P(Zu), P(Zo), P(u|Zu), P(o|Zo), P(r|ZPre,ZR).] • Decoupled Model (DM): separate the preference value (1 = disfavor, k = favor) from the rating value • Joint probability: “Preference-Based Graphical Model for Collaborative Filtering”, UAI’03 “A Study of Mixture Models for Collaborative Filtering”, Journal of IR
Experimental Data Datasets: MovieRating and EachMovie • Evaluation: • MAE: the average absolute deviation of the predicted ratings from the actual ratings on test items; smaller MAE indicates better prediction accuracy.
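The MAE metric above is straightforward to compute; a minimal sketch (the function name and the example numbers are illustrative):

```python
def mae(predicted, actual):
    """Mean absolute error: average absolute deviation of the
    predicted ratings from the actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

print(mae([3.1, 4.6, 2.0], [3, 5, 2]))   # (0.1 + 0.4 + 0.0) / 3 ≈ 0.167
```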
Collaborative Filtering • Vary the number of training users • Test the behavior of the algorithms with different amounts of training data • For MovieRating: 100 and 200 training users • For EachMovie: 200 and 400 training users • Vary the amount of given information from the test user • Test the behavior of the algorithms with different amounts of given information from the test user • For both testbeds: vary the number of given items among 5, 10, and 20
Results of Flexible Mixture Model [Figure: four MAE plots, one per testbed setting (MovieRating with 100 and 200 training users; EachMovie with 200 and 400 training users), each evaluated with 5, 10, and 20 given items.]
Experimental Results: Improved by Combining FMM and DM [Tables: results on MovieRating and results on EachMovie.]
Combine Collaborative Filtering and Content-Based Filtering • Content-Based Filtering (CBF): recommend by analyzing the content information • Content information is very useful when few users have rated an object • Example: is a movie science fiction? • “A group of aliens visit earth … a kind of friendship in which E.T. learns …” → Yes • “Young Harry is in love and wants to marry an actress, much to the displeasure of his family …” → No • Unified Filtering (UF): combine both the content-based information and the collaborative rating information for more accurate recommendation
Content-Based Filtering and Unified Filtering • Content-Based Filtering (CBF): • Generative methods (e.g., Naïve Bayes) • Discriminative methods (e.g., SVM, logistic regression) • Usually more accurate • Can be used to combine features (e.g., actors for movies) • Unified Filtering by combining CF and CBF: • Linearly combine the scores from CF and CBF • Personalized linear combination of the scores • Bayesian combination with collaborative ensemble learning
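The first combination strategy above (a linear combination of the two scores) is the simplest to sketch. The mixing weight alpha below is a hypothetical parameter, not one from the source; it would be tuned on held-out data, and the personalized variant learns a separate alpha per user.

```python
def unified_score(cf_score: float, cbf_score: float, alpha: float = 0.7) -> float:
    """Linear combination of collaborative and content-based scores.
    alpha = 1 ignores the content score; alpha = 0 ignores the
    collaborative score. alpha here is an illustrative default."""
    return alpha * cf_score + (1 - alpha) * cbf_score

print(unified_score(4.0, 2.0))   # 0.7 * 4 + 0.3 * 2 ≈ 3.4
```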
Unified Filtering by Flexible Mixture Model and Exponential Model [Figure: graphical model with latent classes Zo and Zu generating object Ol, user Ul, the rating R via P(r|Zo,Zu), and the words d_olj of object Ol's content.] • Unified Filtering with mixture model and exponential model (UFME): • Mixture model for the rating information • Exponential model for the content information (a distribution over the specific words of each object)
Unified Filtering by Flexible Mixture Model and Exponential Model • Training Procedure: • E-Step: calculate posterior probabilities (expectation step of EM) • M-Step: update parameters • Iterative scaling training for the exponential model • Then refine the object cluster distribution with the content information by maximizing the likelihood “Unified Filtering by Combining Collaborative Filtering and Content-Based Filtering via Mixture Model and Exponential Model”, CIKM’04
Experiment Results [Table: MAE results for four filtering algorithms on the EachMovie testbed, including pure content-based filtering (CBF), pure collaborative filtering (CF), and unified filtering by combining mixture model and exponential model (UFME).]
Experiment Results [Table: the five most indicative words (highest weights) for 5 movie clusters; each column corresponds to a different movie cluster, and all listed words are stemmed.]
Summary • What have we talked about so far? • Proposed the flexible mixture model • Demonstrated the power of clustering users and objects separately AND allowing them to belong to different classes • Proposed the decoupled model • Demonstrated the power of extracting preference values from the surface rating values • Proposed the unified probabilistic model for unified filtering • Demonstrated the power of taking advantage of content information when rating information is limited
Active Learning • Motivation of Active Learning: • Learn the correct model using a small number of labeled examples • Actively select training data: select the unlabeled examples that lead to the largest reduction in the expected loss function • Key problems • Loss function: what is the best loss function? Prediction uncertainty? Model uncertainty? Should we consider the distribution of test data? • Should we only consider the current model, or do we need to incorporate model uncertainty?
Problem with Using Current Models in Active Learning • The estimated model is a horizontal line while the true model is a vertical line • Active learning based on the current model will tend to select examples close to the horizontal line, which leads to inefficient learning [Figure: labeled examples with the estimated and true decision boundaries. Slides borrowed from Rong Jin.]
Active Learning for CF • The key to active learning for CF is to efficiently identify the user type of the active user, {p(y|z)}z • Previous approaches • “Model entropy method”: search for the object whose rating information can most efficiently reduce the uncertainty in {p(y|z)}z • “Prediction entropy method”: search for the object whose rating information can most efficiently reduce the uncertainty in the predicted ratings • Problems with previous approaches: • Both methods only utilize a single model (i.e., the current model) for estimating the expected cost • Since each user can belong to multiple user classes, purifying the distribution {p(u|z)}z may not necessarily be the right goal
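The "model entropy method" above can be sketched as choosing the item whose rating outcome is expected to most reduce the entropy of the posterior over user classes. The toy numbers, function names, and two-class setup below are illustrative assumptions, not the paper's experimental setting.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (ignoring zero entries)."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def select_item(p_z, p_r_given_zo):
    """Pick the item minimizing the expected posterior entropy over user classes.
    p_z: current posterior over user classes, shape (K,).
    p_r_given_zo: P(r | z, o), shape (R, K, O)."""
    best, best_h = None, np.inf
    R, K, O = p_r_given_zo.shape
    for o in range(O):
        h = 0.0
        for r in range(R):
            joint = p_z * p_r_given_zo[r, :, o]   # p(z) p(r|z,o)
            p_r = joint.sum()                     # p(r|o)
            if p_r > 0:
                h += p_r * entropy(joint / p_r)   # E_r[ H(p(z|r,o)) ]
        if h < best_h:
            best, best_h = o, h
    return best

# Two user classes; item 0 discriminates between them, item 1 does not.
p_z = np.array([0.5, 0.5])
p_r = np.array([[[0.9, 0.5], [0.1, 0.5]],   # P(r=0 | z, o)
                [[0.1, 0.5], [0.9, 0.5]]])  # P(r=1 | z, o)
print(select_item(p_z, p_r))                # selects the discriminative item: 0
```

The Bayesian approach discussed next replaces the single current model in this computation with an average over the posterior distribution of models.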
A Bayesian Approach • Key: find the object whose rating can help the model efficiently identify the distribution of user classes for the active user, θ = {θz = p(y|z)}z • When the true model θtrue is given, find the example that minimizes the expected loss under θtrue • Replace the true model θtrue with the posterior distribution over the parameters, p(θ|D)
Approximate Posterior Distribution • Directly estimating the posterior distribution p(θ|D) is difficult • Approximate p(θ|D) using the Laplacian approximation • Expand the log-likelihood function around its maximum point θ*
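The Laplacian approximation mentioned above has a standard second-order form; a sketch (the symbols θ, θ*, and H are assumed here, since the slide's equation was lost in extraction):

```latex
\log p(\theta \mid D) \;\approx\; \log p(\theta^{*} \mid D)
  \;-\; \tfrac{1}{2}\,(\theta - \theta^{*})^{\top} H \,(\theta - \theta^{*}),
\qquad
H \;=\; -\,\nabla_{\theta}^{2} \log p(\theta \mid D)\big|_{\theta = \theta^{*}}
```

so the posterior is approximated by the Gaussian $p(\theta \mid D) \approx \mathcal{N}(\theta^{*}, H^{-1})$, which makes the expectations in the next step analytically tractable.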
Efficiently Compute the Expectation • The integration can be computed analytically • Efficiently update the model θ|x,r using the posterior distribution p(θ|D)
Experiment: Results for ‘EachMovie’ • Bayesian > random, prediction >> model
Experiment: Results for ‘MovieRating’ • Bayesian > random > prediction > model
Conclusion • Existing active learning algorithms do not work for the aspect model for CF • The model entropy method sometimes works substantially worse than the simple random method • The Bayesian approach to active learning works substantially better than random selection • Future work • Extend the Bayesian analysis to active learning for robust estimation of prediction errors
Conclusion and Future Work • Some of my related work: • “Flexible Mixture Model for Collaborative Filtering”, ICML’03 • “Preference-Based Graphical Model for Collaborative Filtering”, UAI’03 • “A Study of Methods for Normalizing User Ratings in Collaborative Filtering”, SIGIR’04 • “A Bayesian Approach toward Active Learning for Collaborative Filtering”, UAI’04 • “Unified Filtering by Combining Collaborative Filtering and Content-Based Filtering via Mixture Model and Exponential Model”, CIKM’04 • “A Study of Mixture Models for Collaborative Filtering”, JIR’06