Personalized Active Learning for Collaborative Filtering
SIGIR, 2008
Presented by Abhay S. Harpale, Yiming Yang, Carnegie Mellon University
2009-01-22
Summarized by Jaeseok Myung
Intelligent Database Systems Lab, School of Computer Science & Engineering, Seoul National University, Seoul, Korea
Background
• Collaborative Filtering
  • Make a recommendation for a specific user based on the judgments of users with similar interests
  • Identify the training users who share similar interests with the active user
  • Predict the ratings of the active user as the average of the ratings given by the training users with similar interests
[Figure: rating database and active user]
Center for E-Business Technology
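The two steps on this slide (find similar training users, then average their ratings) can be sketched as follows. This is a minimal illustration, not the method used in the paper: the cosine similarity, the `k` nearest neighbours, the convention that 0 means "not rated", and the names `cosine` and `predict_ratings` are all assumptions made for the example.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity over items that both users have rated (0 = not rated)."""
    pairs = [(x, y) for x, y in zip(a, b) if x > 0 and y > 0]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_a = sqrt(sum(x * x for x, _ in pairs))
    norm_b = sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b)

def predict_ratings(training, active, k=2):
    """Fill in the active user's unrated items with the mean rating of the k most similar training users."""
    # Step 1: identify the training users most similar to the active user.
    neighbours = sorted(training, key=lambda row: cosine(row, active), reverse=True)[:k]
    preds = list(active)
    # Step 2: average the neighbours' ratings for each item the active user has not rated.
    for item, rating in enumerate(active):
        if rating > 0:
            continue
        votes = [row[item] for row in neighbours if row[item] > 0]
        if votes:  # items no neighbour has rated stay at 0
            preds[item] = sum(votes) / len(votes)
    return preds

training = [[5, 4, 1, 5], [4, 5, 2, 4], [1, 2, 5, 1]]
active = [5, 4, 0, 0]
print(predict_ratings(training, active))
```

With this toy data the first two training users are the nearest neighbours, so the two missing ratings are filled with the averages of their ratings.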
Background
• Model-based Collaborative Filtering
  • Differs from memory-based (instance-based) CF
  • Makes the intuitive assumption that users and items can be grouped based on their interests
• Aspect Model
  • A probabilistic latent semantic model in which users are considered to be a mixture of multiple interests, or aspects
[Figure: aspect model diagram with labels User A, aspects z1, z2, …, zk, movies, rating R = {1, 2, …, 5}, training, predicting]
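A small sketch of how an aspect model can turn a user's mixture over aspects into a rating prediction. It assumes the common factorization P(r | u, m) = Σ_z P(z | u) · P(r | z, m); the slide does not spell out the exact form, and the names `rating_distribution` and `expected_rating` as well as the toy probabilities are illustrative only.

```python
RATINGS = [1, 2, 3, 4, 5]

def rating_distribution(p_z_given_u, p_r_given_zm):
    """P(r | u, m) as a mixture over aspects.

    p_z_given_u:  list of K aspect weights for the user (sums to 1).
    p_r_given_zm: K rows, each a distribution over the 5 ratings for one movie.
    """
    return [
        sum(pz * row[i] for pz, row in zip(p_z_given_u, p_r_given_zm))
        for i in range(len(RATINGS))
    ]

def expected_rating(p_z_given_u, p_r_given_zm):
    """Predicted rating as the mean of the mixture distribution."""
    dist = rating_distribution(p_z_given_u, p_r_given_zm)
    return sum(r * p for r, p in zip(RATINGS, dist))

# A user who is mostly "aspect 1"; aspect 1 likes the movie, aspect 2 does not.
user = [0.8, 0.2]
movie = [
    [0.0, 0.0, 0.0, 0.5, 0.5],  # P(r | z1, m)
    [0.5, 0.5, 0.0, 0.0, 0.0],  # P(r | z2, m)
]
print(expected_rating(user, movie))
```

Because the user leans heavily toward the aspect that rates this movie 4 or 5, the expected rating comes out close to 4.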
Background
• Active Learning
  • Applies to situations in which unlabeled data is abundant but labeling data is expensive
  • Aims to train classifiers/models using the least amount of labeled training data
    • Because obtaining labeled data can be very costly or infeasible
  • Has been extensively studied for classification, where the goal is to identify which unlabeled instances should be labeled with their class membership
Problem
• How to identify a user's interests based on a few rated items?
  • To identify the interests of the user, the system needs to ask the user to rate a few items
  • Given that a user is only willing to rate a few items, which items should the system ask about?
=> Active Learning for CF
Active Learning for CF
• Selective Sampling
  • Ask a user to rate the items that best distinguish between users' interests
• A General Strategy
  • Define a loss function that represents the uncertainty in determining users' interests
  • Choose the item whose rating will result in the largest reduction in the loss function
• Baseline Approaches
  • Random Selection
  • Entropy-based Selection
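The entropy-based baseline can be illustrated with a short sketch. It assumes one common reading of that baseline, namely asking about the item whose predicted rating distribution is most uncertain; the slide does not define it precisely, and `rating_entropy`, `select_by_entropy`, and the toy distributions are hypothetical.

```python
from math import log

def rating_entropy(p_r):
    """Shannon entropy (in nats) of a predicted rating distribution."""
    return -sum(p * log(p) for p in p_r if p > 0)

def select_by_entropy(candidates):
    """Pick the item whose predicted rating distribution is the most uncertain.

    candidates: dict mapping item id -> predicted distribution over ratings 1..5.
    """
    return max(candidates, key=lambda item: rating_entropy(candidates[item]))

candidates = {
    "A": [1.0, 0.0, 0.0, 0.0, 0.0],  # the model is already certain
    "B": [0.2, 0.2, 0.2, 0.2, 0.2],  # the model has no idea
    "C": [0.5, 0.5, 0.0, 0.0, 0.0],
}
print(select_by_entropy(candidates))
```

Here item "B" is selected because a uniform distribution has the highest entropy. Note that this baseline looks only at rating uncertainty, not at how much a rating would reveal about the user's interests.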
A Bayesian Approach for Active Learning
• A Bayesian approach outperforms the other approaches
  • Rong Jin, Luo Si, A Bayesian Approach toward Active Learning for Collaborative Filtering, UAI 2004
• This approach identifies the item m such that the updated model moves most quickly toward the true user model
  • The objective (a negated KL divergence) is maximized when the estimated distribution equals the true distribution being modeled
  • The updated model is the posterior obtained by retraining the user model on a newly obtained rating r for movie m from the user, i.e. P(z|u, m, r)
  • Since the true user model is unknown beforehand, it is estimated as the expectation over the posterior distribution of the user model
  • The rating r is unknown for unrated movies, so its expected value is used instead
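The ingredients listed on this slide can be combined into a rough sketch. Several simplifications are assumed here and are not taken from the paper: the retrained model P(z | u, m, r) is replaced by a one-step Bayes update of the current aspect weights, the posterior over the true user model is represented by a handful of hand-picked samples, ratings are indexed from 0, and the current weights are assumed to be non-zero for every aspect. All function names are hypothetical.

```python
from math import log

def bayes_update(theta, p_r_given_z, r):
    """Stand-in for retraining: P(z | u, m, r) proportional to P(z | u) * P(r | z, m)."""
    post = [t * row[r] for t, row in zip(theta, p_r_given_z)]
    total = sum(post)
    return [p / total for p in post]

def neg_kl(true_theta, model, eps=1e-12):
    """Negated KL(true || model); equals 0 only when the two distributions match."""
    return sum(t * (log(max(q, eps)) - log(t)) for t, q in zip(true_theta, model) if t > 0)

def bayesian_score(theta, theta_samples, p_r_given_z):
    """Expected negated KL between sampled 'true' models and the updated model for one movie."""
    n_ratings = len(p_r_given_z[0])
    score = 0.0
    for true_theta in theta_samples:          # expectation over the model posterior
        for r in range(n_ratings):            # expectation over the unknown rating
            p_r = sum(t * row[r] for t, row in zip(true_theta, p_r_given_z))
            if p_r > 0:
                score += p_r * neg_kl(true_theta, bayes_update(theta, p_r_given_z, r))
    return score / len(theta_samples)

theta = [0.5, 0.5]                        # current estimate of P(z | u)
samples = [[0.9, 0.1], [0.1, 0.9]]        # assumed posterior samples of the true model
informative = [[0.9, 0.1], [0.1, 0.9]]    # aspects disagree about this movie
flat = [[0.5, 0.5], [0.5, 0.5]]           # aspects agree, so a rating reveals nothing
print(bayesian_score(theta, samples, informative), bayesian_score(theta, samples, flat))
```

For brevity the toy example uses two rating levels instead of five. The movie on which the aspects disagree receives the higher (less negative) score, so it would be the one selected.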
Personalized Bayesian Selection
• A Common Assumption in Existing Approaches
  • Users can provide a rating for any item that the system requests => unrealistic assumption
  • To rate a movie, a person first has to procure the movie and watch it
• Personalized Bayesian Selection
  • Takes into account the probability of getting a rating on item m from user u
[Figure: a rating request fails when the user responds "I haven't seen the movie"]
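One way to picture the personalization step is to discount each item's informativeness by the chance that the user can actually rate it. The sketch below assumes that this probability is obtained from the aspect model as Σ_z P(m | z) · P(z | u) and that a failed request contributes nothing; both are assumptions for illustration, and the exact weighting in the paper may differ. The names `rating_probability` and `personalized_select` are hypothetical.

```python
def rating_probability(p_z_given_u, p_m_given_z):
    """Assumed P(m | u): mixture over aspects of how likely each movie is to be rated."""
    n_items = len(p_m_given_z[0])
    return [
        sum(pz * row[m] for pz, row in zip(p_z_given_u, p_m_given_z))
        for m in range(n_items)
    ]

def personalized_select(gains, p_rate):
    """Pick the item with the best gain after discounting by the chance of getting a rating.

    gains:  non-negative informativeness of each item if the user does rate it.
    p_rate: probability that the user is able to rate each item.
    """
    return max(range(len(gains)), key=lambda m: p_rate[m] * gains[m])

theta = [0.8, 0.2]                       # P(z | u)
p_m_given_z = [[0.7, 0.3], [0.1, 0.9]]   # P(m | z) for two movies
p_rate = rating_probability(theta, p_m_given_z)
gains = [0.5, 0.6]                       # movie 1 is slightly more informative
print(personalized_select(gains, p_rate), personalized_select(gains, [1.0, 1.0]))
```

In this toy case the unweighted choice is movie 1, but after weighting by the rating probability movie 0 wins, because the user is more likely to have seen it.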
Experimental Setup
• Constrained Setup
• Unconstrained Setup
Evaluation Metrics
• Mean Absolute Error (MAE)
  • Defined in terms of the evaluation set for the user u and the set of test users
• # of Failures
  • The system solicits ratings for movies from the user, and the user may not provide ratings for some of them
  • On a failure, the system cannot be retrained; it wastes an active-learning cycle and proceeds to the next iteration
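Both metrics are simple to compute. The sketch below assumes that MAE is averaged first over each user's evaluation set and then over the test users; the slide's formula is not preserved, so this normalization is a guess. The function names and the toy data are hypothetical.

```python
def mean_absolute_error(true_ratings, predicted):
    """MAE averaged per user over that user's evaluation set, then over all test users.

    true_ratings, predicted: dict user -> dict item -> rating.
    """
    per_user = []
    for user, items in true_ratings.items():
        errors = [abs(rating - predicted[user][item]) for item, rating in items.items()]
        per_user.append(sum(errors) / len(errors))
    return sum(per_user) / len(per_user)

def count_failures(requested, rated_by_user):
    """Number of solicited items the user could not rate (wasted active-learning cycles)."""
    return sum(1 for item in requested if item not in rated_by_user)

true_ratings = {"u1": {"a": 5, "b": 3}, "u2": {"a": 2}}
predicted = {"u1": {"a": 4, "b": 3}, "u2": {"a": 4}}
print(mean_absolute_error(true_ratings, predicted))
print(count_failures(["a", "x", "b", "y"], {"a", "b"}))
```

In the failure count, items "x" and "y" were requested but never rated by the user, so each one costs an iteration without improving the model.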
Constrained Setup
• The active-selection set is constrained to rated items (unrealistic!)
Unconstrained Setup
• Personalized Bayesian Selection (PBS) outperforms the other approaches
• Bayesian Selection (BS) performs even worse than plain Random Selection (RS)
# of Failures
• The most informative items may not have been rated by the user
• PBS substantially reduces the number of failures
Summary
• Collaborative Filtering
  • To obtain the preferences of new users, the system asks users to rate some items
  • Users would not like a system that solicits ratings for movies they may never have watched; such a dialog can be frustrating
• Active Learning for Collaborative Filtering
  • The system would like to understand the user's preferences with the fewest training examples
  • Existing approaches are not realistic
• Personalized Active Learning for Collaborative Filtering
  • Considers the probability of getting a rating from the user
  • Good performance
Paper Evaluation
• Pros
  • Good & clear idea
  • Tackles a common assumption
  • A new evaluation metric
    • # of failures
• Cons
  • No examples