Algorithms for Efficient Collaborative Filtering
Vreixo Formoso, Fidel Cacheda, Víctor Carneiro
University of A Coruña (Spain)
Outline
• Introduction
• Background in Collaborative Filtering
• Proposed algorithms
• Experiments
• Conclusions
EIIR 2008
Introduction
• More and more information becomes available every day
• Personalized retrieval systems are of growing interest
• Recommender systems: recommend the items best suited to the user's needs or preferences
• Useful in e-commerce, but we think they could also be useful in Web IR
• Recommender systems store information about the user's preferences: the user profile
  • Gathered explicitly or implicitly
Introduction
• Types of recommender systems:
  • Content-based filtering: recommends items based on their content
    • Depends on automatic analysis of the items
    • Unable to determine item quality
    • Rarely yields serendipitous finds
  • Collaborative filtering: based on other users' evaluations
    • Recommends items rated highly by other users with similar interests
    • Problems with computational performance and efficiency
Background
• User profile: the evaluations made by the user
• Evaluation: a numerical value (e.g. 1 – 5)
• Evaluation matrix: contains the evaluations of all users
• Types of collaborative filtering algorithms:
  • Memory-based: use similarity measures to find related neighbours (users or items) and predict from their evaluations
    • The entire matrix is used in each prediction
  • Model-based: build a model that represents the user's behaviour and use it to predict their evaluations
    • The parameters of the model are estimated from the evaluation matrix (off-line)
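To make the memory-based idea concrete, here is a minimal sketch of a user-based predictor. It is an illustration under assumptions, not one of the algorithms evaluated in this talk: the toy `ratings` matrix, the choice of Pearson correlation as the similarity measure, and the function names are all hypothetical.

```python
import math

# Hypothetical toy evaluation matrix: user -> {item: rating on a 1-5 scale}.
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "u3": {"i1": 1, "i2": 5, "i4": 2},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation over the items that both profiles contain."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = mean(a[i] for i in common)
    mean_b = mean(b[i] for i in common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = math.sqrt(
        sum((a[i] - mean_a) ** 2 for i in common)
        * sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict_user_based(ratings, user, item):
    """User mean plus the similarity-weighted deviations of the neighbours."""
    user_mean = mean(ratings[user].values())
    num = den = 0.0
    for other, profile in ratings.items():  # scans the whole matrix
        if other == user or item not in profile:
            continue
        sim = pearson(ratings[user], profile)
        num += sim * (profile[item] - mean(profile.values()))
        den += abs(sim)
    # Fall back to the user's own mean when no neighbour is informative.
    return user_mean + num / den if den else user_mean
```

The loop over every other profile is what "the entire matrix is used in each prediction" means in practice, and it is the source of the low scalability noted for memory-based methods.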
Background
• Memory-based
  • Simple, and give reasonably precise results
  • Low scalability
  • More sensitive to common recommender-system problems: sparsity, cold start and spam
• Model-based
  • Find underlying characteristics in the data
  • Faster at prediction time
  • Complexity of the models:
    • Sensitive to changes in the data
    • High construction times
    • The model must be updated when new data become available
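Background: Notation
• Evaluation matrix (V): one row per user in U ($u_1 \dots u_m$), one column per item in I ($i_1 \dots i_n$)

$$V = \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1n} \\ v_{21} & v_{22} & \cdots & v_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ v_{m1} & v_{m2} & \cdots & v_{mn} \end{pmatrix}$$

• User profile $I_u$: the items evaluated by user u (e.g. $I_1$ for $u_1$)
• $U_i$: the users that have evaluated item i (e.g. $U_1$ for $i_1$)
• $v_{u\cdot}$: evaluations of user u; $v_{\cdot i}$: evaluations for item i
• Mean values: $\bar{v}_{u\cdot}$ and $\bar{v}_{\cdot i}$
• $p_{mn}$: prediction of the evaluation of user m for item n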
Proposed algorithms
• Objectives:
  • Good behaviour at low density
  • Computational efficiency
  • Constant updating
• Item mean algorithm
  • Our baseline: use the mean of an item as its prediction
• Simple mean-based algorithm
  • The item mean is corrected with the mean of the user
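The two mean-based predictors can be sketched as follows. This is a hedged illustration: the slide does not give the exact correction formula, so the offset used here (user mean minus global mean) is an assumption, and the class and method names are hypothetical. Running sums are kept so that each new evaluation is absorbed in constant time, in the spirit of the "constant updating" objective.

```python
from collections import defaultdict


class MeanPredictor:
    """Running sums per item and per user; adding an evaluation is O(1)."""

    def __init__(self):
        self.item_sum = defaultdict(float)
        self.item_count = defaultdict(int)
        self.user_sum = defaultdict(float)
        self.user_count = defaultdict(int)
        self.total_sum = 0.0
        self.total_count = 0

    def add(self, user, item, value):
        self.item_sum[item] += value
        self.item_count[item] += 1
        self.user_sum[user] += value
        self.user_count[user] += 1
        self.total_sum += value
        self.total_count += 1

    def global_mean(self):
        if not self.total_count:
            raise ValueError("no evaluations have been added yet")
        return self.total_sum / self.total_count

    def item_mean(self, item):
        # Unknown items fall back to the global mean.
        if not self.item_count.get(item):
            return self.global_mean()
        return self.item_sum[item] / self.item_count[item]

    def user_mean(self, user):
        # Unknown users fall back to the global mean.
        if not self.user_count.get(user):
            return self.global_mean()
        return self.user_sum[user] / self.user_count[user]

    def predict_item_mean(self, item):
        """Baseline: the item's mean evaluation."""
        return self.item_mean(item)

    def predict_simple_mean(self, user, item):
        """Item mean shifted by how far the user sits from the global mean.

        ASSUMPTION: this particular correction is illustrative only; the
        slide says just that the item mean is corrected with the user mean.
        """
        return self.item_mean(item) + (self.user_mean(user) - self.global_mean())
```

For example, after adding the evaluations (u1, i1, 5), (u1, i2, 3), (u2, i1, 4), (u2, i2, 2) and (u3, i1, 3), the baseline predicts 2.5 for item i2, while the corrected version raises that for the generous user u1 and lowers it for the stricter user u3.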
Proposed algorithms
• Tendencies-based algorithm
  • Main idea: users tend to evaluate items positively or negatively, so we include tendencies in the formula
  • Tendency ≠ mean
  • Each user has a tendency ($ub_u$) and each item has a tendency ($ib_i$)
  • In this algorithm we use the mean of the item and the mean of the user, as well as their respective tendencies
Proposed algorithms
• Tendencies-based algorithm
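The formulas on these slides did not survive in this transcript, so the following is only a sketch of how tendencies might be computed and combined with the means. Everything specific is an assumption: a user's tendency is taken to be the average difference between that user's evaluations and the corresponding item means, an item's tendency is the average difference between its evaluations and the corresponding user means, and the case analysis in `predict_tendency` (including the hypothetical `beta` weight) is illustrative rather than the authors' formula.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def build_stats(ratings):
    """ratings: dict user -> {item: value}. Returns means and tendencies."""
    user_mean = {u: mean(p.values()) for u, p in ratings.items()}
    by_item = {}
    for u, profile in ratings.items():
        for i, v in profile.items():
            by_item.setdefault(i, {})[u] = v
    item_mean = {i: mean(col.values()) for i, col in by_item.items()}
    # ASSUMPTION: tendency = mean deviation from the other side's mean.
    user_tend = {
        u: mean(v - item_mean[i] for i, v in p.items())
        for u, p in ratings.items()
    }
    item_tend = {
        i: mean(v - user_mean[u] for u, v in col.items())
        for i, col in by_item.items()
    }
    return user_mean, item_mean, user_tend, item_tend


def predict_tendency(stats, user, item, beta=0.5):
    """Combine means and tendencies (illustrative case analysis only)."""
    user_mean, item_mean, user_tend, item_tend = stats
    from_user = user_mean[user] + item_tend[item]  # user mean moved by item tendency
    from_item = item_mean[item] + user_tend[user]  # item mean moved by user tendency
    if user_tend[user] > 0 and item_tend[item] > 0:
        return max(from_user, from_item)  # both push upwards
    if user_tend[user] < 0 and item_tend[item] < 0:
        return min(from_user, from_item)  # both push downwards
    # Conflicting tendencies: blend the two estimates (hypothetical weight).
    return beta * from_item + (1 - beta) * from_user
```

Note that a tendency differs from a mean: a user with a high mean may simply have seen good items, whereas a positive tendency says the user rates items above what other users give them. The sketch assumes both the user and the item already have at least one evaluation.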
Experiments
• Algorithms evaluated
  • Memory-based: user-based, item-based and similarity fusion
  • Model-based: regression-based, slope one, latent semantic index and cluster-based smoothing
  • Hybrid: personality diagnosis
• Dataset: MovieLens
  • Real ratings of films: 1 (very bad) – 5 (excellent)
  • 100,000 evaluations from 943 users for 1,682 movies (1.78 items per user, about 106 evaluations per user on average). Density 6%
  • Training set: 10%, 50% and 90% of the evaluations
• For each algorithm we evaluated (5 times):
  • Training and prediction times
  • Quality of the predictions
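The dataset figures can be checked with a few lines of arithmetic, and the training-set sizes can be reproduced with a simple split. The split below is only a sketch: the slide does not say how the training sets were drawn, so a uniform random shuffle with a fixed seed is an assumption, and the `split` helper is hypothetical.

```python
import random

N_EVALUATIONS, N_USERS, N_ITEMS = 100_000, 943, 1_682

# Fraction of the user x item matrix that is filled in (about 6%).
density = N_EVALUATIONS / (N_USERS * N_ITEMS)
# Average number of evaluations contributed by each user (about 106).
evaluations_per_user = N_EVALUATIONS / N_USERS
# Ratio of catalogue size to number of users (about 1.78).
items_per_user = N_ITEMS / N_USERS


def split(evaluations, train_fraction, seed=0):
    """Shuffle and cut into (training, test). ASSUMPTION: uniform random split."""
    rng = random.Random(seed)
    shuffled = list(evaluations)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

With a 10% training set the algorithms see only about a tenth of the already sparse matrix, which is the low-density setting the proposed algorithms target.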
Proposed algorithms
• Tendencies-based algorithm
  • Only 5% of the predictions with the 10% training set
  • 2% of the predictions with the 90% training set
  • This case represents some unusual elements
• Tendencies seem to be a good prediction mechanism
Experiments: Computational complexity
Experiments: Training time
Experiments: Prediction time
Experiments: Prediction quality
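The charts for these slides are not part of this transcript. As a rough sketch of how such measurements could be taken, the helpers below time a call and score predictions on a held-out test set. The slides do not name the quality metric, so mean absolute error is an assumption here, and both function names are hypothetical.

```python
import time


def mean_absolute_error(predict, test_set):
    """Average |prediction - actual| over (user, item, value) triples.

    ASSUMPTION: MAE is used for illustration; the metric is not named
    in the slides.
    """
    errors = [abs(predict(user, item) - value) for user, item, value in test_set]
    return sum(errors) / len(errors)


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Training time would be measured by wrapping the model-building call in `timed`, and prediction time by wrapping the loop that scores the test set.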
Conclusions
• We have presented a couple of algorithms for collaborative filtering:
  • Very simple, with good response times
• Tendencies-based algorithm:
  • Quality of the predictions equivalent to that of the best algorithms
  • Even better with low-density training sets
• Next steps: use these algorithms in Web IR
  • Problem: which dataset to use?
Thank you! Questions?