Algorithms for Efficient Collaborative Filtering

Vreixo Formoso, Fidel Cacheda, Víctor Carneiro • University of A Coruña (Spain)

Outline • Introduction • Background in Collaborative Filtering • Proposed algorithms • Experiments • Conclusions EIIR 2008

Introduction • More and more information is available every day • Personalized retrieval systems are becoming increasingly interesting • Recommender systems: recommend the items that best fit the user's needs or preferences • Useful in e-commerce, but we think they could also be useful in Web IR • Recommender systems store some information about the user's preferences → User profile • Explicit or implicit

Introduction • Types of recommender systems: • Content-based filtering: recommends items based on their content • Depends on automatic analysis of the items • Unable to determine item quality • Rarely produces serendipitous finds • Collaborative filtering: based on other users' evaluations • Recommends items rated highly by other users with similar interests • Problems with computational performance and efficiency

Background • User profile: the evaluations made by the user • Evaluation: numerical value (e.g. 1 – 5) • Evaluation matrix: contains the evaluations of all users • Types of collaborative filtering algorithms: • Memory-based: use similarity measures to find related neighbours (users or items) and predict from them • The entire matrix is used in each prediction • Model-based: build a model that represents the user's behaviour → predict their evaluations • The parameters of the model are estimated from the evaluation matrix (off-line)
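The memory-based idea above can be sketched as follows. This is a minimal illustration, not the implementation evaluated in the talk: it assumes Pearson correlation as the similarity measure and a mean-centred weighted average as the prediction rule, both common choices that the slides do not specify. The toy `ratings` data and all function names are hypothetical.

```python
from math import sqrt

# Toy evaluation matrix: user -> {item: rating on a 1-5 scale}. Illustrative data only.
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "u3": {"i1": 1, "i2": 5, "i4": 2},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation over the items both users have evaluated (assumed measure)."""
    common = set(ratings[a]) & set(ratings[b])
    if len(common) < 2:
        return 0.0
    ma = mean(ratings[a][i] for i in common)
    mb = mean(ratings[b][i] for i in common)
    num = sum((ratings[a][i] - ma) * (ratings[b][i] - mb) for i in common)
    den = sqrt(
        sum((ratings[a][i] - ma) ** 2 for i in common)
        * sum((ratings[b][i] - mb) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict_user_based(user, item):
    """Mean-centred weighted average over every other user who evaluated `item`."""
    user_mean = mean(ratings[user].values())
    num = den = 0.0
    for other in ratings:  # the whole matrix is scanned on every prediction
        if other == user or item not in ratings[other]:
            continue
        w = pearson(user, other)
        num += w * (ratings[other][item] - mean(ratings[other].values()))
        den += abs(w)
    # Fall back to the user's own mean when no neighbour gives any signal.
    return user_mean if den == 0 else user_mean + num / den


prediction = predict_user_based("u1", "i4")
```

Because every call walks all users and recomputes similarities, the cost of one prediction grows with the size of the matrix, which is the scalability concern raised for memory-based methods.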

Background • Memory-based • Simple, and gives reasonably precise results • Low scalability • More sensitive to common recommender system problems: sparsity, cold start and spam • Model-based • Finds underlying characteristics in the data • Faster at prediction time • Complexity of the models: • Sensitive to changes in the data • High construction times • The model must be updated when new data become available

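Background: Notation • Items ($I$): $i_1, i_2, \dots, i_n$ • Users ($U$): $u_1, u_2, \dots, u_m$ • Evaluation matrix ($V$): one row per user and one column per item; $v_{ui}$ is the evaluation of user $u$ for item $i$ • User profile ($I_1$): the items evaluated by user $u_1$ • $U_1$: the users that have evaluated item $i_1$ • $v_{u\cdot}$: evaluations of user $u$ • $v_{\cdot i}$: evaluations for item $i$ • Mean values: $\bar{v}_{u\cdot}$ and $\bar{v}_{\cdot i}$ • $p_{mn}$: prediction of the evaluation of user $m$ for item $n$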
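To make the notation concrete, here is a small sketch of the evaluation matrix and the sets and means derived from it. The data, the use of `None` for a missing evaluation, and the function names are assumptions made for illustration only.

```python
# Toy evaluation matrix V: rows are users, columns are items.
# None marks an item the user has not evaluated (hypothetical encoding).
V = [
    [5, None, 3],  # u1
    [4, 2, None],  # u2
    [None, 1, 5],  # u3
]


def user_profile(u):
    """I_u: indices of the items that user u has evaluated."""
    return [i for i, v in enumerate(V[u]) if v is not None]


def item_raters(i):
    """U_i: indices of the users that have evaluated item i."""
    return [u for u in range(len(V)) if V[u][i] is not None]


def user_mean(u):
    """Mean of the evaluations made by user u."""
    vals = [V[u][i] for i in user_profile(u)]
    return sum(vals) / len(vals)


def item_mean(i):
    """Mean of the evaluations received by item i."""
    vals = [V[u][i] for u in item_raters(i)]
    return sum(vals) / len(vals)
```

With this toy matrix, `user_profile(0)` lists the columns rated by the first user and `item_mean(0)` averages only the evaluations that actually exist for the first item.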

Proposed algorithms • Objectives: • Good behaviour at low density • Computational efficiency • Constant updating • Item mean algorithm • Our baseline → Use the mean of an item as its prediction • Simple mean based algorithm • The item mean is corrected with the mean of the user
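The two baselines can be sketched in a few lines. The item mean predictor follows the slide directly. The exact correction used by the simple mean based algorithm is not preserved in this transcript, so the version below is an assumption: it shifts the item mean by the difference between the user's mean and the overall mean. Data and function names are hypothetical.

```python
# Toy evaluations: user -> {item: rating}. Illustrative data only.
ratings = {
    "u1": {"i1": 5, "i2": 4},
    "u2": {"i1": 3, "i2": 2, "i3": 1},
    "u3": {"i1": 4, "i3": 3},
}


def item_mean(item):
    vals = [r[item] for r in ratings.values() if item in r]
    return sum(vals) / len(vals)


def user_mean(user):
    vals = list(ratings[user].values())
    return sum(vals) / len(vals)


def global_mean():
    vals = [v for r in ratings.values() for v in r.values()]
    return sum(vals) / len(vals)


def predict_item_mean(user, item):
    """Baseline from the slide: the item's mean is the prediction for every user."""
    return item_mean(item)


def predict_simple_mean(user, item):
    """ASSUMED correction: item mean plus how far the user's mean sits from the overall mean."""
    return item_mean(item) + (user_mean(user) - global_mean())
```

Both predictors only need running sums and counts per user and per item, so they can be updated incrementally as new evaluations arrive, which fits the stated objective of constant updating.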

Proposed algorithms • Tendencies based algorithm • Main idea: users tend to evaluate items positively or negatively → Include tendencies in the formula • Tendency ≠ mean • Tendency of a user ($ub_u$) and tendency of an item ($ib_i$) • This algorithm uses the means of the item and the user as well as their respective tendencies.
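The difference between a tendency and a mean can be illustrated with a short sketch. The formulas for the tendencies are not preserved in this transcript, so the definitions below are assumptions: the tendency of a user is taken as the average difference between their evaluations and the means of the items they evaluated, and the tendency of an item as the average difference between its evaluations and the means of the users who gave them. How the talk combines means and tendencies into a prediction is not reconstructed here.

```python
# Toy evaluations: user -> {item: rating}. Illustrative data only.
ratings = {
    "u1": {"i1": 4},
    "u2": {"i1": 5, "i2": 2},
    "u3": {"i1": 5, "i2": 1},
}


def user_mean(user):
    vals = list(ratings[user].values())
    return sum(vals) / len(vals)


def item_mean(item):
    vals = [r[item] for r in ratings.values() if item in r]
    return sum(vals) / len(vals)


def user_tendency(user):
    """ASSUMED definition: average of (evaluation - item mean) over the user's items."""
    diffs = [v - item_mean(i) for i, v in ratings[user].items()]
    return sum(diffs) / len(diffs)


def item_tendency(item):
    """ASSUMED definition: average of (evaluation - user mean) over the item's raters."""
    diffs = [r[item] - user_mean(u) for u, r in ratings.items() if item in r]
    return sum(diffs) / len(diffs)
```

In this toy data `u1` has the highest mean of the three users, yet a negative tendency, because the only item `u1` evaluated was rated even higher by everyone else. Under these assumed definitions, that is the sense in which tendency ≠ mean.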

Proposed algorithms • Tendencies based algorithm

Experiments • Algorithms evaluated • Memory-based: user-based, item-based and similarity fusion • Model-based: regression based, slope one, latent semantic indexing and cluster based smoothing • Hybrid: personality diagnosis • Dataset: MovieLens • Real ratings of films: 1 (very bad) – 5 (excellent) • 100,000 evaluations from 943 users for 1,682 movies (1.78 items per user; about 106 evaluations per user on average). Density ≈ 6.3% • Training sets: 10%, 50% and 90% of the data • For each algorithm we evaluated (5 times): • Training and prediction times • Quality of the predictions
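The dataset figures can be checked with a few lines of arithmetic. The split helper that follows is only a sketch: the slides do not say how the training sets were drawn, so a seeded random split over individual evaluations is an assumption, and `split` is a hypothetical name.

```python
import random

# Figures quoted on the slide for the MovieLens dataset.
n_ratings, n_users, n_items = 100_000, 943, 1_682

density = n_ratings / (n_users * n_items)  # fraction of the matrix that is filled
ratings_per_user = n_ratings / n_users     # average evaluations made by one user
items_per_user = n_items / n_users         # ratio of catalogue size to user count


def split(evaluations, train_fraction, seed=0):
    """ASSUMED protocol: shuffle the evaluations and cut at the requested fraction."""
    rng = random.Random(seed)
    shuffled = list(evaluations)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Example: a 10% training set over 1,000 placeholder evaluation ids.
train, test = split(range(1000), 0.10)
```

The density comes out slightly above 6%, and at a 10% training fraction the effective density seen by the algorithms drops by a further factor of ten, which is the low-density regime the proposed algorithms target.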

Proposed algorithms • Tendencies based algorithm • Only 5% of the predictions with the 10% training set • 2% of the predictions with the 90% training set → This case covers some unusual elements → Tendencies seem to be a good prediction mechanism

Experiments: Computational complexity

Experiments: Training time

Experiments: Prediction time

Experiments: Prediction quality

Conclusions • We have presented a couple of algorithms for collaborative filtering: • Very simple → Good response times • Tendencies based algorithm: • Prediction quality equivalent to that of the best algorithms • Even better with low-density training sets • Next steps: use these algorithms in Web IR • Open problem: dataset?

Thank you! Questions?
