Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005. Applied Algorithm Lab, Wooram Heo
Outline • Recommender Systems • Problem statement • Survey of Recommender systems • Content-Based Methods • Collaborative Methods • Hybrid Methods
Recommender Systems • Systems for recommending items (e.g. books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences. • Many on-line stores provide recommendations (e.g. Amazon, CDNow). • Recommenders have been shown to substantially increase sales at on-line stores.
Recommender Systems • Examples
Problem statement • The recommendation problem is to estimate ratings for the items that a user has not yet seen • The estimation is usually based on the ratings the user has given to other items and on other information
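Problem statement • $C$: the set of all users • $S$: the set of all possible items that can be recommended • $u: C \times S \rightarrow R$, a utility function measuring the usefulness of item $s$ to user $c$, where $R$ is a totally ordered set (e.g., nonnegative integers or real numbers within a certain range) • For each user $c \in C$, we want to choose the item $s'_c \in S$ that maximizes the user's utility: $\forall c \in C,\ s'_c = \arg\max_{s \in S} u(c, s)$ • Utility is usually defined only on a subset of $C \times S$, so it needs to be extrapolated to the whole space $C \times S$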
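The selection rule, choosing for each user $c$ the item with maximal utility ($s'_c = \arg\max_{s \in S} u(c, s)$), can be sketched in a few lines. This is a minimal illustration that assumes the (already extrapolated) utility values are stored in a dictionary keyed by (user, item); the helper name `recommend` and the toy values are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple

User = Hashable
Item = Hashable


def recommend(
    user: User,
    items: Iterable[Item],
    utility: Dict[Tuple[User, Item], float],
) -> Optional[Item]:
    """Return argmax_{s in items} u(user, s); items with no known/estimated utility are skipped.

    Hypothetical helper: `utility` maps (user, item) pairs to u(c, s).
    """
    scored = [(utility[(user, s)], s) for s in items if (user, s) in utility]
    if not scored:
        return None  # nothing to recommend for this user
    # Compare on the utility value only, so items need not be orderable.
    return max(scored, key=lambda pair: pair[0])[1]


# Toy (made-up) utilities for one user "c" over items s1, s2, s3.
u = {("c", "s1"): 3.0, ("c", "s2"): 4.5, ("c", "s3"): 2.0}
print(recommend("c", ["s1", "s2", "s3"], u))  # s2
```

The hard part of recommendation is producing the missing values of `utility`; the methods below are different ways of doing that extrapolation.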
Recommender System Categories • Content-based recommendations • The user will be recommended items similar to the ones the user preferred in the past • Collaborative recommendations • The user will be recommended items that people with similar tastes and preferences liked in the past • Hybrid approaches • These methods combine collaborative and content-based methods
Content-Based Methods • Recommend items similar to those the user preferred in the past • Building the user profile is key • E.g., in a movie recommender application, the profile may capture the user's preferred • Specific actors • Directors • Genres • etc.
Content-Based Methods • The content-based approach has its roots in information retrieval • Documents, Web sites (URLs), and news messages • Designed mostly to recommend text-based items • Content is usually described with keywords
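Content-Based Methods • TF-IDF weight $w_{i,j}$ for keyword $k_i$ in document $d_j$ is defined as $w_{i,j} = TF_{i,j} \times IDF_i$, where $TF_{i,j} = \frac{f_{i,j}}{\max_z f_{z,j}}$ ($f_{i,j}$ is the number of times $k_i$ appears in $d_j$) and $IDF_i = \log \frac{N}{n_i}$ ($N$ documents in total, $n_i$ of which contain $k_i$) • Content of document $d_j$ is defined as $Content(d_j) = (w_{1j}, \ldots, w_{Kj})$ • Cosine similarity measure between the user profile vector $\vec{w}_c$ and the item vector $\vec{w}_s$: $u(c, s) = \cos(\vec{w}_c, \vec{w}_s) = \frac{\vec{w}_c \cdot \vec{w}_s}{\|\vec{w}_c\|_2 \times \|\vec{w}_s\|_2} = \frac{\sum_{i=1}^{K} w_{i,c} w_{i,s}}{\sqrt{\sum_{i=1}^{K} w_{i,c}^2}\sqrt{\sum_{i=1}^{K} w_{i,s}^2}}$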
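A minimal Python sketch of TF-IDF weighting ($TF$ normalized by the most frequent keyword in the document, $IDF_i = \log(N/n_i)$) followed by cosine similarity. It assumes documents are already tokenized into keyword lists and uses the natural logarithm (the base only rescales all weights). The function names and the toy corpus are hypothetical, and the "user profile" is simplified to the TF-IDF vector of one document the user liked, rather than a full content-based profile.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # keyword -> weight (sparse vector)


def tfidf_vectors(docs: List[List[str]]) -> List[Vector]:
    """w_{i,j} = TF_{i,j} * IDF_i, with TF_{i,j} = f_{i,j} / max_z f_{z,j}."""
    n_docs = len(docs)
    # n_i: number of documents that contain keyword k_i
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)  # f_{i,j}
        max_f = max(counts.values()) if counts else 1
        vectors.append({
            term: (f / max_f) * math.log(n_docs / doc_freq[term])
            for term, f in counts.items()
        })
    return vectors


def cosine(a: Vector, b: Vector) -> float:
    """cos(a, b) = a.b / (||a||_2 * ||b||_2); returns 0.0 if either vector is all zeros."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    ["space", "alien", "ship", "alien"],
    ["alien", "invasion", "ship"],
    ["romance", "paris", "love"],
]
w = tfidf_vectors(docs)
profile = w[0]  # hypothetical profile: the user liked document 0
print([round(cosine(profile, w_s), 3) for w_s in w])  # [1.0, 0.279, 0.0]
```

Document 1 shares "alien" and "ship" with the profile and scores higher than document 2, which shares no keywords; this is exactly the behavior behind the overspecialization problem listed next.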
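Disadvantages • Limited content analysis: not all content is well represented by keywords • e.g., multimedia data • Items represented by the same set of features are indistinguishable • Overspecialization problem: the user is recommended only items similar to those already rated • New user problem • No rating history is available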
Collaborative Methods • Use other users' ratings to judge an item's utility • The key is to find users or user groups whose interests match those of the current user • More users and more ratings lead to better results • Can also recommend items dissimilar to those seen in the past
Collaborative Methods • [Figure: the Active User's ratings of items A to Z are matched by correlation against the User Database; recommendations (e.g., item C) are extracted from the best-matching users]
Collaborative Methods • Memory-based algorithms • The value of the unknown rating $r_{c,s}$ for user $c$ and item $s$ is usually computed as an aggregate of the ratings of some other users for the same item: $r_{c,s} = \operatorname{aggr}_{c' \in \hat{C}} r_{c',s}$ • where $\hat{C}$ denotes the set of $N$ users that are the most similar to user $c$ and who have rated item $s$
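One common choice of the aggregation function is an adjusted weighted sum, which averages the neighbors' deviations from their own mean ratings: $r_{c,s} = \bar{r}_c + k \sum_{c' \in \hat{C}} sim(c, c') \times (r_{c',s} - \bar{r}_{c'})$ with $k = 1 / \sum_{c' \in \hat{C}} |sim(c, c')|$. The sketch below assumes the similarities to the $N$ nearest users are already computed and passed in as a dictionary; the names `predict` and `mean_rating` and the toy ratings are hypothetical.

```python
from typing import Dict, Hashable, Optional

Ratings = Dict[Hashable, Dict[Hashable, float]]  # user -> {item: rating}


def mean_rating(ratings: Ratings, user: Hashable) -> float:
    """Average rating of a user (assumes the user has at least one rating)."""
    items = ratings[user]
    return sum(items.values()) / len(items)


def predict(
    ratings: Ratings,
    sims: Dict[Hashable, float],  # sim(c, c') for the N most similar users c'
    user: Hashable,
    item: Hashable,
) -> Optional[float]:
    """Adjusted weighted sum: r_cs = mean_c + k * sum sim(c,c') * (r_c's - mean_c')."""
    # C_hat: the most similar users who have actually rated the item
    neighbors = [v for v in sims if item in ratings.get(v, {})]
    norm = sum(abs(sims[v]) for v in neighbors)
    if norm == 0:
        return None  # no similar user has rated the item
    k = 1.0 / norm  # normalizing factor
    deviation = sum(
        sims[v] * (ratings[v][item] - mean_rating(ratings, v)) for v in neighbors
    )
    return mean_rating(ratings, user) + k * deviation


ratings = {
    "c": {"a": 4, "b": 2},
    "x": {"a": 5, "b": 3, "s": 5},
    "y": {"a": 2, "b": 4, "s": 1},
}
sims = {"x": 0.9, "y": -0.5}  # made-up similarity values
print(round(predict(ratings, sims, "c", "s"), 3))  # 3.905
```

Subtracting each neighbor's mean compensates for users who rate systematically high or low, and the negative similarity of `y` turns its low rating into evidence for a higher prediction.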
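Collaborative Methods • Similarity between two users $x$ and $y$, computed over $S_{xy}$, the set of items rated by both users ($\bar{r}_x$ is the average rating of user $x$) • Pearson correlation coefficient: $sim(x, y) = \frac{\sum_{s \in S_{xy}} (r_{x,s} - \bar{r}_x)(r_{y,s} - \bar{r}_y)}{\sqrt{\sum_{s \in S_{xy}} (r_{x,s} - \bar{r}_x)^2 \sum_{s \in S_{xy}} (r_{y,s} - \bar{r}_y)^2}}$ • Cosine similarity: $sim(x, y) = \cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|_2 \times \|\vec{y}\|_2} = \frac{\sum_{s \in S_{xy}} r_{x,s} r_{y,s}}{\sqrt{\sum_{s \in S_{xy}} r_{x,s}^2}\sqrt{\sum_{s \in S_{xy}} r_{y,s}^2}}$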
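Both user-to-user similarity measures can be sketched over the items the two users have co-rated. This sketch assumes each user's ratings are a dictionary from item to rating, and it computes the means $\bar{r}_x$, $\bar{r}_y$ over the co-rated items only, which is one common variant; the function names and toy data are hypothetical.

```python
import math
from typing import Dict, Hashable

UserRatings = Dict[Hashable, float]  # item -> rating, for one user


def pearson(rx: UserRatings, ry: UserRatings) -> float:
    """Pearson correlation over S_xy (items co-rated by x and y); 0.0 if undefined."""
    common = rx.keys() & ry.keys()  # S_xy
    if len(common) < 2:
        return 0.0
    mx = sum(rx[s] for s in common) / len(common)  # mean over co-rated items
    my = sum(ry[s] for s in common) / len(common)
    num = sum((rx[s] - mx) * (ry[s] - my) for s in common)
    den = math.sqrt(
        sum((rx[s] - mx) ** 2 for s in common) * sum((ry[s] - my) ** 2 for s in common)
    )
    return num / den if den else 0.0


def cosine_sim(rx: UserRatings, ry: UserRatings) -> float:
    """Cosine of the two rating vectors restricted to S_xy; 0.0 if undefined."""
    common = rx.keys() & ry.keys()
    num = sum(rx[s] * ry[s] for s in common)
    den = math.sqrt(sum(rx[s] ** 2 for s in common)) * math.sqrt(
        sum(ry[s] ** 2 for s in common)
    )
    return num / den if den else 0.0


x = {"a": 5, "b": 3, "c": 4}
y = {"a": 4, "b": 2, "c": 3, "d": 1}  # "d" is not co-rated, so it is ignored
print(pearson(x, y), round(cosine_sim(x, y), 3))  # 1.0 0.998
```

User `y` rates every co-rated item one point lower than `x`: Pearson ignores that constant offset (1.0), while cosine is only close to 1 because the raw vectors are not exactly parallel.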
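Collaborative Methods • Model-based algorithms • Use the collection of ratings to learn a model, which is then used to predict ratings • E.g., with integer ratings from 0 to $n$, a probabilistic approach estimates $r_{c,s} = E(r_{c,s}) = \sum_{i=0}^{n} i \times \Pr(r_{c,s} = i \mid r_{c,s'}, s' \in S_c)$, where $S_c$ is the set of items rated by user $c$ • Cluster models and Bayesian networks are used to estimate this probability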
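To make the expected-rating formula $E(r_{c,s}) = \sum_i i \times \Pr(r_{c,s} = i)$ concrete, here is a deliberately naive cluster-model sketch. It assumes users have already been assigned to clusters by some method (so the conditioning on $c$'s own ratings happens through the cluster assignment), and it estimates $\Pr(r_{c,s} = i)$ as the fraction of $c$'s cluster peers who gave item $s$ the rating $i$. This is an illustration only, not one of the specific cluster or Bayesian-network models the survey reviews; all names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Optional

Ratings = Dict[Hashable, Dict[Hashable, int]]  # user -> {item: integer rating 0..n}


def expected_rating(
    ratings: Ratings,
    cluster_of: Dict[Hashable, int],  # hypothetical precomputed cluster assignment
    user: Hashable,
    item: Hashable,
) -> Optional[float]:
    """E(r_cs) = sum_i i * Pr(r_cs = i), with Pr estimated from the user's cluster peers."""
    target = cluster_of[user]
    peers = [
        v for v, cl in cluster_of.items()
        if cl == target and v != user and item in ratings.get(v, {})
    ]
    if not peers:
        return None  # no peer in the cluster has rated the item
    counts = Counter(ratings[v][item] for v in peers)  # rating value -> frequency
    total = sum(counts.values())
    return sum(i * (n / total) for i, n in counts.items())


ratings = {
    "c": {"s1": 4},
    "u1": {"s1": 5, "s2": 4},
    "u2": {"s1": 4, "s2": 5},
    "u3": {"s1": 1, "s2": 1},
    "u4": {"s2": 4},
}
cluster_of = {"c": 0, "u1": 0, "u2": 0, "u4": 0, "u3": 1}
print(round(expected_rating(ratings, cluster_of, "c", "s2"), 3))  # 4.333
```

Here $\Pr(r = 4) = 2/3$ and $\Pr(r = 5) = 1/3$ among the peers, so the expectation is $4 \times 2/3 + 5 \times 1/3 \approx 4.333$; the low rating from `u3` is ignored because `u3` is in a different cluster.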
Collaborative Methods • Model-based approaches use various machine learning techniques • K-means clustering • Gibbs sampling • Bayesian model • Probabilistic relational model • Linear regression • Maximum entropy model • Markov decision process • Probabilistic latent semantic analysis • Latent Dirichlet allocation • etc
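Disadvantages • Finding similar users or user groups is not easy • New user problem: no preferences are available • New item problem: no ratings are available • Sparsity problem: the number of known ratings is usually very small compared with the number of ratings that need to be predicted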