Applied Algorithm Lab Wooram Heo

Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions • IEEE Transactions on Knowledge and Data Engineering, 2005

Outline • Recommender Systems • Problem statement • Survey of Recommender Systems • Content-Based Methods • Collaborative Methods • Hybrid Methods

Recommender Systems • Systems for recommending items (e.g. books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences. • Many on-line stores provide recommendations (e.g. Amazon, CDNow). • Recommenders have been shown to substantially increase sales at on-line stores.

Recommender Systems • Examples

Problem statement • Recommendation problem is to estimate ratings for the items that have not been seen by a user • Estimation is usually based on the ratings given by the user to other items and on some other information

Problem statement • $C$: the set of all users • $S$: the set of all possible items that can be recommended • $u : C \times S \rightarrow R$, where $R$ is a totally ordered set (e.g. nonnegative integers or real numbers within a certain range) • For each user $c \in C$, we want to choose the item $s'_c \in S$ that maximizes the user's utility: $s'_c = \arg\max_{s \in S} u(c, s)$ • Utility needs to be extrapolated to the whole space $C \times S$
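The selection step in the problem statement can be sketched in a few lines. This is only an illustration, assuming utilities for the unseen items have already been estimated; the `recommend` helper, the `estimated` table, and its values are hypothetical and not taken from the paper.

```python
def recommend(user, candidate_items, utility):
    """Return the candidate item with the highest utility for this user."""
    return max(candidate_items, key=lambda item: utility(user, item))

# Hypothetical estimated utilities for items the user has not seen yet.
estimated = {
    ("alice", "Alien"): 4.2,
    ("alice", "Titanic"): 2.1,
    ("alice", "Heat"): 3.7,
}

def u(user, item):
    # Look up the (already extrapolated) utility of an item for a user.
    return estimated[(user, item)]

unseen = ["Alien", "Titanic", "Heat"]
best = recommend("alice", unseen, u)
print(best)
```

The hard part in practice is producing the estimates themselves; the arg max is trivial once the utility has been extrapolated.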

Recommender System Categories • Content-based recommendations • The user will be recommended items similar to the ones the user preferred in the past • Collaborative recommendations • The user will be recommended items that people with similar tastes and preferences liked in the past • Hybrid approaches • These methods combine collaborative and content-based methods

Content-Based Methods • Recommend items similar to those users preferred in the past • User profiling is the key • E.g. in a movie recommender application, • Specific actors • Directors • Genres • etc

Content-Based Methods • The content-based approach has its roots in information retrieval • Documents, web sites (URLs), and news messages • Designed mostly to recommend text-based items • Content is usually described with keywords

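Content-Based Methods • TF-IDF weight for keyword $k_i$ in document $d_j$ is defined as $w_{i,j} = TF_{i,j} \times IDF_i$, with $TF_{i,j} = f_{i,j} / \max_z f_{z,j}$ and $IDF_i = \log(N / n_i)$, where $f_{i,j}$ is the number of times $k_i$ appears in $d_j$, $N$ is the number of documents, and $n_i$ is the number of documents containing $k_i$ • Content of document $d_j$ is defined as $Content(d_j) = (w_{1,j}, \ldots, w_{k,j})$ • Cosine similarity measure: $u(c, s) = \cos(\vec{w}_c, \vec{w}_s) = \frac{\vec{w}_c \cdot \vec{w}_s}{\|\vec{w}_c\|_2 \, \|\vec{w}_s\|_2}$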
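A minimal sketch of keyword weighting and cosine similarity, assuming term frequency is normalized by the most frequent keyword in the document and inverse document frequency is the natural log of (number of documents / documents containing the keyword). The function names and the toy documents are illustrative assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of keyword lists. Returns one {keyword: weight} dict per document."""
    n_docs = len(docs)
    # Number of documents in which each keyword appears at least once.
    doc_freq = Counter(k for doc in docs for k in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        if not counts:
            weights.append({})
            continue
        max_count = max(counts.values())
        weights.append({
            k: (f / max_count) * math.log(n_docs / doc_freq[k])
            for k, f in counts.items()
        })
    return weights

def cosine(a, b):
    """Cosine similarity of two sparse weight vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical keyword lists for three items.
docs = [["space", "alien", "space"], ["space", "romance"], ["romance", "ship"]]
w = tf_idf(docs)
print(cosine(w[0], w[1]), cosine(w[0], w[2]))
```

In a content-based recommender one of the two vectors would be a user profile built from the items the user liked, and the other an item's keyword vector.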

Disadvantages • Not all content is well represented by keywords • Multimedia data • Items represented by same set of features are indistinguishable • Overspecialization problem • New user problem • No history available

Collaborative Methods • Use other users' recommendations (ratings) to judge an item's utility • Key is to find users/user groups whose interests match those of the current user • More users, more ratings: better results • Can also account for items dissimilar to the ones seen in the past

Collaborative Methods • [Figure: the active user's ratings for items A to Z are compared against a user database of other users' ratings; a correlation match selects similar users, and recommendations are extracted from their ratings.]

Collaborative Methods • Memory-based algorithms • Value of the unknown rating $r_{c,s}$ for user $c$ and item $s$ is usually computed as an aggregate of the ratings of some other users for the same item $s$: $r_{c,s} = \operatorname{aggr}_{c' \in \hat{C}} r_{c',s}$ • Where $\hat{C}$ denotes the set of $N$ users that are the most similar to user $c$ and who have rated item $s$
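One common choice of aggregate is a similarity-weighted average over the nearest neighbors. The sketch below assumes that choice; the `predict_rating` helper and the sample similarities and ratings are hypothetical.

```python
def predict_rating(candidates, n):
    """candidates: (similarity, rating) pairs for users who rated the item.

    Keeps the n most similar users and returns their similarity-weighted
    average rating, or None if no prediction can be made.
    """
    nearest = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:n]
    norm = sum(abs(sim) for sim, _ in nearest)
    if norm == 0:
        return None
    return sum(sim * rating for sim, rating in nearest) / norm

# Hypothetical (similarity to the active user, rating of the item) pairs.
candidates = [(0.9, 5.0), (0.1, 1.0), (0.5, 3.0), (-0.2, 4.0)]
predicted = predict_rating(candidates, n=3)
print(predicted)
```

Other aggregates are possible, for example a plain average or a weighted sum of deviations from each neighbor's mean rating, which compensates for users who rate on different scales.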

Collaborative Methods • Similarity between two users • Pearson correlation coefficient • Cosine similarity
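Both similarity measures can be sketched over the items two users have rated in common. This is an illustration under stated assumptions: ratings are stored as hypothetical `{item: rating}` dicts, and the means in the Pearson correlation are taken over the co-rated items only.

```python
import math

def co_rated(x, y):
    """Items rated by both users."""
    return sorted(set(x) & set(y))

def pearson(x, y):
    """Pearson correlation of two users over their co-rated items."""
    items = co_rated(x, y)
    if len(items) < 2:
        return 0.0
    mean_x = sum(x[s] for s in items) / len(items)
    mean_y = sum(y[s] for s in items) / len(items)
    num = sum((x[s] - mean_x) * (y[s] - mean_y) for s in items)
    den = math.sqrt(
        sum((x[s] - mean_x) ** 2 for s in items)
        * sum((y[s] - mean_y) ** 2 for s in items)
    )
    return num / den if den else 0.0

def cosine(x, y):
    """Cosine of the angle between two users' co-rated rating vectors."""
    items = co_rated(x, y)
    num = sum(x[s] * y[s] for s in items)
    den = math.sqrt(sum(x[s] ** 2 for s in items)) * math.sqrt(
        sum(y[s] ** 2 for s in items)
    )
    return num / den if den else 0.0

# Hypothetical ratings.
alice = {"A": 5, "B": 3, "C": 4}
bob = {"A": 4, "B": 2, "C": 3}    # same taste, rates one point lower
carol = {"A": 1, "B": 5, "C": 3}  # opposite taste
print(pearson(alice, bob), pearson(alice, carol), cosine(alice, bob))
```

Pearson subtracts each user's mean, so it ignores a constant offset in how generously someone rates; cosine on raw ratings does not.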

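Collaborative Methods • Model-based algorithms • The unknown rating is estimated as an expected value: $r_{c,s} = E(r_{c,s}) = \sum_{i=0}^{n} i \times \Pr(r_{c,s} = i \mid r_{c,s'}, s' \in S_c)$, where ratings are integers from $0$ to $n$ and $S_c$ is the set of items rated by user $c$ • Cluster models and Bayesian networks are used to estimate this probability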
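A minimal sketch of the final prediction step, assuming a trained model already supplies a probability for each possible rating value and the prediction is taken as the expected rating. The probabilities below are made up for illustration; learning them (for example with a cluster model or a Bayesian network) is not shown.

```python
import math

def expected_rating(prob_by_rating):
    """prob_by_rating: {rating value: probability}. Returns the expected rating."""
    total = sum(prob_by_rating.values())
    if not math.isclose(total, 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(value * p for value, p in prob_by_rating.items())

# Hypothetical model output for one (user, item) pair on a 1 to 5 scale.
probs = {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.40, 5: 0.25}
estimate = expected_rating(probs)
print(estimate)
```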

Collaborative Methods • Model-based approaches use various machine learning techniques • K-means clustering • Gibbs sampling • Bayesian model • Probabilistic relational model • Linear regression • Maximum entropy model • Markov decision process • Probabilistic latent semantic analysis • Latent Dirichlet allocation • etc

Disadvantages • Finding similar users/user groups is not easy • New user problem: No preferences available • New item problem: No ratings available • Sparsity problem

END
