RECOMMENDATION SYSTEMS. ÖZNUR KIRMEMİŞ. OUTLINE. INTRODUCTION FORMALIZATION OF THE PROBLEM APPROACHES COLLABORATIVE CONTENT BASED HYBRID CONCLUSION. PAPERS.
RECOMMENDATION SYSTEMS ÖZNUR KIRMEMİŞ
OUTLINE • INTRODUCTION • FORMALIZATION OF THE PROBLEM • APPROACHES • COLLABORATIVE • CONTENT BASED • HYBRID • CONCLUSION
PAPERS 1. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions, Gediminas Adomavicius, Alexander Tuzhilin, IEEE Transactions on Knowledge and Data Engineering (June 2005) 2. Content-Boosted Collaborative Filtering for Improved Recommendations, Prem Melville, Raymond J. Mooney, and Ramadass Nagarajan, Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI-2002) 3. Recommendation as Classification: Using Social and Content-Based Information in Recommendation, Chumki Basu, Haym Hirsh, William Cohen (AAAI-1998)
PART 1: INTRODUCTION
Recommendations • We live in an information society: the quantity of new information available every day exceeds our limited processing capabilities. • We face far more choices than we can ever try: which book to read, which movie is worth watching, where to have dinner tonight, etc. • For this reason, we need something able to suggest only the worthwhile information. • Make the search space smaller! • Items: products, web sites, blogs, news items, …
Recommendations • Acting upon recommendations from other people is a normal part of life. • By using recommendations we can take a shortcut to the things we like without having to try many things we dislike or acquire all the knowledge needed to make an informed decision. • Recommender systems (RS) automate this facility. • Recommendation systems are thus a solution to information overload.
DEFINITION OF RS • Programs that attempt to predict items (movies, music, books, news, web pages) that a user may be interested in, given some information about the user's profile
Recommendation Systems • Based on a synthesis of ideas from: • Artificial Intelligence • Natural Language Processing • Human-Computer Interaction • Sociology • Information Retrieval • and the technology of the WWW
GENERIC RS • For a typical recommender system, there are three steps: 1. The user provides some form of input to the system. These inputs can be both explicit and implicit. Ratings submitted by users are among explicit inputs, whereas the URLs visited by a user and the time spent reading a web site are among possible implicit inputs. 2. These inputs are brought together to form a representation of the user's likes and dislikes. This representation could be as simple as a matrix of item ratings, or as complex as a data structure combining both content and rating information. 3. The system computes recommendations using these user profiles. • Even though the steps are essentially the same for most recommender systems, there have been different approaches to both steps 2 and 3.
Current Examples • MovieLens • Movie recommendation • Makes use of collaborative filtering technology • Gathers user preferences by asking the user to rate movies • Searches for similar profiles (i.e., users that share the same or similar taste) and uses them to generate new suggestions
Current Examples • Amazon • Book recommendations • recommends books frequently purchased by customers who purchased the selected book • customers receive text recommendations based on the opinions of other customers • LIBRA • Book recommendations • Combines a content-based approach with machine learning
Current Examples • Cinemax.com • Moviecritic: movies again • And much more……
PART 2: FORMALIZATION OF THE PROBLEM
Formal Model <C, S, u> • Let C be the set of all users (customers) and let S be the set of all possible items that can be recommended, such as books, movies, or restaurants. • C = set of customers • S = set of items • Let u be a utility function that measures the usefulness of item s to user c • Utility function u: C × S → R
Utility Function • Utility function u: C × S → R • R = set of ratings (a totally ordered set) • e.g., 0-5 stars, real number in [0,1] • u(c1,s1) = r1; u(c1,s2) = r2; ... • Recommendation: for each user c ∈ C, choose the item s ∈ S that maximizes the user's utility u(c,s)
USER SPACE & ITEM SPACE • USER SPACE (C): • Each user can be defined with a profile that includes various user characteristics, such as age, gender, income, marital status, etc. • ITEM SPACE (S): • Similarly, each element of the item space S can be defined with a set of characteristics. • Example (in a movie recommendation application): • S: a collection of movies • each movie can be represented not only by its ID, but also by its title, genre, director, year of release, leading actors, etc.
UTILITY FUNCTION • The central problem of recommender systems is that the utility u is usually not defined on the whole C × S space, but only on some subset of it. • This means u needs to be extrapolated to the whole space C × S. • The recommendation engine should be able to estimate the ratings of the unrated item/user combinations and issue appropriate recommendations based on these predictions.
Example: Utility Matrix • [Table: users Ayşe, Ali, Veli, Hasan × movies King Kong, Garfield, Matrix, Usual Suspects; cell values not preserved] • Gather the “known” ratings for the matrix • Extrapolate the unknown ratings from the known ratings
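As a rough illustration of a partially defined utility function, the sketch below stores only the known cells of a user × movie matrix and lists the cells that still need to be extrapolated. The user and movie names come from the slide; the rating values, the `known` dictionary, and the helper `u` are hypothetical and chosen only for illustration.

```python
users = ["Ayşe", "Ali", "Veli", "Hasan"]
movies = ["King Kong", "Garfield", "Matrix", "Usual Suspects"]

# Hypothetical known ratings on a 0-5 scale (illustrative values only,
# not the numbers from the original slide).
known = {
    ("Ayşe", "King Kong"): 4,
    ("Ayşe", "Matrix"): 5,
    ("Ali", "Garfield"): 2,
    ("Veli", "Usual Suspects"): 5,
    ("Hasan", "King Kong"): 1,
}

def u(c, s):
    """Partial utility function: None where user c has not rated item s."""
    return known.get((c, s))

# The (user, item) pairs whose utility still has to be estimated.
unknown = [(c, s) for c in users for s in movies if u(c, s) is None]
print(len(unknown), "of", len(users) * len(movies), "cells are unknown")
```

Storing only the known cells is one reasonable choice here because real rating matrices are typically very sparse.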
EXTRAPOLATION • Extrapolations from known ratings are done by either • specifying heuristics that define the utility function and empirically validating its performance, or • estimating the utility function that optimizes a certain performance criterion, such as the mean square error. • Once the unknown ratings are estimated, a recommendation to a user is made by selecting the item with the highest estimated rating for that user. • Alternatively, we can recommend the N best items to a user.
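A minimal sketch of the two ideas on this slide, assuming the estimated ratings for one user are already available as a dictionary: a mean square error over (predicted, actual) pairs, and selection of the single best item or the N best items. The function names and the numbers are hypothetical.

```python
import heapq

def mean_square_error(pairs):
    """pairs: list of (predicted, actual) ratings held out for evaluation."""
    return sum((p - a) ** 2 for p, a in pairs) / len(pairs)

def top_n(estimated, n=2):
    """estimated: {item: estimated rating} for one user's unrated items."""
    return heapq.nlargest(n, estimated, key=estimated.get)

# Hypothetical estimated ratings for one user.
estimated = {"Garfield": 2.1, "Matrix": 4.6, "Usual Suspects": 3.9}

best_item = max(estimated, key=estimated.get)   # single best recommendation
best_two = top_n(estimated, n=2)                # N best recommendations
error = mean_square_error([(4.0, 5), (2.0, 2)])
```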
PART 3: APPROACHES • Content Based • Collaborative • Hybrid
APPROACHES • Recommender systems are usually classified into the following categories, based on how recommendations are made: • Content-based recommendations: • The user is recommended items similar to the ones the user preferred in the past, where similarity is measured between the user profile and item profiles, or between item profiles. • Collaborative recommendations: • Aim to identify users that have relevant interests and preferences by calculating similarities and dissimilarities between user profiles • The user is recommended items that are preferred by other people with similar tastes and preferences. • Hybrid approaches: • These methods combine collaborative and content-based methods.
Content-based Methods • Main idea: • recommend to user c items similar to previous items rated highly by c • No information about similar users is needed! • Formalization: • the utility u(c,s) of item s for user c is estimated based on • the utilities u(c,si) assigned by user c to items si ∈ S that are “similar” to item s.
Content-based Methods • Has its roots in information retrieval and information filtering research. • The improvement over the traditional information retrieval approaches comes from the use of user profiles that contain information about users' tastes, preferences, and needs. • The profiling information • can be obtained from users explicitly, e.g., through questionnaires, or • implicitly, learned from their transactional behavior over time. • A machine learning algorithm can be used to induce a profile of the user's preferences.
Plan of action (Item Profile + User Profile + Prediction Mechanism) • [Diagram: the items a user likes (e.g., red circles and triangles) → build item profiles → build user profile → match → recommend objects with similar content (same color, shape, ...)]
Item Profiles • For each item, create an item profile • Let Content(s) be an item profile, • a set of attributes characterizing item s. • movies: author, title, actor, director • text: set of “important” words in the document • Attributes are used to determine the appropriateness of the item for recommendation purposes.
Item Profiles • How are attributes determined? • Straightforward case: by deciding which slots are important • Slots: Author, Title, Editorial Reviews, etc. • By processing texts • The “importance” (or “informativeness”) of keyword ki in document dj is determined with some weighting measure wij that can be defined in several different ways. • One of the best-known measures for specifying keyword weights in Information Retrieval is the term frequency/inverse document frequency (TF-IDF) measure.
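To make the TF-IDF weighting concrete, here is a small sketch of one common variant: the term frequency of a keyword is its count divided by the count of the most frequent keyword in the same document, and the inverse document frequency is log(N / n_i), where N is the number of documents and n_i the number of documents containing the keyword. The function name `tf_idf` and the toy documents are assumptions made for this example; other TF-IDF variants exist.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {keyword: weight} dict per document."""
    n_docs = len(docs)
    # n_i: number of documents in which keyword i appears at least once
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        max_count = max(counts.values()) if counts else 1
        weights.append({
            word: (count / max_count) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

# Hypothetical toy corpus.
docs = [
    ["space", "robot", "robot", "future"],
    ["space", "cooking", "recipe"],
    ["space", "robot", "war"],
]
w = tf_idf(docs)
```

In this toy corpus "space" occurs in every document, so its weight is zero everywhere, while "robot" in the first document gets the weight log(3/2).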
User profiles • Let ContentBasedProfile(c) be the profile of user c containing the preferences of this user. These profiles are obtained by analyzing the content of the items previously seen by the user and are constructed using keyword analysis techniques from information retrieval. • For example, ContentBasedProfile(c) can be defined as a vector of weights (wc1, . . . , wck), where each weight wci denotes the importance of keyword ki to user c and can be computed from individually rated content vectors using a variety of techniques.
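One simple way to compute such a weight vector, shown here only as an assumed illustration, is a rating-weighted average of the keyword vectors of the items the user has rated. The function `build_profile` and the sample data are hypothetical; the slides do not prescribe a specific technique.

```python
def build_profile(rated_items):
    """rated_items: list of (keyword_weights, rating) pairs for one user.

    Returns a rating-weighted average of the item keyword vectors.
    """
    total = sum(rating for _, rating in rated_items)
    if total == 0:
        return {}
    profile = {}
    for weights, rating in rated_items:
        for keyword, weight in weights.items():
            profile[keyword] = profile.get(keyword, 0.0) + rating * weight
    return {keyword: value / total for keyword, value in profile.items()}

# Hypothetical rated items: (content vector, rating on a 1-5 scale).
rated = [
    ({"robot": 1.0, "space": 0.5}, 5),
    ({"cooking": 1.0}, 1),
]
profile = build_profile(rated)
```

Keywords from highly rated items dominate the resulting profile, which is the behavior the weights wci are meant to capture.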
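Prediction • In content-based systems, the utility function u(c,s) is usually defined as: u(c,s) = score(ContentBasedProfile(c), Content(s)) • Especially when recommending Web pages, both ContentBasedProfile(c) of user c and Content(s) of document s can be represented as TF-IDF vectors $\vec{w}_c$ and $\vec{w}_s$ of keyword weights. • Moreover, the utility function u(c,s) is usually represented in the information retrieval literature by some scoring heuristic defined in terms of the vectors mentioned above, such as the cosine similarity measure: $u(c,s) = \cos(\vec{w}_c, \vec{w}_s) = \frac{\vec{w}_c \cdot \vec{w}_s}{\lVert \vec{w}_c \rVert_2 \, \lVert \vec{w}_s \rVert_2} = \frac{\sum_{i=1}^{K} w_{ci}\, w_{si}}{\sqrt{\sum_{i=1}^{K} w_{ci}^2}\, \sqrt{\sum_{i=1}^{K} w_{si}^2}}$, where K is the total number of keywords in the system.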
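The cosine similarity score can be sketched as follows, assuming the user profile and each item are stored as sparse {keyword: weight} dictionaries. The names `cosine`, `profile`, and `items`, as well as the weights, are hypothetical.

```python
import math

def cosine(profile, item):
    """Cosine similarity between two sparse keyword-weight vectors (dicts)."""
    dot = sum(w * item.get(k, 0.0) for k, w in profile.items())
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    norm_i = math.sqrt(sum(w * w for w in item.values()))
    if norm_p == 0.0 or norm_i == 0.0:
        return 0.0  # no keywords on one side: treat as no similarity
    return dot / (norm_p * norm_i)

# Hypothetical user profile and item profiles.
profile = {"robot": 0.8, "space": 0.6}
items = {
    "robot_book": {"robot": 1.0},
    "cookbook": {"cooking": 1.0},
    "space_robots": {"robot": 0.4, "space": 0.3},
}

# Rank items by their estimated utility u(c, s) for this user.
ranked = sorted(items, key=lambda s: cosine(profile, items[s]), reverse=True)
```

Because cosine similarity ignores vector length, "space_robots" scores highest even though its raw weights are smaller than those of the profile.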
LIBRA: Learning Intelligent Book Recommending Agent • Content-based recommender for books using information about titles extracted from Amazon. • Uses information extraction from the web to organize text into fields: • Author • Title • Editorial Reviews • Customer Comments • Subject terms • Related authors • Related titles
EXAMPLE: LIBRA System • [Diagram: Amazon Pages → Information Extraction → LIBRA Database; Rated Examples → Machine Learning (Learner) → User Profile → Predictor → Recommendations (ranked list)]
Sample Extracted Information Title: <The Age of Spiritual Machines: When Computers Exceed Human Intelligence> Author: <Ray Kurzweil> Price: <11.96> Publication Date: <January 2000> ISBN: <0140282025> Related Titles: <Title: <Robot: Mere Machine or Transcendent Mind> Author: <Hans Moravec> > … Reviews: <Author: <Amazon.com Reviews> Text: <How much do we humans…> > … Comments: <Stars: <4> Author: <Stephen A. Haines> Text:<Kurzweil has …> > … Related Authors: <Hans P. Moravec> <K. Eric Drexler>… Subjects: <Science/Mathematics> <Computers> <Artificial Intelligence> …
Libra Content Information • Libra uses this extracted information to form “bags of words” for the following slots: • Author • Title • Description (reviews and comments) • Subjects • Related Titles • Related Authors
Libra Overview • User rates selected titles on a 1 to 10 scale. • Libra uses a naïve Bayesian text-categorization algorithm to learn a profile from these rated examples. • Rating 6–10: Positive • Rating 1–5: Negative
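The sketch below illustrates the general idea with a plain multinomial naive Bayes classifier over a single bag of words, using the slide's threshold (ratings 6-10 positive, 1-5 negative). It is a simplified assumption-laden example, not LIBRA's actual implementation: it ignores the separate slots, and the function names, Laplace smoothing choice, and sample data are hypothetical.

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (tokens, rating) pairs, rating on a 1-10 scale."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    class_counts = Counter()
    for tokens, rating in examples:
        label = "pos" if rating >= 6 else "neg"
        class_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, class_counts, vocab

def log_odds(model, tokens):
    """Log of P(pos | tokens) / P(neg | tokens); positive means 'recommend'."""
    word_counts, class_counts, vocab = model
    n_examples = sum(class_counts.values())
    scores = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        # Laplace-smoothed class prior and word likelihoods
        score = math.log((class_counts[label] + 1) / (n_examples + 2))
        for token in tokens:
            if token in vocab:  # words never seen in training are skipped
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return scores["pos"] - scores["neg"]

# Hypothetical rated titles, each reduced to a small bag of words.
rated = [
    (["robot", "ai", "future"], 9),
    (["ai", "machines", "mind"], 8),
    (["cooking", "recipes", "pasta"], 2),
    (["garden", "cooking"], 4),
]
model = train(rated)
liked = log_odds(model, ["ai", "robot"])
disliked = log_odds(model, ["cooking", "pasta"])
```

Candidate titles can then be ranked by their log-odds score, with the highest-scoring unrated titles recommended first.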
LIMITATIONS (Content Based) • Finding the appropriate features • Overspecialization • Never recommends items outside the user's content profile • Remedy: introduce some randomness • ex: genetic algorithms • The diversity of recommendations is often a desirable feature in recommender systems. • Items that are too similar should not be recommended either, • ex: a different news article describing the same event.
LIMITATIONS(Content Based) • Recommendations for new users • How to build a profile? • The user has to rate a sufficient number of items before a content-based recommender system can really understand the user’s preferences. Therefore, a new user, having very few ratings, would not be able to get accurate recommendations.
Collaborative Filtering • Unlike content-based recommendation methods, collaborative recommender systems (or collaborative filtering systems) try to predict the utility of items based on the items previously rated by other similar users. • The utility u(c,s) of item s for user c is estimated based on the utilities u(cj,s) assigned to item s by those users cj ∈ C who are “similar” to user c.
Basic Algorithm • Maintain a database of many users’ ratings of a variety of items. • For a given user, find other similar users whose ratings strongly correlate with the current user. • Recommend items rated highly by these similar users, but not rated by the current user. • Almost all existing commercial recommenders use this approach (e.g. Amazon).
Similar Users • Let rx be the vector of user x’s ratings • Cosine similarity measure • sim(x,y) = cos(rx , ry) • Pearson correlation coefficient • ....
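A minimal sketch of user-based collaborative filtering, assuming ratings are stored as {user: {item: rating}}. Similarity is the Pearson correlation over co-rated items, and predictions use a similarity-weighted average of each neighbor's deviation from their own mean rating, which is one common choice among several. The function names, the parameters `k` and `n`, and the sample ratings are all hypothetical.

```python
import math

def pearson(ra, rb):
    """Pearson correlation over the items both users have rated."""
    common = set(ra) & set(rb)
    if len(common) < 2:
        return 0.0
    mean_a = sum(ra[i] for i in common) / len(common)
    mean_b = sum(rb[i] for i in common) / len(common)
    cov = sum((ra[i] - mean_a) * (rb[i] - mean_b) for i in common)
    var_a = sum((ra[i] - mean_a) ** 2 for i in common)
    var_b = sum((rb[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def mean(r):
    return sum(r.values()) / len(r)

def recommend(ratings, active, k=2, n=3):
    """Top-n unrated items for `active`, using its k most similar users."""
    mine = ratings[active]
    sims = [(pearson(mine, r), u) for u, r in ratings.items() if u != active]
    # Keep only positively correlated users, most similar first.
    neighbors = sorted((p for p in sims if p[0] > 0), reverse=True)[:k]
    sums = {}
    for sim, user in neighbors:
        user_mean = mean(ratings[user])
        for item, rating in ratings[user].items():
            if item in mine:
                continue  # only recommend items the active user has not rated
            num, den = sums.get(item, (0.0, 0.0))
            sums[item] = (num + sim * (rating - user_mean), den + sim)
    preds = {i: mean(mine) + num / den for i, (num, den) in sums.items()}
    return sorted(preds.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "active": {"A": 5, "B": 1, "C": 4},
    "u1": {"A": 4, "B": 2, "C": 5, "D": 5},
    "u2": {"A": 1, "B": 5, "C": 2, "D": 1, "E": 5},
    "u3": {"A": 5, "B": 2, "C": 4, "D": 4, "E": 2},
}
recs = recommend(ratings, "active")
```

In this toy data "u2" has exactly opposite tastes to the active user (correlation of -1) and is therefore ignored, so item E, which only "u2" liked, ends up ranked below item D.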
Collaborative Filtering • [Diagram: a User Database of rating vectors over items A…Z is compared with the Active User's ratings (Correlation Match); items rated by the matching users but not by the active user, such as item C, are extracted as recommendations (Extract Recommendations)]
LIMITATIONS (Collaborative) • New User Problem: • Same problem as with content-based systems. • In order to make accurate recommendations, the system must first learn the user's preferences from the ratings that the user gives. • New Item Problem: • New items are added regularly to recommender systems. • Collaborative systems rely solely on users' preferences to make recommendations. • Therefore, until the new item is rated by a substantial number of users, the recommender system would not be able to recommend it. • Not a problem in content-based systems! • On the other hand, collaborative filtering works for any kind of item; no feature selection is needed.
Hybrid Methods • Content-based and collaborative methods have complementary strengths and weaknesses. • Combine methods to obtain the best of both.
HOW TO COMBINE? • Implement two separate recommenders and combine their predictions, e.g., by weighting them • Add content-based methods to collaborative filtering • Use a content-based predictor to complete the collaborative data. • “Content-Boosted Collaborative Filtering for Improved Recommendations”, Prem Melville, Raymond J. Mooney, and Ramadass Nagarajan, AAAI 2002
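The first option, combining two separate recommenders by weights, could look like the following linear blend. This is only an assumed sketch: the function `hybrid_score`, the weight `alpha`, and the fallback behavior when one predictor has no answer are hypothetical design choices, not something the slides specify.

```python
def hybrid_score(content_pred, collab_pred, alpha=0.5):
    """Linear blend; alpha is the weight on the content-based prediction.

    A prediction of None means that recommender cannot score the item,
    in which case the other prediction is used on its own.
    """
    if collab_pred is None:   # e.g., a new item nobody has rated yet
        return content_pred
    if content_pred is None:  # e.g., an item with no usable content features
        return collab_pred
    return alpha * content_pred + (1 - alpha) * collab_pred

blended = hybrid_score(4.0, 2.0, alpha=0.25)
new_item = hybrid_score(3.5, None)
```

Falling back to the content-based prediction for unrated items is one way such a blend can sidestep the new item problem of pure collaborative filtering.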
Movie Domain • Hybrid approach in the domain of movie recommendation • Uses the user-movie ratings from the EachMovie dataset • The dataset contains rating data provided by each user for various movies. • User ratings range from zero to five stars. Zero stars indicate extreme dislike for a movie and five stars indicate high praise. • The content information for each movie was collected from IMDb using a simple crawler. • The crawler follows the IMDb link provided for every movie in the EachMovie dataset and collects information. • Content information of every movie is represented by a set of slots (features). • Each slot is represented simply as a bag of words. • The slots used for the EachMovie dataset are: movie title, director, cast, genre, plot
Content-Boosted CF - I • [Diagram: User-ratings Vector (user-rated items, unrated items) → Training Examples → Content-Based Predictor → Pseudo User-ratings Vector (user-rated items plus items with predicted ratings)]
Content-Boosted CF - II • [Diagram: User Ratings Matrix → Content-Based Predictor → Pseudo User Ratings Matrix] • Compute the pseudo user ratings matrix • Full matrix: approximates the actual full user ratings matrix • Perform CF • Using Pearson correlation between pseudo user-rating vectors
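The step of computing the pseudo user ratings matrix can be sketched as below: every actual rating is kept, and every missing cell is filled by a content-based prediction. The predictor here is a deliberately crude, hypothetical stand-in (the user's mean rating for items of the same genre); the original method uses a learned content-based predictor and additional weighting that this sketch omits. All names and data are assumptions.

```python
def pseudo_ratings(ratings, items, predict):
    """Return a full matrix: actual ratings where known, predictions elsewhere."""
    return {
        user: {
            item: row[item] if item in row else predict(user, item)
            for item in items
        }
        for user, row in ratings.items()
    }

# Hypothetical sparse ratings and item content (a single genre slot).
ratings = {
    "u1": {"m1": 5, "m3": 1},
    "u2": {"m2": 4, "m3": 2, "m4": 1},
}
genre = {"m1": "scifi", "m2": "scifi", "m3": "comedy", "m4": "comedy"}

def genre_mean_predictor(user, item):
    """Stand-in content-based predictor: mean of the user's ratings in the
    same genre, falling back to the user's overall mean rating."""
    row = ratings[user]
    same = [r for i, r in row.items() if genre[i] == genre[item]]
    pool = same if same else list(row.values())
    return sum(pool) / len(pool)

full = pseudo_ratings(ratings, sorted(genre), genre_mean_predictor)
```

Collaborative filtering can then be run on `full` instead of the sparse matrix, so that every pair of users has a complete set of co-rated items for the Pearson correlation.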
Content-Boosted Collaborative Filtering • [Diagram: EachMovie → User Ratings Matrix (Sparse); IMDb → Web Crawler → Movie Content Database; both feed the Content-based Predictor → Full User Ratings Matrix; Full User Ratings Matrix + Active User Ratings → Collaborative Filtering → Recommendations]
PART 4: CONCLUSION