170 likes | 330 Views
Google News Personalization Scalable Online Collaborative Filtering. Abhinandan Das - abhinandan@google.com Mayur Datar - mayur@google.com Ashutosh Garg - ashutosh@google.com Shyam Rajaram - rajaram1@ifp.uiuc.edu Presented by: Aniket Zamwar - zamwar@usc.edu. Already Studied in Class.
E N D
Google News PersonalizationScalable Online Collaborative Filtering • Abhinandan Das - abhinandan@google.com • Mayur Datar - mayur@google.com • Ashutosh Garg - ashutosh@google.com • Shyam Rajaram - rajaram1@ifp.uiuc.edu • Presented by: Aniket Zamwar - zamwar@usc.edu
Already Studied in Class • Map Reduce • Collaborative Filtering • Content Based Recommendation • Clustering Techniques - Pros/Cons
Problem Statement • Scale of operation is very huge - order of several million news stories dynamically changing at high rate. • Presented with the click history for N users ( U = { u1,u2,...,uN} ) over M items ( S = {s1,s2,...,sM} ), and given a specific user ‘u’ with click history set Cuconsisting of stories { si1 , . . . , si|Cu | }, recommend K stories to the user. • Strict timing constraints for recommendation engine to generate recommendations. 4/18/13 Google News Personalization
Approaches • Collaborative Clustering • Probabilistic Latent Semantic Indexing • Covisitation Counts 4/18/13 Google News Personalization
Problem Setting • Record User Queries and Clicks • Recommendations of News using user click history and click history of the community 4/18/13 Google News Personalization
Recommender System • Content based Systems • Collaborative Filtering Systems • Memory-based Algorithms • Prediction calculated as weighted average of the ratings given by other users • weight is proportional to to “similarity” between users. • Model-based Algorithms • Model the users based on their past ratings and use these models to predict ratings of unseen items. • Mix of memory based + model based systems 4/18/13 Google News Personalization
Algorithms • Model based approach • Clustering Techniques: Probabilistic Latent Semantic Indexing(PLSI) and min hash • Memory based approach • Item Covisitation 4/18/13 Google News Personalization
Min Hashing • Probabilistic clustering technique - assigns pairs of users to same cluster with probability proportional to overlab between the set of items the users have voted for. • Similarity calculated using Jaccard Coefficient • To Do: Given user u-i, compute similarity S(u-i, u-j) for all users u-j, and recommend stories to u-i voted by u-j with weight equal to S(u-i, u-j) • Issues: Real time not scalable, using hash table to find vote for specific user is also not feasible, offline computation is also not feasible • Locality Sensitive Hashing (LSH) comes for rescue 4/18/13 Google News Personalization
Locality Sensitive Hashing • Key Idea: Hash data points using several hash functions, such that for each hash function the probability of collision is much higher for objects which are close to each other. • Min-hashing technique is used to randomly permute the set of items (S) and for each user u-i compute its hash value h(u-i) as the index of first item under the permutation that belongs to user’s item set Cu-i. • Min-hashing = probabilistic clustering where each hash bucket corresponds to a cluster, that puts two users together in the same cluster with probability equal to item set similarity S(u-i, u-j) 4/18/13 Google News Personalization
PLSI • Probabilistic Latent Semantic models • Models users and items as random variables - relationship between users and items is learned by modeling joint distribution of users and items as mixed distribution • A hidden variable Z is used to define the relationship, it represents user communities and item communities. 4/18/13 Google News Personalization
Covisitation • Covisitation is defined as event in which two stories are clicked by same user within a certain time interval. • A graph whose nodes represent items and weighted edges represent time discounted number of covisitation instances. • For each user click the adjacency list representing graph is updated: for entry for each item in user history, new entry corresponding to clicked item is added if not there; if it is already there then the age discounted count is updated. 4/18/13 Google News Personalization
Covisitation based Recommendation • Fetch user u-i’s recent click history - limited to past few hours or days. • For each item s-i in click history of user, lookup the entry for pair (s-i, s) in adjacency list for s-i stored in Big Table. • The value stored in entry normalized by sum of all entries for s-i is stored to recommendation score. • Recommendation score is normalized to a value between 0 and 1 by linear scaling. 4/18/13 Google News Personalization
System Setup • Three Components: • Offline component to cluster users based on click history • Online servers: • Updating story and user statistics each time user clicks on news story • Generating news story when user requests • Two Data Tables • User Table (UT) indexed by user-id, stores user click history and clustering information. • Story Table (ST) indexed by story-id, stores real time click counts for every story-story and story-cluster pair. 4/18/13 Google News Personalization
System Components NSS Min Hashing PLSI Clusters + Click History b u f f e r UserID + Clicked Story UT UserId + Clicked Story User Clusters user click NFE Offline Log Analysis User Click Histories UserID + Candidate Stories Clusters + Click History view personalized news page request Update Statistics ST NPS Ranked Stories Fetch Statistics c a c h e NFE: News Front End NSS: News Statistics Server NPS: News Personalization Server UT: User Table ST: Story Table 4/18/13 Google News Personalization
Pros • Scalable collaboration of Content based and Collaborative clustering • Recommendation system using Min Hashing and PLSI Algorithms • Scaling the algorithms by using Map Reduce and Big Table representation for data • Using click events as vote for news • Dynamically providing the latest likely news that suits the interest of the user 4/18/13 Google News Personalization
Cons • Depends a lot on User Clicks • User Clicks considered as positive vote • Does not say anything about negative vote 4/18/13 Google News Personalization