EigenTaste: A Constant Time Collaborative Filtering Algorithm • Ken Goldberg • Students: Theresa Roeder, Dhruv Gupta, Chris Perkins • Industrial Engineering and Operations Research • Electrical Engineering and Computer Science • UC Berkeley
CF Problem Definition • A set of objects (movies, books, jokes) • A user rates a subset of objects • Based on the ratings, retrieve objects from the complement of this subset • Criteria: • Effective: recommended objects should receive high ratings • Efficient: the online recommendation process should run quickly and be scalable
Some Previous Work • D. Goldberg et al. - Tapestry (1992) • Riedl, Resnick, Konstan et al. - GroupLens (1994-) • Shardanand and Maes - Ringo (1995) • Resnick and Varian (1997) • Breese et al. at Microsoft Research (1998) • Pazzani (1999) • Herlocker et al. - GroupLens (1999)
WWW-based Recommender Systems • MovieLens • Firefly • MovieCritic
EigenTaste Algorithm 1) Principal Component Analysis 2) Universal Queries (dense ratings matrix) 3) Fine-grained ratings bar (captures nuances) 4) Offline and Online Processing 5) Online: Constant time recommendations
Universal Queries • Most CF systems require users to select which items they want to rate, which yields a sparse ratings matrix • Eigentaste allows users to rate all items based on short unbiased descriptions (e.g., film synopsis) • Eigentaste uses a subset of highly discriminatory items for the gauge set
Continuous Rating Scale • Rating bar runs from Disapprove to Approve
EigenTaste Algorithm • A is the n x m normalized rating matrix • n users • m objects • C is the k x k reduced correlation matrix • k objects in the gauge set • $C = \frac{1}{n} A^T A$, computed over the k gauge-set columns of A • assumes ratings are continuous with linear relationships • E is the orthogonal matrix of eigenvectors of C • $\Lambda$ is the diagonal matrix of eigenvalues
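EigenTaste • $E C E^T = \Lambda$ • $C = E^T \Lambda E$ • Let $B = A E^T$ • $R_B = \frac{1}{n} B^T B = E C E^T = \Lambda$ • transformed points are uncorrelated and each column of B has variance $\lambda_i$ • Principal Components (Pearson 1901) • consider the eigenvectors with the m largest eigenvalues, $E_m$ (m here counts retained components, not objects) • $B_m = A E_m^T$ • choose m based on the “knee” in the eigenvalues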
Dimensionality Reduction • First two principal components (eigenvectors) account for nearly 50% of the variation in user ratings • Project user ratings along first two principal components: $x = A E_2^T$ • Facilitates visualization ...
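The principal component step can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the authors' implementation: the ratings, the sizes `n` and `k`, and every variable name are assumptions. It forms the correlation matrix C = (1/n) AᵀA, stores the eigenvectors as the rows of E, and projects onto the first two components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n users rating the k gauge-set items (synthetic).
n, k = 500, 5
latent = rng.normal(size=(n, 2))
mixing = rng.normal(size=(2, k))
ratings = latent @ mixing + 0.3 * rng.normal(size=(n, k))

# Normalize each column to zero mean and unit variance, giving A.
A = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)

# C = (1/n) A^T A is the k x k correlation matrix.
C = (A.T @ A) / n

# eigh returns ascending eigenvalues with eigenvectors as columns;
# reorder to descending and transpose so the rows of E are eigenvectors.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
E = eigvecs[:, order].T

# B = A E^T; its covariance (1/n) B^T B should be diag(lam).
B = A @ E.T
R_B = (B.T @ B) / n

# Keep the first two principal components: x = A E_2^T.
E2 = E[:2]
x = A @ E2.T

# Fraction of total variance captured by the eigen plane.
explained = lam[:2].sum() / lam.sum()
```

Checking `np.allclose(R_B, np.diag(lam))` confirms that the transformed columns are uncorrelated with variances equal to the eigenvalues. The value of `explained` here depends entirely on the synthetic data and says nothing about the Jester ratings.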
Eigen Plane Recursive Clustering
The EigenTaste Algorithm • Offline: • Compute eigenvectors and project users onto eigen plane. • Cluster and compute average ratings for each cluster. • Online: • Collect ratings for objects in gauge set • Project onto the eigen plane • Find representative cluster • Recommend objects based on average ratings within that cluster
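The offline/online split can be sketched as follows. This is a simplified illustration under stated assumptions: `E2` is a made-up pair of orthonormal vectors standing in for the two leading eigenvectors, a uniform grid of width `CELL` replaces the recursive clustering of the eigen plane, and the item names and ratings are invented.

```python
from collections import defaultdict

# Hypothetical inputs: E2 stands in for the two leading eigenvectors
# (as rows) computed offline by PCA over a k = 4 item gauge set.
E2 = [[0.5, 0.5, 0.5, 0.5],
      [0.5, -0.5, 0.5, -0.5]]
CELL = 1.0  # assumed grid cell width on the eigen plane


def project(gauge_ratings):
    """Project a k-vector of gauge ratings onto the eigen plane: O(k)."""
    return tuple(sum(e * r for e, r in zip(row, gauge_ratings)) for row in E2)


def cell_of(point):
    """Map an eigen-plane point to a grid cell (stand-in for a cluster)."""
    return tuple(int(c // CELL) for c in point)


def build_clusters(users):
    """Offline: average the non-gauge ratings of the users in each cell.

    `users` is a list of (gauge_ratings, other_ratings) pairs, where
    other_ratings maps an item name to that user's rating.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for gauge, others in users:
        cell = cell_of(project(gauge))
        for item, rating in others.items():
            sums[cell][item] += rating
            counts[cell][item] += 1
    return {
        cell: {item: sums[cell][item] / counts[cell][item] for item in items}
        for cell, items in sums.items()
    }


def recommend(gauge_ratings, cluster_means, already_rated=()):
    """Online: find the user's cell and rank items by cluster average."""
    means = cluster_means.get(cell_of(project(gauge_ratings)), {})
    ranked = sorted(
        ((avg, item) for item, avg in means.items() if item not in already_rated),
        reverse=True,
    )
    return [item for _, item in ranked]


# Invented example users: (gauge ratings, ratings of other items).
users = [
    ([4.0, 4.0, 4.0, 4.0], {"joke_a": 5.0, "joke_b": -2.0}),
    ([4.2, 4.0, 4.1, 3.9], {"joke_a": 3.0, "joke_b": 1.0}),
    ([-4.0, 4.0, -4.0, 4.0], {"joke_a": -6.0, "joke_b": 7.0}),
]
clusters = build_clusters(users)
new_user_recs = recommend([4.1, 4.1, 4.0, 4.0], clusters)
```

The online path touches only the new user's k gauge ratings and one dictionary lookup, so its cost does not grow with the number of stored users. The ranked list per cell could also be precomputed offline.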
First Application (1999) Jester: Recommending Jokes • Sense of humor is difficult to specify • Advantages: • Rating process is not altogether unpleasant • Can evaluate jokes quickly • Dense ratings matrix (large sample size) • Disadvantages: • Offensive/Shaggy Dog jokes • Temporal Effects, Portfolio Effects • Priming/Masking
System Architecture Login Interface CGI Web Server Recommendation Engine CGI Client Content Database User Rating Profiles Internet
Measure of Effectiveness • Metric: Normalized Mean Absolute Error (NMAE): average absolute deviation of predicted ratings p from actual ratings r over c ratings, normalized over the rating range • $\mathrm{MAE} = \frac{1}{c} \sum_{i=1}^{c} |r_i - p_i|$ • $\mathrm{NMAE} = \mathrm{MAE} / (r_{\max} - r_{\min})$
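The metric is simple to compute directly. The sketch below is illustrative only: the function name `nmae`, the sample ratings, and the rating range of -10 to 10 are assumptions chosen for the example.

```python
def nmae(actual, predicted, r_min, r_max):
    """Normalized mean absolute error over paired ratings.

    MAE is the mean of |r - p|; NMAE divides it by the rating range.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty rating lists of equal length")
    mae = sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)
    return mae / (r_max - r_min)


# Invented example on an assumed rating scale of -10 to 10.
actual = [2.0, -4.0, 6.0]
predicted = [1.0, -2.0, 3.0]
score = nmae(actual, predicted, r_min=-10.0, r_max=10.0)
```

Here the absolute errors are 1, 2, and 3, so MAE is 2 and dividing by the range of 20 gives an NMAE of 0.1. Normalizing by the range makes scores comparable across systems that use different rating scales.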
Effectiveness • Based on 18,000 users
Computational Complexity • n: number of users • k: number of objects in gauge set • Nearest Neighbor algorithm: online processing $O(kn)$ • EigenTaste algorithm: offline processing $O(k^2 n)$, online processing $O(k)$
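To see where the O(kn) online cost of the nearest-neighbor baseline comes from, consider this minimal sketch. The function name, the squared Euclidean distance, and the sample data are assumptions; the slides do not specify the distance measure used in the comparison.

```python
def nearest_neighbor(gauge_ratings, all_gauge_ratings):
    """Return the index of the closest stored user.

    One distance per stored user (n of them), each costing O(k),
    so a single online query is O(kn).
    """
    best_index, best_dist = -1, float("inf")
    for index, other in enumerate(all_gauge_ratings):  # n iterations
        # Squared Euclidean distance over the k gauge items: O(k).
        dist = sum((a - b) ** 2 for a, b in zip(gauge_ratings, other))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


# Invented example: three stored users, k = 3 gauge items.
stored = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [-3.0, 2.0, 1.0]]
closest = nearest_neighbor([4.0, 5.0, 6.0], stored)
```

Every query scans all stored users, so the cost grows with the user base. Projecting the k gauge ratings onto a fixed pair of eigenvectors and looking up a precomputed cluster avoids that scan, which is why the online step is independent of n.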
Prediction Speed • Time to process 9000 users: • Nearest Neighbor: 28 hours • EigenTaste: 3 minutes
Current Jester Dataset • 62,000 registered users • approx. 3,000,000 ratings
Second Application (2000) Sleeper: Recommending Books
EigenTaste Algorithm 1) Principal Component Analysis 2) Universal Queries (dense ratings matrix) 3) Fine-grained ratings bar (captures nuances) 4) Offline and Online Processing 5) Online: Constant time recommendations • Patent application • 21 December 1999 by UC Regents
www.cs.berkeley.edu/~goldberg goldberg@cs.berkeley.edu Eigentaste: A Constant Time Collaborative Filtering Algorithm (to appear: Information Retrieval Journal, 2001)