Yoda: An Accurate and Scalable Web-based Recommendation System
Cyrus Shahabi, Farnoush Banaei-Kashani, Yi-Shin Chen, and Dennis McLeod
Integrated Media Systems Center and Computer Science Department, University of Southern California
E-mail: {shahabi, banaeika, yishinc, mcleod}@usc.edu
Outline
• Motivation
• Related Work
  • Content-based Filtering
  • Collaborative Filtering
• Offline Process: Clustering, Voting, Aggregation
• Online Process: Classification & Aggregation
• Performance Evaluation
• Conclusion & Future Work
Motivation
• The amount of data on the Web is enormous
• Users suffer from information overload
• Recommendation systems can personalize and customize the Web environment in real time
  • Similar to Amazon.com "real-time" recommendations (people who bought this book also purchased …)
  • Different approach (vs. association-rule mining)
• Challenges:
  • Scalability: the system must stay efficient as the number of items and users grows
  • Sparsity: not enough information is available about the user
Related Work: Content-Based Filtering
• From the Information Retrieval community [Maes 1994] [Shardanand and Maes 1995] [Balabanović and Shoham 1997]
• Based on a comparison between the feature vectors of items (e.g., artist, style) in the database and the user's interest list
• Major weaknesses [Balabanović and Shoham 1997]
  • Content limitation: applies to only a few kinds of content and captures only certain aspects of that content
  • Over-specialization: users can only obtain information that matches the content of their profiles
Related Work: Collaborative Filtering (CF)
• Employs a user's item evaluations (not the actual content) to find other similar users: nearest-neighbor algorithm [Resnick et al. 1994]
• Three major weaknesses, with proposed remedies:
  • Scalability: time complexity O(U*I) (I: # of items, U: # of users)
    • Clustering [Breese et al. 2000]
    • Bayesian network [Kitts et al. 2000]
  • Sparsity: the profile matrix (i.e., # of user-evaluated items) is sparse
    • SVD [Sarwar et al. 2000]
  • Synonymy: latent associations between items are not considered
    • Content analysis [Balabanović and Shoham 1997]
    • Categorization [Kohrs and Merialdo 2000]
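To make the O(U*I) cost of nearest-neighbor collaborative filtering concrete, here is a minimal Python sketch. It is not the method of any cited paper: the function names `cosine` and `recommend`, the toy `ratings` data, and the choice of cosine similarity are all assumptions made for illustration (the slides do not name a similarity measure).

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(target, ratings, k=2, top_n=3):
    """Rank items unseen by `target` using its k most similar users."""
    # Comparing the target with every other user over their items
    # is the step that costs O(U*I) per request.
    neighbors = sorted(
        ((cosine(ratings[target], r), u) for u, r in ratings.items() if u != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, u in neighbors:
        for item, rating in ratings[u].items():
            if item not in ratings[target]:
                # Weight each neighbor's rating by that neighbor's similarity.
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"rock1": 5, "rock2": 4},
    "bob": {"rock1": 5, "rock2": 5, "rap1": 4},
    "carol": {"classical1": 5, "pop1": 3},
}
```

Calling `recommend("alice", ratings, k=1)` ranks only the items that the single closest neighbor has rated. Because every request scans all users, the cost grows with both U and I, which is the scalability weakness listed above.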
Offline Process
[Figure: user navigation behaviors (User 1 … User U) are grouped into clusters by the PPED similarity measure and clustering. Voting within each cluster yields its favorite PVs (e.g., Rock=High, Classical=Low, Pop=Low, Rap=High). Fuzzy aggregation of the favorite PVs with the item database produces a ranked cluster wish-list (example scores: 0.87, 0.83, 0.72, 0.61, 0.47).]
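Voting Mechanism
[Figure: voting over property values]
• Properties: Rock, Classical, Pop, Rap, Blues
• Example property values: Rock=High, Classical=Low, Pop=Mid, Rap=High, Blues=Low
• Vote counts C_{p,f}(k) for property p and fuzzy term f in cluster k:

  Property    H    M    L
  Rock        51   22   10
  Classical   7    15   61
  Blues       21   25   37

• M_{p,f} = Max{C_{p,f}(k)}, f in F
• Favorite PVs: (Rock=High, Classical=Low, Pop=Low, Rap=High, Blues=Low)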
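The voting step can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the function name `favorite_property_values` and the input layout (one dict of property to fuzzy term per cluster member) are hypothetical. The sketch counts votes per (property, fuzzy term) pair, which corresponds to C_{p,f}(k), and keeps the term with the highest count for each property.

```python
from collections import Counter

def favorite_property_values(member_votes):
    """Pick, for each property, the fuzzy term with the most votes.

    member_votes: list of dicts (property -> fuzzy term), one per cluster member.
    """
    counts = {}  # property -> Counter of fuzzy terms, i.e. C_{p,f}(k)
    for votes in member_votes:
        for prop, term in votes.items():
            counts.setdefault(prop, Counter())[term] += 1
    # The favorite term for each property is the one with the maximum count.
    return {prop: c.most_common(1)[0][0] for prop, c in counts.items()}

# Hypothetical cluster with three members
cluster_members = [
    {"Rock": "High", "Classical": "Low"},
    {"Rock": "High", "Classical": "Low"},
    {"Rock": "Mid", "Classical": "Low"},
]
favorites = favorite_property_values(cluster_members)
```

Ties are broken by whichever term was counted first, which is an arbitrary choice in this sketch; the slides do not say how ties are handled.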
Fuzzy Aggregation
[Figure: each item in the item database has property values (e.g., Rock=High, Classical=Low, Pop=Mid, Rap=Mid, Blues=Low). These are combined with the cluster's favorite PVs F_p(k) (Rock=High, Classical=Low, Pop=Low, Rap=High, Blues=Low) to compute V_k(i) = f_max{(High*High), (Mid*Low), (Low*Low), …}. Items are then ranked into the cluster wish-list (example scores from 0.87 down to 0.42). The figure is also labeled "Locality Sensitive Hashing algorithm".]
Optimized Equation
[Figure: fuzzy aggregation of an item's property values (Rock=High, Classical=Low, Pop=Mid, Rap=Mid, Blues=Low) with the favorite PVs (Rock=High, Classical=Low, Pop=Low, Rap=High, Blues=Low), using M_high(k) and f_max{ } over terms such as (High*High), (Low*Mid).]
• Why optimized: time complexity O(#P*I) (#P: # of properties, I: # of items)
• Intuition: the v_k(i) value comes from the maximum value among
Optimized Equation
• Time complexity: O(f*I) (I: # of items, f: # of fuzzy terms)
• Satisfies the triangular-norm form
• Time complexity can be further reduced to O(N) (N: a constant) by Fagin's A0 algorithm [Fagin 1996]
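For reference, the straightforward (non-optimized) aggregation, which visits every property of every item, might be sketched as below. This is an assumption-laden illustration rather than the equation from the slides: the numeric grades in `TERM_VALUE`, the function names `item_score` and `cluster_wish_list`, and the toy items are all hypothetical. Only the max-of-products shape follows the figure labels (High*High, Mid*Low, Low*Low).

```python
# Hypothetical numeric grades for the fuzzy terms (not given in the slides).
TERM_VALUE = {"High": 1.0, "Mid": 0.5, "Low": 0.1}

def item_score(item_pvs, favorite_pvs):
    """Max over properties of (item's PV grade * cluster's favorite PV grade)."""
    return max(
        TERM_VALUE[item_pvs[p]] * TERM_VALUE[favorite_pvs[p]]
        for p in favorite_pvs
    )

def cluster_wish_list(items, favorite_pvs):
    """Rank all items for one cluster; touches every property of every item."""
    scored = [(item_score(pvs, favorite_pvs), name) for name, pvs in items.items()]
    return sorted(scored, reverse=True)

favorite = {"Rock": "High", "Classical": "Low", "Rap": "High"}
items = {
    "a": {"Rock": "High", "Classical": "Low", "Rap": "Mid"},
    "b": {"Rock": "Mid", "Classical": "High", "Rap": "Low"},
    "c": {"Rock": "Low", "Classical": "Low", "Rap": "Low"},
}
wish_list = cluster_wish_list(items, favorite)
```

The nested loop over properties and items is where the O(#P*I) cost comes from. Grouping properties by their fuzzy term, so that each item is evaluated once per term rather than once per property, is one plausible reading of how the cost drops to O(f*I).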
Online Process
[Figure: the current user's navigation behavior is compared with each cluster using the PPED similarity measure, giving a list of similarity values (e.g., 0.65, 0.32, 0.61, 0.79). Fuzzy aggregation of these similarity values with the cluster wish-lists (example scores 0.87, 0.83, 0.72, 0.61, 0.47) produces the ranked user wish-list (example scores from 0.87 down to 0.42).]
Optimized Method
• Original time complexity: O(K*I) (K: # of clusters, I: # of items)
• Time complexity of the optimized method: O(f*I) (f: # of fuzzy terms)
• Time complexity can be further reduced to O(N) (N: a constant) by Fagin's A0 algorithm [Fagin 1996]
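The original O(K*I) online step can be illustrated with a short Python sketch. The aggregation rule used here (maximum over clusters of similarity times the item's cluster score) is an assumption chosen to mirror the max-of-products form of the offline phase; the function name `user_wish_list` and all data are hypothetical.

```python
def user_wish_list(similarities, cluster_wish_lists, top_n=3):
    """Merge cluster wish-lists into one user wish-list.

    similarities: cluster -> similarity between the current user and that cluster.
    cluster_wish_lists: cluster -> {item: score of the item in that cluster}.
    """
    scores = {}
    # Visiting every item of every cluster gives the O(K*I) cost.
    for cluster, sim in similarities.items():
        for item, value in cluster_wish_lists[cluster].items():
            scores[item] = max(scores.get(item, 0.0), sim * value)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical similarity values and cluster wish-lists
similarities = {"c1": 0.9, "c2": 0.3}
cluster_wish_lists = {
    "c1": {"x": 0.8, "y": 0.5},
    "c2": {"y": 0.9, "z": 0.7},
}
ranked = user_wish_list(similarities, cluster_wish_lists)
```

Because the cluster wish-lists are already sorted offline, a threshold-style algorithm such as Fagin's can in principle stop before scanning every list to the end, which is the motivation for the further reduction mentioned above.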
Experimental Methodology
[Figure: synthetic data generation. Components: item database, clusters, cluster favorite PVs, ranking of items in clusters, user set, cluster-user similarity matrix, user navigation behaviors, clustering.]
• Generate the user set
• Assign property values to items:
  • Item-PV = f(Cluster-PV, noise)
  • noise ~ item-rank
Experimental Methodology
[Figure: same components as the previous slide (item database, user set, clusters, cluster-user similarity matrix, ranking of items in clusters, cluster favorite PVs), now with user navigation behaviors shown as sequences of item evaluations.]
• Assign evaluation values to items:
  • Item-Rating = f(Cluster-Ranking, weight)
  • weight ~ user-cluster similarities
Experimental Methodology
[Figure: the generated user navigation behaviors (from the item database and user set) are split into a training set and a testing set; a current session from the testing set is used to produce a recommendation.]
Accuracy Comparison
[Figure: plot with x-axis "Number of Items" (1000, 5000) and two vertical axes, "Harmonic Mean" and "Improvement". Series: Nearest Neighbor Method, Yoda, Improvement.]
Processing Time Comparison
• Processing Time = CPU + IO
• BNN: Basic Nearest Neighbor Method
• In the BNN process: # of items = 5000; # of users = 1000
• In the Yoda process: # of items in each cluster wish-list = 250; # of clusters = 18
[Figure: plot of CPU Time (milliseconds/user, 0 to 2500) against Number of Users (0 to 4500). Series: Yoda, BNN.]
Conclusion
• Yoda scales as the number of users and items grows
• Higher accuracy
Future Work
• Compare with other techniques
• Run more experiments with real data
• Incorporate the content-based filtering mechanism into the user clustering & classification phases
• Incorporate the user profiles
Reference
• [Shardanand and Maes 1995] U. Shardanand and P. Maes, Social Information Filtering: Algorithms for Automating "Word of Mouth", Proceedings on Human Factors in Computing Systems, Denver, CO, USA, p. 210-217, May 1995
• [Maes 1994] P. Maes, Agents that Reduce Work and Information Overload, Communications of the ACM, 37(7), p. 30-40, 1994
• [Balabanović and Shoham 1997] M. Balabanović and Y. Shoham, Fab: Content-based, Collaborative Recommendation, Communications of the ACM, 40(3), p. 66-72, 1997
• [Resnick et al. 1994] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, GroupLens: An Open Architecture for Collaborative Filtering of Netnews, Proceedings of the ACM Conference on Computer-Supported Cooperative Work, Chapel Hill, NC, p. 175-186, 1994
• [Sarwar et al. 2000] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, Application of Dimensionality Reduction in Recommender System: A Case Study, ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000
• [Kohrs and Merialdo 2000] A. Kohrs and B. Merialdo, Using Category-based Collaborative Filtering in the Active WebMuseum, Proceedings of the IEEE International Conference on Multimedia and Expo, 1, p. 351-354, 2000
Reference
• [Kitts et al. 2000] B. Kitts, D. Freed, and M. Vrieze, Cross-sell: A Fast Promotion-tunable Customer-item Recommendation Method Based on Conditionally Independent Probabilities, Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, p. 437-446, August 2000
• [Breese et al. 2000] J. Breese, D. Heckerman, and C. Kadie, Empirical Analysis of Predictive Algorithms for Collaborative Filtering, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI, USA, p. 43-52, July 1998
• C. Shahabi, A.M. Zarkesh, J. Adibi, and V. Shah, Knowledge Discovery from Users Web Page Navigation, Proceedings of the IEEE RIDE97 Workshop, April 1997
• C. Shahabi, F. Banaei-Kashani, J. Faruque, and A. Faisal, Feature Matrices: A Model for Efficient and Anonymous Web Usage Mining, EC-Web 2001, Germany, September 2001
• [Fagin 1996] R. Fagin, Combining Fuzzy Information from Multiple Systems, Proceedings of the Fifteenth ACM Symposium on Principles of Database Systems, Montreal, p. 216-226, 1996
• C. Shahabi and Y. Chen, A Unified Framework to Incorporate Soft Query into Image Retrieval Systems, International Conference on Enterprise Information Systems, Setubal, Portugal, July 2001