Item Based Collaborative Filtering Recommendation Algorithms

dai-herring

Presentation Transcript


  1. Item Based Collaborative Filtering Recommendation Algorithms Week 8

  2. Introduction • Recommender Systems – Apply knowledge discovery techniques to the problem of making personalized recommendations for information, products or services, usually during a live interaction • Collaborative Filtering – Builds a database of users’ preferences for items, so that recommendations can be made based on neighbors who have similar tastes

  3. Collaborative Filtering in our life

  4. Collaborative Filtering in our life

  5. Collaborative Filtering in our life

  6. Motivation of Collaborative Filtering • Need to develop multiple products that meet the multiple needs of multiple consumers • Recommender systems used in e-commerce • Multimedia recommendation • Personal tastes matter

  7. Users, Items, Preferences • Terminology • Users interact with items (books, videos, news, other users,…) • Each user’s preferences are known for only a small subset of the items (numeric or boolean)

  8. Basic Strategies • Predict and Recommend • Predict the opinion: how much the user is likely to like this item • Recommend the ‘best’ items based on • the user’s previous likings, and • the opinions of like-minded users whose ratings are similar

  9. Explicit and Implicit Ratings • Where do the preferences come from? • Explicit Ratings • Users explicitly express their preferences (e.g. ratings with stars) • Willingness of the users is required • Implicit Ratings • Interactions with items are interpreted as expressions of preference (e.g. purchasing a book, reading a news article) • Interactions must be detectable

  10. Collaborative Filtering • Mathematically • A user-item matrix is created from the preference data • The task is to predict missing entries by finding patterns in the known entries
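A minimal sketch of the user-item matrix in Python, assuming plain nested dictionaries as a stand-in for a real sparse matrix and made-up (user, item, rating) triples. The absent cells are exactly the entries the task asks us to predict:

```python
from collections import defaultdict

def build_matrix(triples):
    """Build a sparse user-item matrix as {user: {item: rating}}."""
    matrix = defaultdict(dict)
    for user, item, rating in triples:
        matrix[user][item] = rating
    return dict(matrix)

def missing_entries(matrix):
    """Return the (user, item) cells with no known preference (prediction targets)."""
    items = sorted({item for prefs in matrix.values() for item in prefs})
    return [(user, item)
            for user in sorted(matrix)
            for item in items
            if item not in matrix[user]]

# Hypothetical preference data: (user, item, rating) triples.
ratings = [
    ("alice", "book", 5), ("alice", "film", 3),
    ("bob", "book", 4), ("bob", "song", 2),
    ("carol", "film", 4),
]
m = build_matrix(ratings)
print(missing_entries(m))
# [('alice', 'song'), ('bob', 'film'), ('carol', 'book'), ('carol', 'song')]
```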

  11. A Sample User-Item-Matrix

  12. Traditional Collaborative Filtering • Nearest-Neighbor CF algorithm (KNN) • Cosine similarity • Treat each customer as an N-dimensional vector of items and measure the similarity of two customers A and B
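For two customers A and B viewed as N-dimensional item vectors, cosine similarity is sim(A, B) = cos(A, B) = (A · B) / (‖A‖ ‖B‖), and the corresponding cosine distance is 1 − sim(A, B). The sketch below assumes sparse {item: rating} dictionaries (missing items count as 0) and hypothetical function names; it illustrates user-based KNN rather than any particular system's implementation:

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {item: value} dicts.

    Missing items count as 0, so only shared items contribute to the dot product.
    """
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_customers(target, customers, k):
    """Hypothetical helper: the k customers most similar to `target`."""
    scores = [(cosine(customers[target], vec), name)
              for name, vec in customers.items() if name != target]
    scores.sort(reverse=True)  # highest similarity first
    return [name for _, name in scores[:k]]

customers = {"A": {"x": 5, "y": 3}, "B": {"x": 4, "y": 3}, "C": {"z": 2}}
print(nearest_customers("A", customers, 1))  # ['B']
```

Scoring the target customer against every other customer is what makes this approach O(MN) in the worst case.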

  13. Traditional Collaborative Filtering • With M customers and N items, the worst-case complexity is O(MN) • Reduce M by randomly sampling the customers • Reduce N by discarding very popular or unpopular items • Can be O(M+N), but …

  14. Clustering Techniques • Work by identifying groups of consumers who appear to have similar preferences • Performance can be good because each group is small • Dividing the population into clusters may hurt accuracy • But…
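A rough sketch of the clustering idea, assuming the cluster labels were already computed offline (for example by k-means) and are supplied as a hypothetical `clusters` mapping. Only members of the user's own cluster are scanned, which is where the speed comes from; ratings from similar users who landed in other clusters are ignored, which is how accuracy can suffer:

```python
from statistics import mean

def cluster_predict(user, item, ratings, clusters):
    """Hypothetical helper: predict a rating from the user's cluster only.

    ratings:  {user: {item: rating}}
    clusters: {user: cluster_id}, assumed precomputed offline (illustrative labels)
    Returns None when nobody else in the cluster has rated the item.
    """
    cluster_id = clusters[user]
    peers = [u for u, c in clusters.items() if c == cluster_id and u != user]
    known = [ratings[u][item] for u in peers if item in ratings.get(u, {})]
    return mean(known) if known else None

ratings = {"ann": {"a": 5}, "ben": {"a": 4, "b": 2},
           "cat": {"a": 5, "b": 3}, "dan": {"b": 1}}
clusters = {"ann": 0, "ben": 0, "cat": 0, "dan": 1}
print(cluster_predict("ann", "b", ratings, clusters))  # 2.5 (dan's rating is never seen)
```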

  15. How about a Content-Based Method? • Given the user’s purchased and rated items, construct a search query to find other popular items • For example, items by the same author, artist, or director, or with similar keywords/subjects • Impractical to base a query on all the items • But…
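A sketch of the content-based query, assuming hypothetical item metadata (`catalog`) and popularity counts, and using a single attribute such as the author as the query key:

```python
def content_query(purchased, catalog, popularity, attribute="author", limit=3):
    """Hypothetical helper: popular unpurchased items sharing an attribute value.

    purchased:  set of item IDs the user bought or rated
    catalog:    {item: {attribute_name: value}}, illustrative metadata
    popularity: {item: sales_count}, illustrative popularity scores
    """
    # Build the "query": every attribute value seen among the user's items.
    wanted = {catalog[i][attribute] for i in purchased
              if attribute in catalog.get(i, {})}
    hits = [i for i, meta in catalog.items()
            if i not in purchased and meta.get(attribute) in wanted]
    hits.sort(key=lambda i: popularity.get(i, 0), reverse=True)  # most popular first
    return hits[:limit]

catalog = {"b1": {"author": "Author X"}, "b2": {"author": "Author X"},
           "b3": {"author": "Author X"}, "b4": {"author": "Author Y"}}
print(content_query({"b1"}, catalog, {"b2": 10, "b3": 50, "b4": 99}))  # ['b3', 'b2']
```

Every purchased item adds another value to the query, which hints at why querying on all of a heavy buyer's items becomes impractical.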

  16. User-Based Collaborative Filtering • The algorithms we have looked at so far • 2 challenges: • Scalability: complexity grows linearly with the number of customers and items • Sparsity: the rating data set is very sparse • Even active customers may have purchased well under 1% of the total products
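To make the sparsity point concrete, a common definition of a data set's sparsity level is 1 − (known ratings / total cells of the user-item matrix). A small sketch with invented numbers, where each of three customers has rated 4 of 1,000 products (0.4%):

```python
def sparsity(matrix, n_items):
    """Sparsity level = 1 - known ratings / (users * items)."""
    known = sum(len(prefs) for prefs in matrix.values())
    return 1 - known / (len(matrix) * n_items)

# Hypothetical data: 3 customers, each with 4 ratings, in a 1,000-item catalog.
matrix = {
    "u1": {"a": 1, "b": 1, "c": 1, "d": 1},
    "u2": {"a": 1, "e": 1, "f": 1, "g": 1},
    "u3": {"b": 1, "h": 1, "i": 1, "j": 1},
}
print(sparsity(matrix, 1000))  # about 0.996
```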

  17. New Approaches?

  18. Item-to-Item Collaborative Filtering • No more matching the user to similar customers • Builds a similar-items table by finding items that customers tend to purchase together • Amazon.com used this method • Scales independently of the catalog size or the total number of customers • Achieves acceptable performance by creating the expensive similar-items table offline

  19. Item-to-Item CF Algorithm • Worst-case complexity O(N²M) for N items and M customers
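The offline similar-items table is commonly described as a nested loop: for each item I1, for each customer C who bought I1, for each item I2 that C bought, record that a customer bought both; then compute the similarity of each recorded pair. A Python sketch of that loop follows; cosine over binary purchase vectors is an assumed choice of similarity, and the function name is hypothetical:

```python
import math
from collections import defaultdict

def similar_items_table(purchases):
    """Build an offline similar-items table from {customer: set of items}.

    Similarity is cosine over binary purchase vectors (an assumption):
    |buyers(I1) & buyers(I2)| / sqrt(|buyers(I1)| * |buyers(I2)|).
    Only pairs bought by at least one common customer are ever scored.
    """
    buyers = defaultdict(set)  # item -> customers who bought it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    table = {}
    for i1 in buyers:                          # for each catalog item I1
        co_counts = defaultdict(int)
        for customer in buyers[i1]:            # each customer C who bought I1
            for i2 in purchases[customer]:     # each item I2 bought by C
                if i2 != i1:
                    co_counts[i2] += 1         # record that C bought I1 and I2
        table[i1] = sorted(
            ((count / math.sqrt(len(buyers[i1]) * len(buyers[i2])), i2)
             for i2, count in co_counts.items()),
            reverse=True)                      # most similar first
    return table

purchases = {"c1": {"x", "y"}, "c2": {"x", "y", "z"}, "c3": {"z"}}
print(similar_items_table(purchases)["x"])  # [(1.0, 'y'), (0.5, 'z')]
```

Because the table is built offline, the online step reduces to looking up the entries for the items a customer already owns.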

  20. Item-to-Item CF Algorithm: Similarity Calculation • Computed by looking at co-rated items only, i.e., pairs of ratings that the same user gave to both items • These co-rated pairs are obtained from different users

  21. Item-to-Item CF Algorithm: Similarity Calculation • For the similarity between two items i and j,
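One common choice of similarity measure for this step is adjusted cosine similarity (an assumption here; the presentation does not say which measure it uses): sim(i, j) = Σ_u (R_u,i − R̄_u)(R_u,j − R̄_u) / ( √Σ_u (R_u,i − R̄_u)² · √Σ_u (R_u,j − R̄_u)² ), where the sums run over the users u who rated both i and j, and R̄_u is user u's average rating. Subtracting R̄_u compensates for users who rate everything high or everything low. A minimal sketch with a hypothetical function name:

```python
import math

def adjusted_cosine(ratings, i, j):
    """Adjusted cosine similarity between items i and j.

    ratings: {user: {item: rating}}. Only users who rated both items
    (the co-rated pairs) contribute; each user's mean rating over all
    of their items is subtracted first.
    """
    num = den_i = den_j = 0.0
    for prefs in ratings.values():
        if i in prefs and j in prefs:              # co-rated by this user
            mean_u = sum(prefs.values()) / len(prefs)
            di, dj = prefs[i] - mean_u, prefs[j] - mean_u
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0                                 # no usable co-rated data
    return num / (math.sqrt(den_i) * math.sqrt(den_j))

ratings = {"u1": {"a": 5, "b": 4, "c": 1},
           "u2": {"a": 2, "b": 1, "c": 5},
           "u3": {"a": 4, "c": 2}}
print(adjusted_cosine(ratings, "a", "b"))  # about 0.69 (exactly 20/29)
```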

  22. Item-to-Item CF Algorithm: Prediction Computation • Recommend the highest-ranking items based on similarity

  23. Item-to-Item CF Algorithm: Prediction Computation • Weighted sum: captures how the active user rates the similar items • Regression: avoids misleading predictions when two rating vectors are distant (in the Euclidean sense) yet have very high similarity
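The weighted sum is commonly written as P(u, i) = Σ_N s(i, N) · R(u, N) / Σ_N |s(i, N)|, summing over the items N similar to i that the active user u has rated. A sketch assuming precomputed similarities to the target item and a hypothetical function name (the regression variant, which replaces R(u, N) with a linearly approximated rating, is not shown):

```python
def predict_weighted_sum(user_ratings, sims, k=None):
    """Weighted-sum prediction of the active user's rating for one target item.

    user_ratings: {item: rating} for the active user
    sims:         {item: similarity to the target item}
    k:            optionally keep only the k most similar rated items
    Returns None if the user rated none of the similar items.
    """
    neighbors = sorted(((s, n) for n, s in sims.items() if n in user_ratings),
                       reverse=True)               # most similar first
    if k is not None:
        neighbors = neighbors[:k]
    den = sum(abs(s) for s, _ in neighbors)
    if den == 0:
        return None
    return sum(s * user_ratings[n] for s, n in neighbors) / den

user_ratings = {"a": 5, "b": 3, "c": 1}
sims = {"a": 0.9, "b": 0.5, "d": 0.8}   # "d" is unrated, so it is ignored
print(predict_weighted_sum(user_ratings, sims))  # (0.9*5 + 0.5*3) / 1.4, about 4.29
```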

  24. The item-item scheme provides better quality of predictions than the user-user scheme • A higher training/test ratio improves quality, but the improvement is not very large • The item neighborhood is fairly static, so it can be pre-computed • This improves online performance

  25. Distributed Item-based CF

  26. Algorithm in Map/Reduce • How can we compute the similarities efficiently with Map/Reduce? • Key ideas • We can ignore pairs of items without a co-occurring rating • We need to see all co-occurring ratings for each pair of items in the end • Inspired by an algorithm designed to compute the pairwise similarity of text documents
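The two key ideas can be simulated in a single process: the map step emits a record only for item pairs that a user actually co-rated, and the shuffle groups all co-occurring ratings for a pair so that one reduce call sees them together. This is an in-memory illustration, not Hadoop or Mahout code; the function names are hypothetical and cosine over the co-rated ratings is an assumed similarity:

```python
import math
from collections import defaultdict
from itertools import combinations

def map_user(user_prefs):
    # Map: emit one record per item pair this user co-rated; pairs without a
    # co-occurring rating are never emitted.
    for i, j in combinations(sorted(user_prefs), 2):
        yield (i, j), (user_prefs[i], user_prefs[j])

def reduce_pair(pair, rating_pairs):
    # Reduce: every co-occurring rating for this pair arrives in one call.
    dot = sum(a * b for a, b in rating_pairs)
    norm_i = math.sqrt(sum(a * a for a, _ in rating_pairs))
    norm_j = math.sqrt(sum(b * b for _, b in rating_pairs))
    if norm_i == 0 or norm_j == 0:
        return pair, 0.0
    return pair, dot / (norm_i * norm_j)

def run_job(ratings):
    # Simulated shuffle: group mapper output by key (the item pair).
    groups = defaultdict(list)
    for prefs in ratings.values():
        for key, value in map_user(prefs):
            groups[key].append(value)
    return dict(reduce_pair(k, v) for k, v in groups.items())

ratings = {"u1": {"a": 5, "b": 3}, "u2": {"a": 4, "b": 4, "c": 2}, "u3": {"d": 1}}
print(sorted(run_job(ratings)))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]; "d" never pairs
```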

  27. Implementations in Mahout • ItemSimilarityJob • Computes all item similarities • Various configuration options: • Similarity measure to use (e.g. cosine, Pearson correlation, Tanimoto coefficient, your own implementations) • Maximum number of similar items per item • Maximum number of co-occurrences considered • Input: preference data as a CSV file, each line representing a single preference in the form userID, itemID, value • Output: pairs of itemIDs with their associated similarity value
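To illustrate the userID, itemID, value input format and the item-pair output, here is a small in-memory Python stand-in. It is not Mahout's API: the function and parameter names are hypothetical, and it imitates only the Tanimoto-coefficient option and the cap on similar items per item:

```python
import csv
import io
from collections import defaultdict

def item_similarities(csv_text, max_similar_per_item=10):
    """Tanimoto similarity between items from userID,itemID,value lines.

    Tanimoto treats preferences as boolean: |A & B| / |A | B|, where A and B
    are the sets of users with a preference for each item.
    Returns (itemA, itemB, similarity) tuples, keeping at most
    max_similar_per_item neighbours per item (hypothetical parameter name).
    """
    users_of = defaultdict(set)
    for user, item, _value in csv.reader(io.StringIO(csv_text)):
        users_of[item].add(user)

    neighbours = defaultdict(list)
    items = sorted(users_of)
    for idx, a in enumerate(items):
        for b in items[idx + 1:]:
            shared = users_of[a] & users_of[b]
            if shared:  # skip pairs without a co-occurrence
                sim = len(shared) / len(users_of[a] | users_of[b])
                neighbours[a].append((sim, b))
                neighbours[b].append((sim, a))

    result = []
    for a in items:
        for sim, b in sorted(neighbours[a], reverse=True)[:max_similar_per_item]:
            result.append((a, b, sim))
    return result

data = "1,101,5.0\n1,102,3.0\n2,101,2.0\n2,102,2.5\n2,103,5.0\n3,103,4.0\n"
for row in item_similarities(data, max_similar_per_item=1):
    print(row)
```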

  28. Implementations in Mahout • RecommenderJob • Distributed item-based recommender • Various configuration options: • Similarity measure to use • Number of recommendations per user • Filter out some users or items • Input: preference data as a CSV file, each line representing a single preference in the form userID, itemID, value • Output: userIDs with associated recommended itemIDs and their scores
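A hypothetical single-machine sketch of the kind of output this step produces: for each user, unrated items are scored with a weighted sum over the user's own preferences and the top n are kept. The parameter names only mirror the options listed on this slide and are not Mahout's:

```python
from collections import defaultdict

def recommend_all(prefs, sims, n=3, skip_users=(), skip_items=()):
    """Top-n item recommendations per user from item-item similarities.

    prefs: {user: {item: value}}
    sims:  {item: {other_item: similarity}}, e.g. from an offline similarity job
    skip_users / skip_items: simple filters (hypothetical names)
    """
    output = {}
    for user, rated in prefs.items():
        if user in skip_users:
            continue
        num, den = defaultdict(float), defaultdict(float)
        for item, value in rated.items():
            for other, s in sims.get(item, {}).items():
                if other in rated or other in skip_items:
                    continue  # never recommend already-rated or filtered items
                num[other] += s * value
                den[other] += abs(s)
        scored = [(num[c] / den[c], c) for c in num if den[c] > 0]
        scored.sort(reverse=True)  # highest predicted score first
        output[user] = [(c, score) for score, c in scored[:n]]
    return output

prefs = {"u1": {"a": 5, "b": 1}, "u2": {"c": 4}}
sims = {"a": {"c": 0.8, "d": 0.4}, "b": {"c": 0.2, "d": 0.9}, "c": {"a": 0.8, "b": 0.2}}
print(recommend_all(prefs, sims, n=1, skip_users=("u2",)))  # u1 -> item "c", score about 4.2
```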
