
Recommender Systems Session I



Presentation Transcript


  1. Recommender Systems Session I • Robin Burke, DePaul University, Chicago, IL

  2. Roadmap • Session A: Basic Techniques I – Introduction, Knowledge Sources, Recommendation Types, Collaborative Recommendation • Session B: Basic Techniques II – Content-based Recommendation, Knowledge-based Recommendation • Session C: Domains and Implementation I – Recommendation Domains, Example Implementation, Lab I • Session D: Evaluation I – Evaluation • Session E: Applications – User Interaction, Web Personalization • Session F: Implementation II – Lab II • Session G: Hybrid Recommendation • Session H: Robustness • Session I: Advanced Topics – Dynamics, Beyond Accuracy

  3. Current research • Question 1: do we lose something when we think of a ratings database as static? (my work) • Question 2: does a summary statistic like MAE hide valuable information? (Michael O’Mahony, UCD colleague)

  4. Collaborative Dynamics • Remember our evaluation methodology • get all the ratings • divide them up into test / training data sets • run prediction tests

  5. Problem • That isn’t how real recommender systems operate • They get a stream of ratings over time • They have to respond to user requests (predictions, recommendation lists) dynamically

  6. Questions • Are early ratings more predictive than later ratings? • Is there a pattern to how users build their profiles? • How long does it take to get past the cold-start?

  7. Some ideas • Temporal leave-one-out • Profile MAE • Profile Hit Ratio

  8. Temporal leave-one-out (TL1O) • for a rating r(u,i) at time t • predict that r(u,i) using the ratings database immediately prior to t • the information that would have been available right before we learned u’s real rating • Average the error over time intervals • we see how error evolves as data is added • cold-start in action
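A minimal sketch of how TL1O could be computed, assuming ratings are (user, item, rating, timestamp) tuples and that predict_rating() is whatever recommender is under evaluation; the function and variable names are illustrative, not from the talk.

```python
# Hypothetical TL1O sketch: predict each rating using only the ratings
# observed strictly before it, and record the timestamped absolute error.
def temporal_leave_one_out(ratings, predict_rating):
    errors = []   # (timestamp, absolute_error) pairs
    history = []  # ratings known before the current one
    for user, item, rating, ts in sorted(ratings, key=lambda r: r[3]):
        if history:  # need some prior data to predict from
            prediction = predict_rating(history, user, item)
            if prediction is not None:
                errors.append((ts, abs(prediction - rating)))
        history.append((user, item, rating, ts))
    return errors
```

Bucketing the returned errors by time interval (e.g. by week) then gives the temporal MAE curve plotted on the later slides.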

  9. Profile MAE • For each profile • compute the TL1O errors for its ratings • average over all profiles of that length • See the aggregate evolution of profiles
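The same idea grouped by profile length rather than by time, as this slide describes; again a hedged sketch with an assumed tuple layout and an assumed predict_rating() black box.

```python
from collections import defaultdict

# Hypothetical profile-MAE sketch: bucket TL1O errors by how many ratings
# the user had already contributed when the prediction was made, then
# average within each bucket.
def profile_mae(ratings, predict_rating):
    errors_by_length = defaultdict(list)
    history = []
    profile_size = defaultdict(int)
    for user, item, rating, ts in sorted(ratings, key=lambda r: r[3]):
        if history:
            prediction = predict_rating(history, user, item)
            if prediction is not None:
                errors_by_length[profile_size[user]].append(abs(prediction - rating))
        history.append((user, item, rating, ts))
        profile_size[user] += 1
    return {n: sum(errs) / len(errs) for n, errs in errors_by_length.items()}
```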

  10. Profile Hit Ratio • Do a similar thing for hit ratio • For each liked item r(u,i) > 3 at time t • create a recommendation list at time t • measure the rank of item i on that list • compute the hit ratio of such items on lists of length k
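An analogous hedged sketch for the profile hit ratio, assuming a hypothetical recommend_list() that returns a top-k list built from the ratings seen so far (k = 50 on the later slide).

```python
# Hypothetical profile hit ratio sketch: for each liked rating (> 3), ask for
# a top-k recommendation list built from the data available just before it
# and count how often the rated item appears in that list.
def profile_hit_ratio(ratings, recommend_list, k=50):
    hits, trials = 0, 0
    history = []
    for user, item, rating, ts in sorted(ratings, key=lambda r: r[3]):
        if rating > 3 and history:
            top_k = recommend_list(history, user, k)
            trials += 1
            if item in top_k:
                hits += 1
        history.append((user, item, rating, ts))
    return hits / trials if trials else 0.0
```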

  11. Temporal MAE (ML1M)

  12. Cold Start • Seems to take about 150 days to get past the initial cold start • about 15% of the data • Temporal MAE improves after that • but not as steeply

  13. Profile MAE • Decrease in MAE as profiles get longer • Strongest decrease earlier in the curve • Seems to be a kNN property • the same pattern holds even if the first 150 days (the cold-start period) are excluded

  14. Diminishing returns • Appears to be diminishing returns in longer profile sizes • paradoxical given what we know about sparsity • More data should be better

  15. A clue • ML100K data • 10% data size • Sparser data compresses the curve • Diminishing returns may be a function of the average profile length

  16. Average rating • Users seem to add positive ratings first and negative ratings later

  17. Application-dependence • Could be because ratings are added in response to recommendations • Easy (popular) recommendations given first • likely to be right • Later recommendations • more errors • users rate lower

  18. Profile Hit Ratio • Cumulative hit ratio • n=50 • Dashed line is random performance

  19. Interestingly • Harder to see • Appear to be diminishing returns • like MAE • but then a jump at the end • Need to examine this data more • ML100K data • experiments very slow to run

  20. MAE for different ratings • Odd result • MAE for each rating value • correlated with # of ratings of that value in the profile • subtract out contribution of total # of ratings of that value • May tell us the average value of adding a rating of a particular type • Look at R=5? • saturation • more about this later

  21. Break

  22. What Have The Neighbours Ever Done for Us? A Collaborative Filtering Perspective. Michael O’Mahony 5th March, 2009

  23. Presentation based on a paper submitted to UMAP ’09 • Authors: R. Rafter, M.P. O’Mahony, N.J. Hurley and B. Smyth

  24. Collaborative Filtering • Collaborative filtering (CF) – a key technique used in recommender systems • Harnesses past ratings to make predictions & recommendations for new items • Recommend items with high predicted ratings and suppress those with low predicted ratings • Assumption: CF techniques provide a considerable advantage over simpler average-rating approaches

  25. Valid Assumption? • We analyse the following: • What do CF techniques actually contribute? • How is accuracy performance measured? • What datasets are used to evaluate CF techniques? • Consider two standard CF techniques: • User-based and item-based CF

  26. CF Algorithms • Two components to user-based and item-based CF: • Initial estimate: based on average rating of target user or item • Neighbour estimate: based on ratings of similar users or items • Must perturb the initial estimate: • By the correct magnitude • In the correct direction • General formula:
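The formula image on this slide did not survive the transcript; the decomposition the bullets describe would normally be written roughly as follows (a reconstruction, not necessarily the slide's exact notation).

```latex
% General form: the prediction is the initial (mean-based) estimate
% perturbed by the neighbour estimate.
\hat{r}(u,i) \;=\; \underbrace{b(u,i)}_{\text{initial estimate}}
            \;+\; \underbrace{\Delta(u,i)}_{\text{neighbour estimate}}
```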

  27. CF Algorithms • User-based CF: • Item-based CF: • (the equation images on this slide label the neighbour-estimate and initial-estimate terms; reconstructions follow below)
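The user-based and item-based equations were also images on the original slide. Below are the standard mean-centred weighted-sum formulations consistent with the initial/neighbour split above; the paper's exact variants may differ slightly.

```latex
% Standard user-based CF: initial estimate \bar{r}_u plus a similarity-weighted,
% mean-centred neighbour estimate from similar users v \in N(u).
\hat{r}(u,i) = \bar{r}_u +
  \frac{\sum_{v \in N(u)} \mathrm{sim}(u,v)\,\bigl(r(v,i) - \bar{r}_v\bigr)}
       {\sum_{v \in N(u)} \lvert \mathrm{sim}(u,v) \rvert}

% Standard item-based CF: initial estimate \bar{r}_i plus a neighbour estimate
% from similar items j \in N(i) rated by the same user.
\hat{r}(u,i) = \bar{r}_i +
  \frac{\sum_{j \in N(i)} \mathrm{sim}(i,j)\,\bigl(r(u,j) - \bar{r}_j\bigr)}
       {\sum_{j \in N(i)} \lvert \mathrm{sim}(i,j) \rvert}
```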

  28. Evaluating Accuracy • Predictive accuracy: • Mean Absolute Error (MAE): • MAE calculated over all test set ratings (problem?) • Other metrics: RMSE, ROC curves … – give similar trends
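The MAE formula was likewise an image; the usual definition, matching "calculated over all test set ratings", is:

```latex
% MAE over a test set T of (user, item, rating) triples, with \hat{r}(u,i)
% the predicted rating.
\mathrm{MAE} = \frac{1}{\lvert T \rvert} \sum_{(u,i,r) \in T} \bigl\lvert \hat{r}(u,i) - r \bigr\rvert
```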

  29. Evaluation • Datasets: • Procedure: • Create a test set by randomly removing 10% of the ratings • Make predictions for the test set ratings using the remaining data • Repeat 10 times and compute the average MAE
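A minimal sketch of this procedure (random 10% holdout, repeated 10 times, MAE averaged over runs); predict_rating() and the rating-tuple layout are illustrative assumptions, not the authors' code.

```python
import random

# Hypothetical repeated-holdout evaluation: hold out a random 10% of the
# ratings, predict them from the remaining 90%, and average MAE over runs.
def repeated_holdout_mae(ratings, predict_rating, holdout=0.1, repeats=10, seed=0):
    rng = random.Random(seed)
    maes = []
    for _ in range(repeats):
        shuffled = list(ratings)
        rng.shuffle(shuffled)
        split = int(len(shuffled) * holdout)
        test, train = shuffled[:split], shuffled[split:]
        errors = []
        for user, item, rating, *rest in test:
            prediction = predict_rating(train, user, item)
            if prediction is not None:
                errors.append(abs(prediction - rating))
        if errors:
            maes.append(sum(errors) / len(errors))
    return sum(maes) / len(maes)
```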

  30. Results • Average performance, computed over all test set ratings • Neighbour estimate magnitudes are small, between 8.5% and 11% of the rating range • Item-based CF is comparable to or outperforms user-based CF in terms of MAE • (smaller magnitudes observed for item-based CF) • Book-Crossing dataset: user-based CF shifts the initial estimate in the correct direction in only 53% of cases (just slightly better than chance!)

  31. Neighbour Magnitude

  32. Datasets • Frequency of occurrence of ratings: • Bias (natural?) toward ratings on the higher end of the scale • Consider MovieLens: • Most ratings are 3 and 4 • Mean user rating ≈ 3.6 – a small neighbour estimate magnitude is required in most cases • Consequences of such dataset characteristics for CF research: • Computing average MAE across all test set ratings hides performance issues in light of such characteristics [Shardanand and Maes 1995] • For example, can CF achieve large magnitudes when needed?

  33. MAE vs Actual Ratings • Recall: average overall MAE = 0.73 for both user-based (UB) and item-based (IB) CF …

  34. Error PDFs

  35. Neighbour Contribution • Effect of neighbour estimate versus initial (mean-based) estimate:

  36. Neighbour Contribution

  37. Conclusions • Examined the contribution of standard CF techniques: • Neighbours have a small influence (magnitude) which is not always reliable (direction) • Evaluating accuracy performance: • Need for more fine-grained error analysis [Shardanand and Maes 1995] • Focus on developing CF algorithms that offer improved accuracy for extreme ratings • Test datasets: • Standard datasets have particular characteristics – e.g. a bias in ratings toward the higher end of the rating scale – need for new datasets • Such characteristics, combined with using overall MAE to evaluate accuracy, have “hidden” performance issues – and hindered CF development (?)

  38. That’s all folks! • Questions?
