1 / 46

MRI: Meaningful Interpretations of Collaborative Ratings

MRI: Meaningful Interpretations of Collaborative Ratings. Mahashweta Das Sihem Amer-Yahia Cong Yu Gautam Das.  37th International Conference on Very Large Data Bases, 2011 @ Seattle. Roadmap. Introduction Motivation Problem: MRI Sub problem: DEM Sub problem: DIM Data Model

kylev
Download Presentation

MRI: Meaningful Interpretations of Collaborative Ratings

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. MRI: Meaningful Interpretations of Collaborative Ratings Mahashweta Das Sihem Amer-Yahia Cong Yu Gautam Das  37th International Conference on Very Large Data Bases, 2011 @ Seattle

  2. Roadmap • Introduction • Motivation • Problem: MRI • Sub problem: DEM • Sub problem: DIM • Data Model • Algorithms • Experiments • Quantitative • Qualitative • Conclusion & Future Work

  3. Roadmap • Introduction • Motivation • Problem: MRI • Sub problem: DEM • Sub problem: DIM • Data Model • Algorithms • Experiments • Quantitative • Qualitative • Conclusion & Future Work

  4. Motivation

  5. Motivation

  6. Motivation

  7. Motivation • Examining reviews vs. trusting overall aggregate rating • IMDB ratings demographic breakdown not meaningful enough

  8. MRI Problem • Examining reviews vs. trusting overall aggregate rating • IMDB ratings demographic breakdown not meaningful enough • Novel and powerful third option: Meaningful Rating Interpretation • Explain ratings by leveraging user and item attribute information

  9. MRI Problem • Examining reviews vs. trusting overall aggregate rating • IMDB ratings demographic breakdown not meaningful enough • Novel and powerful third option: Meaningful Rating Interpretation • Explain ratings by leveraging user and item attribute information • Example:

  10. MRI Problem • Examining reviews vs. trusting overall aggregate rating • IMDB ratings demographic breakdown not meaningful enough • Novel and powerful third option: Meaningful Rating Interpretation • Explain ratings by leveraging user and item attribute information • Example:

  11. MRI Sub-problem • DEM: Meaningful Description Mining • Identify groups of reviewers who consistently share similar ratings on items

  12. MRI Sub-problem • DEM: Meaningful Description Mining • Identify groups of reviewers who consistently share similar ratings on items

  13. MRI Sub-problem • DIM: Meaningful Difference Mining • Identify groups of reviewers who consistently disagree on item ratings

  14. MRI Sub-problem • DIM: Meaningful Difference Mining • Identify groups of reviewers who consistently disagree on item ratings

  15. Roadmap • Introduction • Motivation • Problem: MRI • Sub problem: DEM • Sub problem: DIM • Data Model • Algorithms • Experiments • Quantitative • Qualitative • Conclusion & Future Work

  16. Data Model • Collaborative rating site: <Set of Items, Set of Users, Ratings> • Rating tuple: <item attributes, user attributes, rating> • Group: Set of ratings describable by a set of attribute values • Notion of groupbased on data cube • OLAP literature for mining multidimensional data

  17. Data Model • Notion of group based on data cube lattice Each node in lattice is a data cube/cuboid Query condition on database Figure: 4-Dimensional Data Cube Lattice

  18. Data Model • Notion of group based on data cube lattice Each node in lattice is a data cube/cuboid Query condition on database A = Gender B = Age C = Location D = Occupation Figure: 4-Dimensional Data Cube Lattice

  19. Data Model Each node/data cube/ cuboid in lattice is a group Selection Query Condition A = Gender: Male B = Age: Young C = Location: CA D = Occupation: Student Figure: Partial Rating Lattice for a Movie (M:Male, Y:Young, CA:California, S:Student)

  20. Data Model Each node/data cube/ cuboid in lattice is a group Selection Query Condition A = Gender: Male B = Age: Young C = Location: CA D = Occupation: Student Figure: Partial Rating Lattice for a Movie (M:Male, Y:Young, CA:California, S:Student)

  21. Data Model Task Quickly indentify “good” groups in the lattice that help users understand ratings effectively Figure: Partial Rating Lattice for a Movie (M:Male, Y:Young, CA:California, S:Student)

  22. Roadmap • Introduction • Motivation • Problem: MRI • Sub problem: DEM • Sub problem: DIM • Data Model • Algorithms • Experiments • Quantitative • Qualitative • Conclusion & Future Work

  23. DEM: Meaningful Description Mining • For an input item covering RI ratings, return set C of cuboids, such that: • description error is minimized, subject to: • |C| ≤ k; • coverage ≥ a Description Error Measures how well a cuboid average rating approximates the numerical score of each individual rating belonging to it Coverage Measures the percentage of ratings covered by the returned cuboids • DEM is NP-Hard: Proof details in paper

  24. DEM Algorithms • Exact Algorithm (E-DEM) • Brute-force enumerating all possible combinations of cuboids in lattice to return the exact (i.e., optimal) set as rating descriptions • Random Restart Hill Climbing Algorithm • Often fails to satisfy Coverage constraint; Large number of restarts required • Need an algorithm that optimizes both Coverage and Description Error constraints simultaneously • Randomized Hill Exploration Algorithm (RHE-DEM)

  25. RHE-DEM Algorithm Satisfy Coverage Minimize Error C= {Male, Student} {California, Student} Figure: Partial Rating Lattice for a Movie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student)

  26. RHE-DEM Algorithm Satisfy Coverage Minimize Error C= {Male, Student} {California, Student} Say,C does not satisfy Coverage Constraint Figure: Partial Rating Lattice for aMovie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student)

  27. RHE-DEM Algorithm Satisfy Coverage Minimize Error C= {Male, Student} {California, Student} C= {Male} {California,Student} C= {Student} {California,Student} Figure: Partial Rating Lattice for aMovie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student)

  28. RHE-DEM Algorithm Satisfy Coverage Minimize Error √ C= {Male} {California, Student} Say, C satisfies Coverage Constraint Figure: Partial Rating Lattice for aMovie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student)

  29. RHE-DEM Algorithm Satisfy Coverage Minimize Error √ C= {Male} {California, Student} Figure: Partial Rating Lattice for aMovie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student)

  30. RHE-DEM Algorithm Satisfy Coverage Minimize Error √ C= {Male} {California, Student} Figure: Partial Rating Lattice for aMovie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student)

  31. RHE-DEM Algorithm Satisfy Coverage Minimize Error √ √ C= {Male} {Student} Figure: Partial Rating Lattice for aMovie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student)

  32. DIM: Meaningful Difference Mining • For an input item covering RI+RI- ratings, return set C of cuboids, such that: • difference balance is minimized, subject to: • |C| ≤ k; • ≥ a ∩ ≥ a Difference Balance Measures whether the positive and negative ratings are “mingled together" (high balance) or “separated apart" (low balance) Coverage Measures the percentage of +, - ratings covered by the returned cuboids • DIM is NP-Hard: Proof details in paper

  33. DIM Algorithms • Exact Algorithm (E-DIM) • Randomized Hill Exploration Algorithm (RHE-DIM) • Unlike DEM “error”, DIM “balance” computation is expensive • Quadratic computation scanning all possible positive and negative ratings for each set of cuboids • Introduce the concept of Fundamental Regions to aid faster balance computation • Partition space of all ratings and aggregate rating tuples in each region

  34. DIM Algorithms: Fundamental Region C1 = {Male, Student} C2 = {California, Student} Balance = Figure: Computing Balance using Fundamental Region Set of k=2 cuboids having 75 ratings (44+, 31-),10 ratings (6+, 4-)

  35. Roadmap • Introduction • Motivation • Problem: MRI • Sub problem: DEM • Sub problem: DIM • Data Model • Algorithms • Experiments • Quantitative • Qualitative • Conclusion & Future Work

  36. Experiments • Dataset • MovieLens:100,000 ratings for 1682 movies by 943 users • Each user has 4 attributes: Gender, Age, Occupation, Location • Binning the movies: Order movies according to number of ratings and then partition into 6 bins • Bin 1: movies with fewest ratings, Bin 6: movies with highest ratings • Evaluation • Quantitative Indicator: Efficiency, Quality and Scalability • Qualitative Indicator: Mechanical Turk User Study

  37. Quantitative Experiments: DEM

  38. Quantitative Experiments: DEM

  39. Qualitative Experiments: User Study • Amazon Mechanical Turk study • Two sets: one for description mining, one for difference mining • Each set: 4 randomly chosen movies, 30 independent single- user tasks • Study 1: Users prefer simple aggregate ratings over rating interpretations • Study 2: Users prefer rating interpretations by exact algorithm or heuristic randomized hill exploration algorithm

  40. Qualitative Experiments: User Study

  41. Roadmap • Introduction • Motivation • Problem: MRI • Sub problem: DEM • Sub problem: DIM • Data Model • Algorithms • Experiments • Quantitative • Qualitative • Conclusion & Future Work

  42. Conclusion and Future Work • Novel problem of meaningful rating interpretation (MRI) in collaborative rating sites • Meaningful Description Mining • Meaningful Difference Mining • Heuristic algorithmic solutions that generate equally good rating interpretations as exact brute-force with much less execution time • Meaningful interpretations of ratings by reviewers of interest • Additional constraints such as diversity of rating explanations

  43. Related Work • Data Cubes • Gray et. al, A relational aggregation operator generalizing group-by, cross-tab, and sub-totals, ICDE 1996 • Sathe et. al, Intelligent rollups in multidimensional olap data, VLDB 2001 • Lakshmanan et. al, Quotient cube: how to summarize the semantics of a data cube, VLDB 2002 • Ramakrishnan et. al, Exploratory mining in cube space, ICDM 2006 • Wu et. al, Promotion analysis in multi-dimensional space, VLDB 2009 • Clustering & Dimensionality Reduction • Agrawal et. al, Automatic subspace clustering of high dimensional data for data mining applications, SIGMOD 1998 • Recommendation Explanation • Herlocker et. al, Explaining collaborative filtering recommendations, CSCW 2000 • Bilgic et. al, Explaining recommendations: Satisfaction vs. promotion, IUI 2005

  44. Thank You Questions

  45. Quantitative Experiments: DIM

  46. Quantitative Experiments: DEM, DIM

More Related