MRI: Meaningful Interpretations of Collaborative Ratings

MRI: Meaningful Interpretations of Collaborative Ratings Mahashweta Das Sihem Amer-Yahia Cong Yu Gautam Das 37th International Conference on Very Large Data Bases, 2011 @ Seattle

Roadmap • Introduction • Motivation • Problem: MRI • Sub problem: DEM • Sub problem: DIM • Data Model • Algorithms • Experiments • Quantitative • Qualitative • Conclusion & Future Work

Motivation

Motivation • Examining reviews vs. trusting overall aggregate rating • IMDB ratings demographic breakdown not meaningful enough

MRI Problem • Examining reviews vs. trusting overall aggregate rating • IMDB ratings demographic breakdown not meaningful enough • Novel and powerful third option: Meaningful Rating Interpretation • Explain ratings by leveraging user and item attribute information

MRI Problem • Examining reviews vs. trusting overall aggregate rating • IMDB ratings demographic breakdown not meaningful enough • Novel and powerful third option: Meaningful Rating Interpretation • Explain ratings by leveraging user and item attribute information • Example:

MRI Sub-problem • DEM: Meaningful Description Mining • Identify groups of reviewers who consistently share similar ratings on items

MRI Sub-problem • DIM: Meaningful Difference Mining • Identify groups of reviewers who consistently disagree on item ratings

Data Model • Collaborative rating site: <Set of Items, Set of Users, Ratings> • Rating tuple: <item attributes, user attributes, rating> • Group: Set of ratings describable by a set of attribute values • Notion of groupbased on data cube • OLAP literature for mining multidimensional data

Data Model • Notion of group based on data cube lattice Each node in lattice is a data cube/cuboid Query condition on database Figure: 4-Dimensional Data Cube Lattice

Data Model • Notion of group based on data cube lattice Each node in lattice is a data cube/cuboid Query condition on database A = Gender B = Age C = Location D = Occupation Figure: 4-Dimensional Data Cube Lattice

Data Model Each node/data cube/ cuboid in lattice is a group Selection Query Condition A = Gender: Male B = Age: Young C = Location: CA D = Occupation: Student Figure: Partial Rating Lattice for a Movie (M:Male, Y:Young, CA:California, S:Student)

Data Model Task Quickly indentify “good” groups in the lattice that help users understand ratings effectively Figure: Partial Rating Lattice for a Movie (M:Male, Y:Young, CA:California, S:Student)

DEM: Meaningful Description Mining • For an input item covering RI ratings, return set C of cuboids, such that: • description error is minimized, subject to: • |C| ≤ k; • coverage ≥ a Description Error Measures how well a cuboid average rating approximates the numerical score of each individual rating belonging to it Coverage Measures the percentage of ratings covered by the returned cuboids • DEM is NP-Hard: Proof details in paper

DEM Algorithms • Exact Algorithm (E-DEM) • Brute-force enumerating all possible combinations of cuboids in lattice to return the exact (i.e., optimal) set as rating descriptions • Random Restart Hill Climbing Algorithm • Often fails to satisfy Coverage constraint; Large number of restarts required • Need an algorithm that optimizes both Coverage and Description Error constraints simultaneously • Randomized Hill Exploration Algorithm (RHE-DEM)

RHE-DEM Algorithm Satisfy Coverage Minimize Error C= {Male, Student} {California, Student} Figure: Partial Rating Lattice for a Movie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student)

RHE-DEM Algorithm Satisfy Coverage Minimize Error C= {Male, Student} {California, Student} Say,C does not satisfy Coverage Constraint Figure: Partial Rating Lattice for aMovie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student)

RHE-DEM Algorithm Satisfy Coverage Minimize Error C= {Male, Student} {California, Student} C= {Male} {California,Student} C= {Student} {California,Student} Figure: Partial Rating Lattice for aMovie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student)

RHE-DEM Algorithm Satisfy Coverage Minimize Error √ C= {Male} {California, Student} Say, C satisfies Coverage Constraint Figure: Partial Rating Lattice for aMovie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student)

RHE-DEM Algorithm Satisfy Coverage Minimize Error √ C= {Male} {California, Student} Figure: Partial Rating Lattice for aMovie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student)

RHE-DEM Algorithm Satisfy Coverage Minimize Error √ √ C= {Male} {Student} Figure: Partial Rating Lattice for aMovie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student)

DIM: Meaningful Difference Mining • For an input item covering RI+RI- ratings, return set C of cuboids, such that: • difference balance is minimized, subject to: • |C| ≤ k; • ≥ a ∩ ≥ a Difference Balance Measures whether the positive and negative ratings are “mingled together" (high balance) or “separated apart" (low balance) Coverage Measures the percentage of +, - ratings covered by the returned cuboids • DIM is NP-Hard: Proof details in paper

DIM Algorithms • Exact Algorithm (E-DIM) • Randomized Hill Exploration Algorithm (RHE-DIM) • Unlike DEM “error”, DIM “balance” computation is expensive • Quadratic computation scanning all possible positive and negative ratings for each set of cuboids • Introduce the concept of Fundamental Regions to aid faster balance computation • Partition space of all ratings and aggregate rating tuples in each region

DIM Algorithms: Fundamental Region C1 = {Male, Student} C2 = {California, Student} Balance = Figure: Computing Balance using Fundamental Region Set of k=2 cuboids having 75 ratings (44+, 31-),10 ratings (6+, 4-)

Experiments • Dataset • MovieLens:100,000 ratings for 1682 movies by 943 users • Each user has 4 attributes: Gender, Age, Occupation, Location • Binning the movies: Order movies according to number of ratings and then partition into 6 bins • Bin 1: movies with fewest ratings, Bin 6: movies with highest ratings • Evaluation • Quantitative Indicator: Efficiency, Quality and Scalability • Qualitative Indicator: Mechanical Turk User Study

Quantitative Experiments: DEM

Qualitative Experiments: User Study • Amazon Mechanical Turk study • Two sets: one for description mining, one for difference mining • Each set: 4 randomly chosen movies, 30 independent single- user tasks • Study 1: Users prefer simple aggregate ratings over rating interpretations • Study 2: Users prefer rating interpretations by exact algorithm or heuristic randomized hill exploration algorithm

Qualitative Experiments: User Study

Conclusion and Future Work • Novel problem of meaningful rating interpretation (MRI) in collaborative rating sites • Meaningful Description Mining • Meaningful Difference Mining • Heuristic algorithmic solutions that generate equally good rating interpretations as exact brute-force with much less execution time • Meaningful interpretations of ratings by reviewers of interest • Additional constraints such as diversity of rating explanations

Related Work • Data Cubes • Gray et. al, A relational aggregation operator generalizing group-by, cross-tab, and sub-totals, ICDE 1996 • Sathe et. al, Intelligent rollups in multidimensional olap data, VLDB 2001 • Lakshmanan et. al, Quotient cube: how to summarize the semantics of a data cube, VLDB 2002 • Ramakrishnan et. al, Exploratory mining in cube space, ICDM 2006 • Wu et. al, Promotion analysis in multi-dimensional space, VLDB 2009 • Clustering & Dimensionality Reduction • Agrawal et. al, Automatic subspace clustering of high dimensional data for data mining applications, SIGMOD 1998 • Recommendation Explanation • Herlocker et. al, Explaining collaborative filtering recommendations, CSCW 2000 • Bilgic et. al, Explaining recommendations: Satisfaction vs. promotion, IUI 2005

Thank You Questions

Quantitative Experiments: DIM

Quantitative Experiments: DEM, DIM

MRI: Meaningful Interpretations of Collaborative Ratings