A clustering algorithm to find groups with homogeneous preferences

J. Díez, J.J. del Coz, O. Luaces, A. Bahamonde. Centro de Inteligencia Artificial, Universidad de Oviedo at Gijón. www.aic.uniovi.es. Workshop on Implicit Measures of User Interests and Preferences.


Presentation Transcript


  1. A clustering algorithm to find groups with homogeneous preferences. J. Díez, J.J. del Coz, O. Luaces, A. Bahamonde. Centro de Inteligencia Artificial, Universidad de Oviedo at Gijón. www.aic.uniovi.es. Workshop on Implicit Measures of User Interests and Preferences

  2. The framework to learn preferences: People tend to rate their preferences in a relative way. Which middle circle do you think is larger?

  3. The framework to learn people's preferences • Regression is not a good idea • We will use training sets of preference judgments: pairs of vectors (v, u) where someone expresses that he or she prefers v to u • Diagram (for the person "Me"): {v_i > u_i : i ∈ I_Me} → linear SVM → f_Me • f_Me is a linear ranking function: f_Me(v_i) > f_Me(u_i) whenever v_i is preferable to u_i
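
As a rough illustration of the idea on this slide, the sketch below learns a linear ranking function from preference pairs. It is not the authors' method: a simple perceptron on the difference vectors v - u stands in for the linear SVM, and the function names and toy judgments are hypothetical.

```python
# Sketch only: a perceptron on difference vectors stands in for the linear SVM
# named on the slide. All names and data below are hypothetical.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def learn_ranking_function(pairs, epochs=100):
    """pairs: list of (v, u) where v is preferred to u. Returns weights w."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        mistakes = 0
        for v, u in pairs:
            diff = [a - b for a, b in zip(v, u)]
            # For a linear f, f(v) > f(u) is equivalent to w . (v - u) > 0.
            if dot(w, diff) <= 0:
                w = [wi + di for wi, di in zip(w, diff)]
                mistakes += 1
        if mistakes == 0:
            break  # every judgment is ordered correctly
    return w

# Toy judgments in which a high first attribute is preferred.
pairs = [
    ([3.0, 1.0], [1.0, 2.0]),
    ([2.0, 0.0], [0.5, 0.5]),
    ([4.0, 4.0], [1.0, 5.0]),
]
w = learn_ranking_function(pairs)

def f_me(x):
    return dot(w, x)

all_ranked = all(f_me(v) > f_me(u) for v, u in pairs)
```

The key design point is that only the difference v - u matters, which is why preference judgments can be learned without any absolute rating.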

  4. The framework to learn people's preferences • Diagram (for the person "Me"): {v_i > u_i : i ∈ I_Me} → linear SVM → f_Me • How useful is this ranking function f_Me? • Accuracy, generalization error • # Training examples • # Attributes • reliable, general, …

  5. The problem addressed: To improve ranking functions, we present a new algorithm for clustering preference criteria. [Figure: four people P1 to P4, each with a set of preference judgments {v_ik > u_ik} and a ranking function f_k (k = 1, …, 4); the judgments of P2 and P3 are joined if f_2∪3 is better than f_2 and f_3]

  6. Applications • Information retrieval: Optimizing Search Engines Using Clickthrough Data [Joachims, 2002] • Personalized recommenders: Adaptive Route Advisor [Fiechter, Rogers, 2000] • Analysis of sensory data: used to test the quality (or the acceptability) of market products; panels of experts and consumers

  7. {object1 rating1} {object2 rating2} {object3 rating3} {object4 rating4} P1 P2 P3 P4 Baseline approaches If ratingi ratingjthenmerge Pi with Pj Where  uses correlation or cosine

  8. Weaknesses of baseline approaches: Correlation and cosine were devised for prediction purposes in collaborative filtering, and they are not easily extended to clustering: • Not all people have seen the same objects • Two samples of preferences of the same person would not be considered homogeneous • Rating is not a good idea

  9. Our approach: a clustering algorithm. Ranking functions are linear maps: f(x) = w·x. The weight vectors w therefore codify the rationale for these preferences, so we will try to merge data sets with similar (cosine) ranking functions (= weight vectors). The merge will be accepted if the joint ranking function improves the quality of the individual functions
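
To make the similarity criterion concrete, here is a small sketch that compares weight vectors by cosine and picks the most similar pair as the next merge candidate. The weight vectors and the labels P1 to P3 are made-up values for illustration only.

```python
import math
from itertools import combinations

def cosine(w1, w2):
    """Cosine of the angle between two weight vectors (0.0 if either is zero)."""
    norm = math.sqrt(sum(a * a for a in w1)) * math.sqrt(sum(b * b for b in w2))
    return sum(a * b for a, b in zip(w1, w2)) / norm if norm else 0.0

# Hypothetical weight vectors of three people's ranking functions.
weights = {"P1": [1.0, 0.0], "P2": [0.9, 0.1], "P3": [0.0, 1.0]}

# The pair with the highest cosine is the first candidate for a merge.
best = max(
    combinations(weights, 2),
    key=lambda pair: cosine(weights[pair[0]], weights[pair[1]]),
)
```

Cosine ignores the length of w, which suits ranking functions: scaling w by a positive constant does not change the ordering it induces.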

  10. Our approach: a clustering algorithm
      A set of clusters ClusterPreferencesCriteria (a list of preference judgments (PJi: i = 1, …, N)) {
        Clusters = ∅;
        for each i = 1 to N {
          wi = Learn a ranking hyperplane from (PJi);
          Clusters = Clusters ∪ {(PJi, wi)};
        }
        repeat {
          let (PJ1, w1) and (PJ2, w2) be the clusters with most similar w1 and w2;
          w = Learn a ranking hyperplane from (PJ1 ∪ PJ2);
          if (quality of w >= (quality of w1 + quality of w2)) then
            replace the clusters (PJ1, w1) and (PJ2, w2) by (PJ1 ∪ PJ2, w) in Clusters;
        } until (no new merges can be tested);
        return Clusters;
      }
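
The pseudocode on this slide could be sketched in Python roughly as follows. Several parts are assumptions made only so the example runs on its own: a perceptron replaces the SVM learner, quality is taken to be the number of a cluster's own judgments that the function orders correctly (the slides estimate quality on verification data instead), and the toy data are invented.

```python
import math
from itertools import combinations

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0

def learn(pairs, epochs=100):
    # Assumption: perceptron on difference vectors, standing in for the SVM.
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        updated = False
        for v, u in pairs:
            diff = [x - y for x, y in zip(v, u)]
            if dot(w, diff) <= 0:
                w = [wi + di for wi, di in zip(w, diff)]
                updated = True
        if not updated:
            break
    return w

def quality(w, pairs):
    # Assumption: count of judgments (v preferred to u) that w orders correctly.
    return sum(1 for v, u in pairs if dot(w, v) > dot(w, u))

def cluster_preference_criteria(judgment_sets):
    # One initial cluster per person, keyed by the set of merged person indices.
    clusters = {frozenset([i]): (list(pj), learn(pj))
                for i, pj in enumerate(judgment_sets)}
    rejected = set()  # pairs of clusters already tested and refused
    while True:
        candidates = [(a, b) for a, b in combinations(clusters, 2)
                      if frozenset([a, b]) not in rejected]
        if not candidates:
            break  # no new merges can be tested
        a, b = max(candidates,
                   key=lambda p: cosine(clusters[p[0]][1], clusters[p[1]][1]))
        (pj1, w1), (pj2, w2) = clusters[a], clusters[b]
        merged = pj1 + pj2
        w = learn(merged)
        if quality(w, merged) >= quality(w1, pj1) + quality(w2, pj2):
            del clusters[a], clusters[b]
            clusters[a | b] = (merged, w)
        else:
            rejected.add(frozenset([a, b]))
    return clusters

# Toy data: persons 0 and 1 favour the first attribute, person 2 the second.
p0 = [([2.0, 0.0], [0.0, 1.0]), ([3.0, 1.0], [1.0, 2.0])]
p1 = [([1.0, 0.0], [0.0, 2.0]), ([4.0, 1.0], [2.0, 3.0])]
p2 = [([0.0, 2.0], [1.0, 0.0]), ([1.0, 3.0], [3.0, 1.0])]
result = cluster_preference_criteria([p0, p1, p2])
groups = sorted(sorted(key) for key in result)
```

On this toy input, persons 0 and 1 end up in one cluster and person 2 stays alone, because the judgments of person 2 contradict those of the merged group and the joint function cannot match the summed quality.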

  11. To estimate the quality of ranking functions: The quality of the ranking functions depends on: • Accuracy, generalization error • Number of training examples • Number of attributes

  12. To estimate the quality of ranking functions: If we have enough training data: • divide them into train (itself) and verification sets • compute the confidence interval [L, R] of the probability of error when we apply each ranking function to the corresponding verification set • quality is 1 - R: the estimated proportion of successes in generalization in the pessimistic case
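
The slide does not say which confidence interval is used, so the sketch below assumes a Wilson score interval at roughly 95% confidence (z = 1.96) purely for illustration. The function names and the error counts are hypothetical.

```python
import math

def error_interval(errors, n, z=1.96):
    """Wilson score interval [L, R] for the error probability.

    Assumption: the slides do not name the interval; Wilson is one common choice.
    errors: misordered verification pairs, n: total verification pairs.
    """
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def quality(errors, n, z=1.96):
    """Pessimistic success rate: 1 - R, with R the upper end of the interval."""
    _, right = error_interval(errors, n, z)
    return 1.0 - right

left, right = error_interval(10, 100)
q_small = quality(1, 100)     # 1% observed error on 100 pairs
q_large = quality(10, 1000)   # 1% observed error on 1000 pairs
```

With the same observed error rate, the larger verification set yields a narrower interval and therefore a higher pessimistic quality, which is how the number of examples enters the estimate.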

  13. To estimate the quality of ranking functions: If we do not have much training data: • Xi-alpha estimator [Joachims, 2000] (texts) • Cross-validation • Other

  14. Experimental results: We used a collection of preference judgments taken from EachMovie to simulate reasonable situations in the study of preferences of groups of people • People: the 100 spectators with the most ratings • Objects: the ratings of 504 movies (60% train, 20% verification, 20% test) given by other 89 spectators / 808 spectators • Training sets: preference judgments

  15. Experimental results [Figure; recoverable labels: 808, 89]

  16. A clustering algorithm to find groups with homogeneous preferences. J. Díez, J.J. del Coz, O. Luaces, A. Bahamonde. Centro de Inteligencia Artificial, Universidad de Oviedo at Gijón. www.aic.uniovi.es. Workshop on Implicit Measures of User Interests and Preferences