160 likes | 267 Views
A clustering algorithm to find groups with homogeneous preferences. J. Díez, J.J. del Coz, O. Luaces, A. Bahamonde. Centro de Inteligencia Artificial. Universidad de Oviedo at Gijón www.aic.uniovi.es Workshop on Implicit Measures of User Interests and Preferences.
E N D
A clustering algorithm to find groups with homogeneous preferences J. Díez, J.J. del Coz, O. Luaces, A. Bahamonde Centro de Inteligencia Artificial. Universidad de Oviedo at Gijónwww.aic.uniovi.es Workshop on Implicit Measures of User Interests and Preferences
The framework to learn preferences People tend to rate their preferences in a relative way Which middle circle do you think is larger?
Me The framework to learn people’s preferences • Regression is not a good idea • We will use training sets of preference judgments • pairs of vectors (v, u) where someone expresses that he or she prefers v to u SVMlinear {vi > ui: i IMe } fMe fMe is a linear ranking function: f(vi) > f(ui) whenever vi is preferable to ui
Me The framework to learn people’s preferences SVMlinear {vi > ui: i IMe } fMe • How useful is this ranking functionfMe? • Accuracy, generalization error • # Training examples • # Attributes • reliable, general, …
{vi1 > ui1} {vi4 > ui4} f1 f4 f2 f2U3 P4 P3 P1 P2 f3 The problem addressed To improve ranking functions, we present a new algorithm for clustering preference criteria if f2U3 is better than f2 and f3 {vi2 > ui2} f2U3 {vi3 > ui3}
Applications • Information retrieval Optimizing Search Engines Using Clickthrough Data[Joachims, 2002] • Personalized recommenders Adaptive Route Advisor [Fiechter, Rogers, 2000] • Analysis of sensory data Used to test the quality (or the acceptability) of market products Panels of experts and consumers
{object1 rating1} {object2 rating2} {object3 rating3} {object4 rating4} P1 P2 P3 P4 Baseline approaches If ratingi ratingjthenmerge Pi with Pj Where uses correlation or cosine
Weaknesses of baseline approaches Correlation or cosine were devised for prediction purposes in collaborative filtering, and they are not easily extendable to clustering: • Not all people have seen the same objects • Two samples of preferences of the same person would not be considered homogeneous • Rating is not a good idea
Our approach: a clustering algorithm Ranking functions are linear maps: f(x) =w·x Then weight vectors w codify the rationale for these preferences Therefore, we will try to merge data sets with similar (cosine) ranking functions (= weight vectors) The merge will be accepted if the join ranking function improves the quality of individual functions
Our approach: a clustering algorithm • A set of clusters ClusterPreferencesCriteria (a list of preference judgments (PJi: i = 1,…, N)) { • Clusters = ; for each i = 1 to N {wi = Learn a ranking hyperplane from (PJi); Clusters = Clusters U {(PJi, wi)}; } repeat { let (PJ1, w1) and (PJ2, w2) be the clusters with most similar w1 and w2;w = Learn a ranking hyperplane from (PJ1 U PJ2); if (quality of w >= (quality of w1 + quality of w2)) then replace the clusters (PJ1, w1) and (PJ2, w2) by (PJ1 U PJ2, w) in Clusters; } until (no new merges can be tested); return Clusters; • }
To estimate quality of ranking functions The quality of the ranking functions depends on: • Accuracy, generalization errors • Number of Training examples • Number of Attributes
To estimate quality of ranking functions If we have enough training data: • divide them in train (itself) and verification sets • compute the confidence interval of the probability of error when we apply each ranking function to the corresponding verification set: [L, R] • quality is 1-Rthe estimated proportion of successful generalization errorsin the pessimistic case
To estimate quality of ranking functions If we don’t have too many training data: • Xi-alpha estimator [Joachims, 2000] (texts) • Cross-validation • Other
Experimental results We used a collection of preference judgments taken from EachMovie to simulate reasonable situations in the study of preferences of groups of people • People: the 100 spectators with more ratings • Objects: the ratings of 504 movies (60% train, 20% verification, 20% test) given by other 89 spectators 808 spectators Training sets: preference judgments
Experimental results 808 89
A clustering algorithm to find groups with homogeneous preferences J. Díez, J.J. del Coz, O. Luaces, A. Bahamonde Centro de Inteligencia Artificial. Universidad de Oviedo at Gijónwww.aic.uniovi.es Workshop on Implicit Measures of User Interests and Preferences