On the Limits of Dictatorial Classification
Reshef Meir, School of Computer Science and Engineering, Hebrew University
Joint work with Shaull Almagor, Assaf Michaely and Jeffrey S. Rosenschein
Strategy-Proof Classification
• An Example
• Motivation
• Our Model and previous results
• Filling the gap: proving a lower bound
• The weighted case
Strategic labeling: an example (figure: the ERM classifier on the reported dataset, making 5 errors)
There is a better classifier! (for me…)
If I just change the labels… (figure: the ERM on the manipulated dataset, making 2+5 = 7 errors)
Classification
The supervised classification problem:
• Input: a set of labeled data points {(xi, yi)}, i = 1..m
• Output: a classifier c from some predefined concept class C (e.g., functions of the form f : X → {−,+})
• We usually want c not just to classify the sample correctly, but to generalize well, i.e., to minimize R(c) ≡ E(x,y)~D[ c(x) ≠ y ], the expected number of errors w.r.t. the distribution D (the 0/1 loss function)
Classification (cont.)
• A common approach is to return the ERM (Empirical Risk Minimizer), i.e., the concept in C that is best w.r.t. the given samples (has the lowest number of errors)
• The ERM generalizes well under some assumptions on the concept class C (e.g., linear classifiers tend to generalize well)
• With multiple experts, we can't trust our ERM!
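As an illustration, here is a minimal sketch of ERM over a finite concept class with the 0/1 loss defined above; the tiny threshold class and the data points are hypothetical, chosen only to make the computation concrete.

```python
# Minimal ERM sketch over a finite concept class (hypothetical data and class).

def empirical_risk(concept, sample):
    """Fraction of sample points that the concept labels incorrectly (0/1 loss)."""
    return sum(1 for x, y in sample if concept(x) != y) / len(sample)

def erm(concept_class, sample):
    """Return the concept in the class with the lowest empirical risk."""
    return min(concept_class, key=lambda c: empirical_risk(c, sample))

# A tiny concept class: three threshold classifiers on the real line.
concept_class = [lambda x, t=t: +1 if x >= t else -1 for t in (0.0, 0.5, 1.0)]
sample = [(0.1, -1), (0.4, -1), (0.6, +1), (0.9, +1)]

best = erm(concept_class, sample)
print(empirical_risk(best, sample))   # 0.0 -- the threshold at 0.5 fits the sample perfectly
```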
Where do we find “experts” with incentives?
Example 1: A firm learning purchase patterns
• Information is gathered from local retailers
• The resulting policy affects them
• “The best policy is the policy that fits my pattern”
Example 2: Internet polls / polls of experts (figure: Users → Reported Dataset → Classification Algorithm → Classifier)
Motivation from other domains
• Aggregating partitions
• Judgment aggregation
• Facility location (on the binary cube)
A problem instance is defined by:
• A set of agents I = {1,...,n}
• A set of data points X = {x1,...,xm} (a subset of the instance space)
• For each xk ∈ X, agent i has a label yik ∈ {−,+}
• Each pair sik = ⟨xk, yik⟩ is a sample
• All samples of a single agent compose the labeled dataset Si = {si1,...,si,m(i)}
• The joint dataset S = ⟨S1, S2,…, Sn⟩ is our input
• m = |S|
• We denote the dataset with the reported labels by S’
Input: example (figure: a shared set of points X, with each agent i contributing a label vector Yi ∈ {−,+}m)
S = ⟨S1, S2,…, Sn⟩ = ⟨(X,Y1),…, (X,Yn)⟩
Mechanisms
• A mechanism M receives a labeled dataset S and outputs c = M(S) ∈ C
• Private risk of agent i (% of errors on Si): Ri(c,S) = |{k : c(xik) ≠ yik}| / mi
• Global risk (% of errors on S): R(c,S) = |{⟨i,k⟩ : c(xik) ≠ yik}| / m
• We allow non-deterministic mechanisms, and measure the expected risk
ERM
We compare the outcome of M to the ERM:
c* = ERM(S) = argminc ∈ C R(c,S),  r* = R(c*,S)
Can our mechanism simply compute and return the ERM?
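To make these definitions concrete, here is a small sketch of the private risk, the global risk, and the ERM over the joint dataset; the dictionary-based data layout and the function names are illustrative assumptions, not the paper's notation.

```python
# Sketch of the model's risk definitions (the data layout is a hypothetical choice).
# A dataset S is a dict: agent -> list of (point, label) pairs, labels in {-1, +1}.

def private_risk(concept, S, agent):
    """R_i(c, S): fraction of agent i's samples that c labels incorrectly."""
    samples = S[agent]
    return sum(1 for x, y in samples if concept(x) != y) / len(samples)

def global_risk(concept, S):
    """R(c, S): fraction of all samples (over all agents) that c labels incorrectly."""
    all_samples = [s for samples in S.values() for s in samples]
    return sum(1 for x, y in all_samples if concept(x) != y) / len(all_samples)

def erm(concept_class, S):
    """c* = argmin over the concept class of the global risk on the joint dataset."""
    return min(concept_class, key=lambda c: global_risk(c, S))

# Usage with the two constant classifiers ("all positive" / "all negative"):
ALL_POS, ALL_NEG = (lambda x: +1), (lambda x: -1)
S = {1: [("a", +1), ("b", +1)], 2: [("a", -1), ("b", +1)]}
c_star = erm([ALL_POS, ALL_NEG], S)
print(global_risk(c_star, S), private_risk(c_star, S, 2))   # 0.25 0.5
```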
Requirements (MOST IMPORTANT SLIDE)
• Good approximation: ∀S, R(M(S),S) ≤ α∙r*
• Strategy-Proofness (SP): ∀i, S, Si’: Ri(M(S−i, Si’), S) ≥ Ri(M(S), S)  (the left side is agent i's risk when lying, the right side when truthful)
• ERM(S) is 1-approximating but not SP
• ERM(S1) is SP but gives a bad approximation
Are there any mechanisms that guarantee both SP and good approximation?
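The claim that ERM(S) is not SP can be demonstrated on a tiny instance over the two constant classifiers; the numbers below are an illustrative construction rather than the example from the slides: by flipping its single positive label to negative, agent 2 changes the ERM from "all positive" to "all negative" and lowers its own error on its true labels.

```python
# Hypothetical instance showing that returning ERM(S) is manipulable.
ALL_POS, ALL_NEG = (lambda x: +1), (lambda x: -1)
CONCEPTS = [ALL_POS, ALL_NEG]

def global_risk(c, S):
    samples = [s for agent in S for s in S[agent]]
    return sum(1 for x, y in samples if c(x) != y) / len(samples)

def private_risk(c, S, i):
    return sum(1 for x, y in S[i] if c(x) != y) / len(S[i])

def erm(S):
    return min(CONCEPTS, key=lambda c: global_risk(c, S))

# True labels: agent 1 holds 3 positives; agent 2 holds 1 positive and 3 negatives.
truth = {1: [(k, +1) for k in range(3)],
         2: [(0, +1)] + [(k, -1) for k in range(1, 4)]}
print(private_risk(erm(truth), truth, 2))   # 0.75 (the ERM is "all positive")

# Agent 2 lies and reports all of its points as negative.
lie = {1: truth[1], 2: [(k, -1) for k in range(4)]}
print(private_risk(erm(lie), truth, 2))     # 0.25 (the ERM flips to "all negative")
```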
Related work
• A study of SP mechanisms in regression learning: O. Dekel, F. Fischer and A. D. Procaccia, SODA (2008), JCSS (2009) [supervised learning]
• No SP mechanisms for clustering: J. Perote-Peña and J. Perote, Economics Bulletin (2003) [unsupervised learning]
Previous work: a simple case
• Tiny concept class: |C| = 2, either “all positive” or “all negative”
Theorem:
• There is an SP 2-approximation mechanism
• There are no SP α-approximation mechanisms for any α < 2
(Meir, Procaccia and Rosenschein, AAAI 2008)
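For intuition, here is a sketch of one natural strategy-proof rule for this two-concept class: each agent votes for whichever constant fits its own reported labels better, and votes are weighted by dataset size. This is only an illustration of how strategy-proofness can arise here; it is not claimed to be the 2-approximation mechanism from the AAAI 2008 paper.

```python
# Illustrative SP rule for C = {"all positive", "all negative"}.
# An agent's report only determines its own vote, so misreporting can at most
# flip that vote away from the agent's truly preferred constant -- it never helps.

def preferred_constant(samples):
    """The constant label (+1 or -1) with fewer errors on this agent's samples."""
    positives = sum(1 for _, y in samples if y == +1)
    return +1 if positives >= len(samples) - positives else -1

def weighted_vote(S):
    """Return the constant preferred by the dataset-size-weighted majority of agents."""
    score = sum(len(samples) * preferred_constant(samples) for samples in S.values())
    return +1 if score >= 0 else -1

S = {1: [("a", +1), ("b", +1)], 2: [("a", -1), ("b", -1), ("c", -1)]}
print(weighted_vote(S))   # -1: agent 2 holds more points, so "all negative" wins
```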
Previous work: general concept classes
Theorem: Selecting a dictator at random is SP and guarantees a bounded approximation ratio
• True for any concept class C
• Generalizes well from sampled data when C has a bounded VC dimension
Open question #1: are there better mechanisms?
Open question #2: what if agents are weighted?
(Meir, Procaccia and Rosenschein, IJCAI 2009)
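A minimal sketch of the random-dictator idea, assuming the dictator is drawn uniformly and the mechanism then returns the ERM on that agent's labels alone; the exact sampling distribution and tie-breaking in the paper may differ.

```python
import random

# Random-dictator sketch: pick one agent, fit the best concept to its labels only.
# SP: a non-chosen agent's report never affects the outcome, and the chosen agent
# receives the ERM on its own reported labels, so truthful reporting is optimal.

def empirical_risk(c, samples):
    return sum(1 for x, y in samples if c(x) != y) / len(samples)

def random_dictator(concept_class, S, rng=random):
    dictator = rng.choice(list(S.keys()))          # uniform choice (an assumption here)
    return min(concept_class, key=lambda c: empirical_risk(c, S[dictator]))

CONSTS = [lambda x: +1, lambda x: -1]
S = {1: [("a", +1)], 2: [("a", -1), ("b", -1)]}
print(random_dictator(CONSTS, S)("a"))   # +1 or -1, depending on the chosen agent
```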
A lower bound
Our main result, matching the upper bound from IJCAI-09:
Theorem: There is a concept class C (where |C| = 3) for which any SP mechanism has an approximation ratio no better than that upper bound.
• Proof is by a careful reduction to a voting scenario
• We will see the proof sketch
Proof sketch
Gibbard [‘77] proved that every (randomized) SP voting rule for 3 candidates must be a lottery over dictators*. We define X = {x, y, z} and C = {cx, cy, cz} (figure: the table defining the three concepts). We also restrict the agents, so that each agent can have mixed labels on only one point.
Proof sketch (cont.)
(figure: two agents’ induced preference orders over the concepts, cz > cy > cx and cx > cz > cy)
• Suppose that M is SP
• M must be monotone on the mixed point
• M must ignore the mixed point
• M is a (randomized) voting rule
Proof sketch (cont.)
(figure: two agents’ induced preference orders over the concepts, cz > cy > cx and cx > cz > cy)
• By Gibbard [‘77], M is a random dictator
• We construct an instance where random dictators perform poorly
Weighted agents
• We must select a dictator randomly
• However, the selection probability may be based on weight
• Naïve approach: only gives a 3-approximation
• An optimal SP algorithm: matches the lower bound
Future work
• Other concept classes
• Other loss functions (linear loss, quadratic loss, …)
• Alternative assumptions on the structure of the data
• Other models of strategic behavior
• …