The Informational Complexity of Interactive Machine Learning Steve Hanneke
Passive Learning
[Diagram] The Data Source supplies raw unlabeled data; the Expert / Oracle supplies labeled examples; the Learning Algorithm consumes both and outputs a classifier.
Learning by Interaction: The Big Picture
[Diagram] The Data Source supplies raw unlabeled data to the Learning Algorithm. The learner asks the Expert / Oracle a question about the data; the expert answers; this question-and-answer loop repeats. Finally, the algorithm outputs a classifier.
Interactive Learning: A Manifesto
• Machine learning is a collaborative effort between human and machine.
• In passive learning, there is often a bottleneck on the human side (data annotation).
• Conclusion: passive algorithms are lazy collaborators.
• Interactive algorithms may only require the human to expend effort providing relevant details, minimizing unnecessary redundancy.
The Value of Interaction
• But how much improvement can we expect for any particular learning problem?
• How much interaction is necessary and sufficient for learning?
Outline
• Active learning with label requests
• Disagreement Coefficient (Hanneke, ICML 2007)
• Teaching Dimension (Hanneke, COLT 2007)
• Class-conditional queries
• Arbitrary sample-based queries
Active Learning with Label Requests
Active Learning with Label Requests
• The passive sample complexity is clearly an upper bound on the label complexity of active learning.
• Other than the noise rate, the VC dimension summarizes the sample complexity.
• The algorithm achieving this bound is ERM, which often must be approximated in practice.
Outline
• Active learning with label requests
• Disagreement Coefficient (Hanneke, ICML 2007)
• Teaching Dimension (Hanneke, COLT 2007)
• Class-conditional queries
• Arbitrary sample-based queries
Reducing Uncertainty

"Real knowledge is to know the extent of one's ignorance."
-- Confucius

"As we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say, we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know."
-- Donald Rumsfeld, Feb. 12, 2002, Department of Defense news briefing
Reducing Uncertainty
[Figure] The region of disagreement DIS(B(h,r)): the set of points on which some pair of concepts in the ball B(h,r) around h disagree.
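The quantity DIS(B(h,r)) can be estimated empirically. The sketch below, not from the talk, does this for one-dimensional threshold classifiers h_t(x) = 1[x >= t] with x uniform on [0, 1]; the grid, radius, and function names are illustrative choices.

```python
import random

# An empirical sketch of the region of disagreement DIS(B(h, r)) for
# threshold classifiers h_t(x) = 1[x >= t] on [0, 1] with x uniform.
# The ball B(h, r) collects thresholds whose disagreement probability
# with the center h is at most r; DIS(B(h, r)) is the set of points on
# which some pair of thresholds in the ball disagrees.

def disagreement(t1, t2, xs):
    """Empirical probability that h_t1 and h_t2 label a point differently."""
    return sum(int(x >= t1) != int(x >= t2) for x in xs) / len(xs)

rng = random.Random(0)
xs = [rng.random() for _ in range(5000)]
grid = [i / 200 for i in range(201)]

center, r = 0.5, 0.1
ball = [t for t in grid if disagreement(center, t, xs) <= r]
lo, hi = min(ball), max(ball)   # for thresholds the ball is an interval
# a point is in DIS iff the two extreme thresholds disagree on it
frac = sum(int(x >= lo) != int(x >= hi) for x in xs) / len(xs)
print(f"empirical P(DIS(B(h,r))) = {frac:.3f} for r = {r}")
```

The ratio frac / r approximates the disagreement coefficient; for one-dimensional thresholds it comes out near 2.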
Reducing Uncertainty: A2 Algorithm
Version space-based passive learning:
Repeat:
1. Sample an example x from the distribution D.
2. Request its label y from the Expert.
3. Add the labeled example (x, y) to the data set.
4. Discard concepts we are statistically confident are suboptimal.
Reducing Uncertainty: A2 Algorithm
• A2 (Balcan, Beygelzimer & Langford, 2006)
Reducing Uncertainty: A2 Algorithm
• A2 [BBL06] (slightly oversimplified explanation): version space-based agnostic active learning.
Repeat:
1. Sample an example x from the distribution D.
2. If x is not in the region of disagreement, ignore it (move on to the next sample).
3. If x is in the region of disagreement, request its label y from the Expert.
4. Add the labeled example (x, y) to the data set.
5. Discard concepts we are statistically confident are suboptimal (with respect to the filtered distribution).
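The query-filtering idea above can be sketched in a few lines. This is a minimal CAL-style variant for the realizable (noise-free) case, not the full A2 algorithm, which adds statistical tests to tolerate noise; the concept class (thresholds on [0, 1]) and all names are illustrative.

```python
import random

# A minimal sketch of disagreement-based active learning over a finite
# concept class, in the realizable case. Concepts are thresholds
# h_t(x) = 1[x >= t] on [0, 1].

def in_disagreement(version_space, x):
    """x lies in the region of disagreement if two concepts in the
    version space assign it different labels."""
    labels = {h(x) for h in version_space}
    return len(labels) > 1

def cal_active_learn(version_space, stream, oracle):
    queries = 0
    for x in stream:
        if not in_disagreement(version_space, x):
            continue              # all surviving concepts agree: skip x
        y = oracle(x)             # request a label only inside DIS
        queries += 1
        # discard concepts inconsistent with the new labeled example
        version_space = [h for h in version_space if h(x) == y]
    return version_space, queries

thresholds = [i / 100 for i in range(101)]
concepts = [(lambda x, t=t: int(x >= t)) for t in thresholds]
target = lambda x: int(x >= 0.37)

rng = random.Random(0)
stream = [rng.random() for _ in range(500)]
vs, q = cal_active_learn(concepts, stream, target)
print(f"{q} labels requested out of {len(stream)} unlabeled examples")
```

Because the disagreement region shrinks as concepts are eliminated, the number of label requests is far smaller than the number of unlabeled examples seen.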
Outline
• Active learning with label requests
• Disagreement Coefficient (Hanneke, ICML 2007)
• Teaching Dimension (Hanneke, COLT 2007)
• Class-conditional queries
• Arbitrary sample-based queries
Exact Learning: Halving Algorithm
• Suppose we can hand the teacher a concept and ask for an example that contradicts it, if one exists (equivalence queries).
• The Halving algorithm (Littlestone, 88):
• Let hmaj be the majority-vote concept of C.
• Ask for an example (X, Y) where hmaj is wrong.
• If no such example exists, return hmaj.
• Else remove from C every h with h(X) ≠ Y.
• Each counterexample eliminates at least half of C, so the Halving algorithm needs at most log|C| queries to identify any target function in C.
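The steps above can be sketched over a small finite class. The "teacher" here is simulated by exhaustive search, and the domain, concept class, and helper names are illustrative choices, not part of the talk.

```python
# A sketch of Littlestone's Halving algorithm over a finite concept class.
# The simulated teacher knows the target and returns a counterexample to
# any proposed hypothesis, or None if none exists (an equivalence query).

DOMAIN = list(range(16))
# concept class: all threshold functions h_t(x) = 1 iff x >= t
CONCEPTS = {t: (lambda x, t=t: int(x >= t)) for t in range(17)}

def majority_vote(concepts):
    def h_maj(x):
        votes = sum(h(x) for h in concepts.values())
        return int(votes * 2 > len(concepts))
    return h_maj

def equivalence_query(h, target):
    for x in DOMAIN:
        if h(x) != target(x):
            return x, target(x)   # counterexample
    return None                    # h is correct everywhere

def halving(concepts, target):
    queries = 0
    while True:
        h_maj = majority_vote(concepts)
        queries += 1
        result = equivalence_query(h_maj, target)
        if result is None:
            return h_maj, queries
        x, y = result
        # each counterexample discards the majority half, which erred at x
        concepts = {t: h for t, h in concepts.items() if h(x) == y}

target = CONCEPTS[11]
h, q = halving(dict(CONCEPTS), target)
print("identified the target in", q, "equivalence queries")
```

With 17 concepts, at most ceil(log2 17) = 5 counterexamples are needed before the confirming query, matching the log|C| bound.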
Exact Learning: Membership Queries
• Suppose, instead of equivalence queries, we can request the label of any example in X.
• We still want to run the Halving algorithm.
• How many label requests does it take to build an equivalence query?
Teaching Dimension (Hegedüs, 95)
Teaching Dimension for PAC
[Figure] Say V is the class of linear separators, and sample a set U from D. A specifying set uniquely identifies (at most) one labeling in V[U]; as an example, take the target f to be the colored region in the figure.
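A specifying set can be computed by brute force for a tiny class. The sketch below, an illustration not taken from the talk, searches for the smallest subset of points whose target labels are consistent with only one labeling in V[U]; labelings are represented as tuples and all names are assumptions.

```python
from itertools import combinations

# Brute-force sketch of a minimal specifying set: the smallest subset S
# of U such that the target's labels on S are consistent with at most
# one labeling in V[U].

def specifying_set(labelings, target):
    """labelings: set of tuples (the patterns V[U]); target: one of them."""
    n = len(target)
    for size in range(n + 1):
        for S in combinations(range(n), size):
            consistent = [p for p in labelings
                          if all(p[i] == target[i] for i in S)]
            if len(consistent) <= 1:
                return S            # minimal specifying set found
    return tuple(range(n))

# thresholds on 5 points: the labelings are 00000, 00001, ..., 11111
patterns = {tuple(int(i >= t) for i in range(5)) for t in range(6)}
target = tuple(int(i >= 2) for i in range(5))   # 00111
S = specifying_set(patterns, target)
print("specifying set indices:", S)
```

For thresholds, two adjacent points around the decision boundary suffice, so the specifying set has size 2 here.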
XTD and Label Complexity
Conjecture: a bound of this form is valid, even with no knowledge of the noise rate (i.e., for agnostic learning).
Outline
• Active learning with label requests
• Disagreement Coefficient (Hanneke, ICML 2007)
• Teaching Dimension (Hanneke, COLT 2007)
• Class-conditional queries
• Arbitrary sample-based queries
What about other types of queries?
• Ask the question you want answered. For example, consider multiclass image classification: perhaps learning would be easier if only the algorithm had an image of a car.
[Figure] A label query shows the expert one image and asks "What's this a picture of?" (Horse / Planet / Person / Car)
Class-Conditional Queries
• Ask the question you want answered. For example, consider multiclass image classification: perhaps learning would be easier if only the algorithm had an image of a car.
[Figure] A class-conditional query shows the expert a batch of images and asks "Click on a picture of a car, if there is one." This can be done for each class individually (except perhaps the "other" class).
Class-Conditional Queries
• A concrete example: Conjunctions (without noise).
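One way this plays out for noise-free conjunctions can be sketched as follows. The specific protocol here, "point me to a positive example in this pool, if there is one," is one natural reading of a class-conditional query, and the pool construction and all names are illustrative assumptions.

```python
import random

# A sketch of learning a monotone conjunction with class-conditional
# queries, in the noise-free setting. Each query asks the expert for a
# positive example from the remaining pool; the conjunction is the set
# of variables equal to 1 in every positive example seen so far.

N = 10
TARGET = {1, 4, 7}   # target conjunction: x1 AND x4 AND x7

def label(x):
    return all(x[i] for i in TARGET)

def positive_query(pool):
    """Class-conditional query: the expert returns some positive
    example from the pool, or None if the pool has no positives."""
    for x in pool:
        if label(x):
            return x
    return None

rng = random.Random(1)
pool = [tuple(rng.randint(0, 1) for _ in range(N)) for _ in range(2000)]

relevant = set(range(N))
remaining = list(pool)
queries = 0
while True:
    x = positive_query(remaining)
    queries += 1
    if x is None:
        break
    # keep only variables set to 1 in every positive seen so far
    relevant &= {i for i in range(N) if x[i]}
    # drop examples the current hypothesis already labels positive
    remaining = [z for z in remaining
                 if not all(z[i] for i in relevant)]

print("learned conjunction:", sorted(relevant), "in", queries, "queries")
```

Each returned positive removes at least one spurious variable, so the number of queries is bounded by the number of variables rather than by the number of examples.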
Outline
• Active learning with label requests
• Disagreement Coefficient (Hanneke, ICML 2007)
• Teaching Dimension (Hanneke, COLT 2007)
• Class-conditional queries
• Arbitrary sample-based queries
Arbitrary Example-based Queries
• Suppose we let the algorithm ask any question it wants about the data labels.
Cost Complexity
Open Problems for Label Queries
• The value of having more unlabeled data? (Especially for agnostic learning.)
• "Optimal" agnostic active learning algorithm?
Open Problems
• Unknown cost functions: e.g., maybe examples near the separator are more expensive to label.
• Other types of queries: e.g., "give me a rule/explanation you used to decide the label of this example."
Definition of GIC
• Say the teacher gets drunk and doesn't necessarily answer accurately, but she manages to scribble her answers to every question on a piece of paper.
• We have a spy who steals the paper and photocopies it.
• The spy tells us exactly which questions to ask so that, at minimum cost, there is at most one concept in C consistent with the answers.
• Define GIC(C, c) as the worst-case cost of this game.