The Informational Complexity of Interactive Machine Learning Steve Hanneke
Passive Learning
[Diagram] The Data Source supplies raw unlabeled data; the Expert / Oracle supplies labeled examples; the Learning Algorithm consumes both and outputs a classifier.
Learning by Interaction: The Big Picture
[Diagram] The Data Source supplies raw unlabeled data to the Learning Algorithm. The learner asks the Expert / Oracle a question about the data; the expert answers; this question-and-answer loop repeats. Finally, the algorithm outputs a classifier.
Interactive Learning: A Manifesto
• Machine learning is a collaborative effort between human and machine.
• In passive learning, there is often a bottleneck on the human side (data annotation).
• Conclusion: passive algorithms are lazy collaborators.
• Interactive algorithms may only require the human to expend effort providing relevant details, minimizing unnecessary redundancy.
The Value of Interaction
• But how much improvement can we expect for any particular learning problem?
• How much interaction is necessary and sufficient for learning?
Outline
• Active learning with label requests
• Disagreement Coefficient (Hanneke, ICML 2007)
• Teaching Dimension (Hanneke, COLT 2007)
• Class-conditional queries
• Arbitrary sample-based queries
Active Learning with Label Requests
Active Learning with Label Requests
• The passive sample complexity is clearly an upper bound on the label complexity of active learning.
• Other than the noise rate, the VC dimension summarizes the sample complexity.
• The algorithm achieving this bound is ERM, which often must be approximated in practice.
Outline
• Active learning with label requests
• Disagreement Coefficient (Hanneke, ICML 2007)
• Teaching Dimension (Hanneke, COLT 2007)
• Class-conditional queries
• Arbitrary sample-based queries
Reducing Uncertainty

"Real knowledge is to know the extent of one's ignorance."
-- Confucius

"As we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say, we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know."
-- Donald Rumsfeld, Feb. 12, 2002, Department of Defense news briefing
Reducing Uncertainty
[Figure] The region of disagreement DIS(B(h,r)): the set of points on which some pair of concepts in the ball B(h,r) around h disagree.
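The quantity DIS(B(h,r)) can be estimated empirically. The sketch below, not from the talk, does this for one-dimensional threshold classifiers h_t(x) = 1[x >= t] with x uniform on [0, 1]; the grid, radius, and function names are illustrative choices.

```python
import random

# An empirical sketch of the region of disagreement DIS(B(h, r)) for
# threshold classifiers h_t(x) = 1[x >= t] on [0, 1] with x uniform.
# The ball B(h, r) collects thresholds whose disagreement probability
# with the center h is at most r; DIS(B(h, r)) is the set of points on
# which some pair of thresholds in the ball disagrees.

def disagreement(t1, t2, xs):
    """Empirical probability that h_t1 and h_t2 label a point differently."""
    return sum(int(x >= t1) != int(x >= t2) for x in xs) / len(xs)

rng = random.Random(0)
xs = [rng.random() for _ in range(5000)]
grid = [i / 200 for i in range(201)]

center, r = 0.5, 0.1
ball = [t for t in grid if disagreement(center, t, xs) <= r]
lo, hi = min(ball), max(ball)   # for thresholds the ball is an interval
# a point is in DIS iff the two extreme thresholds disagree on it
frac = sum(int(x >= lo) != int(x >= hi) for x in xs) / len(xs)
print(f"empirical P(DIS(B(h,r))) = {frac:.3f} for r = {r}")
```

The ratio frac / r approximates the disagreement coefficient; for one-dimensional thresholds it comes out near 2.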
Reducing Uncertainty: A2 Algorithm
Version space-based passive learning:
Repeat:
1. Sample an example x from the distribution D.
2. Request its label y from the Expert.
3. Add the labeled example (x, y) to the data set.
4. Discard concepts we are statistically confident are suboptimal.
Reducing Uncertainty: A2 Algorithm
• A2 (Balcan, Beygelzimer & Langford, 2006)
Reducing Uncertainty: A2 Algorithm
• A2 [BBL06] (slightly oversimplified explanation): version space-based agnostic active learning.
Repeat:
1. Sample an example x from the distribution D.
2. If x is not in the region of disagreement, ignore it (move on to the next sample).
3. If x is in the region of disagreement, request its label y from the Expert.
4. Add the labeled example (x, y) to the data set.
5. Discard concepts we are statistically confident are suboptimal (with respect to the filtered distribution).
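The query-filtering idea above can be sketched in a few lines. This is a minimal CAL-style variant for the realizable (noise-free) case, not the full A2 algorithm, which adds statistical tests to tolerate noise; the concept class (thresholds on [0, 1]) and all names are illustrative.

```python
import random

# A minimal sketch of disagreement-based active learning over a finite
# concept class, in the realizable case. Concepts are thresholds
# h_t(x) = 1[x >= t] on [0, 1].

def in_disagreement(version_space, x):
    """x lies in the region of disagreement if two concepts in the
    version space assign it different labels."""
    labels = {h(x) for h in version_space}
    return len(labels) > 1

def cal_active_learn(version_space, stream, oracle):
    queries = 0
    for x in stream:
        if not in_disagreement(version_space, x):
            continue              # all surviving concepts agree: skip x
        y = oracle(x)             # request a label only inside DIS
        queries += 1
        # discard concepts inconsistent with the new labeled example
        version_space = [h for h in version_space if h(x) == y]
    return version_space, queries

thresholds = [i / 100 for i in range(101)]
concepts = [(lambda x, t=t: int(x >= t)) for t in thresholds]
target = lambda x: int(x >= 0.37)

rng = random.Random(0)
stream = [rng.random() for _ in range(500)]
vs, q = cal_active_learn(concepts, stream, target)
print(f"{q} labels requested out of {len(stream)} unlabeled examples")
```

Because the disagreement region shrinks as concepts are eliminated, the number of label requests is far smaller than the number of unlabeled examples seen.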
Outline
• Active learning with label requests
• Disagreement Coefficient (Hanneke, ICML 2007)
• Teaching Dimension (Hanneke, COLT 2007)
• Class-conditional queries
• Arbitrary sample-based queries
Exact Learning: Halving Algorithm
• Suppose we can hand the teacher a concept and ask for an example that contradicts it, if one exists (equivalence queries).
• The Halving algorithm (Littlestone, 88):
• Let hmaj be the majority-vote concept of C.
• Ask for an example (X, Y) where hmaj is wrong.
• If no such example exists, return hmaj.
• Else remove from C every h with h(X) ≠ Y.
• Each counterexample eliminates at least half of C, so the Halving algorithm needs at most log|C| queries to identify any target function in C.
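The steps above can be sketched over a small finite class. The "teacher" here is simulated by exhaustive search, and the domain, concept class, and helper names are illustrative choices, not part of the talk.

```python
# A sketch of Littlestone's Halving algorithm over a finite concept class.
# The simulated teacher knows the target and returns a counterexample to
# any proposed hypothesis, or None if none exists (an equivalence query).

DOMAIN = list(range(16))
# concept class: all threshold functions h_t(x) = 1 iff x >= t
CONCEPTS = {t: (lambda x, t=t: int(x >= t)) for t in range(17)}

def majority_vote(concepts):
    def h_maj(x):
        votes = sum(h(x) for h in concepts.values())
        return int(votes * 2 > len(concepts))
    return h_maj

def equivalence_query(h, target):
    for x in DOMAIN:
        if h(x) != target(x):
            return x, target(x)   # counterexample
    return None                    # h is correct everywhere

def halving(concepts, target):
    queries = 0
    while True:
        h_maj = majority_vote(concepts)
        queries += 1
        result = equivalence_query(h_maj, target)
        if result is None:
            return h_maj, queries
        x, y = result
        # each counterexample discards the majority half, which erred at x
        concepts = {t: h for t, h in concepts.items() if h(x) == y}

target = CONCEPTS[11]
h, q = halving(dict(CONCEPTS), target)
print("identified the target in", q, "equivalence queries")
```

With 17 concepts, at most ceil(log2 17) = 5 counterexamples are needed before the confirming query, matching the log|C| bound.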
Exact Learning: Membership Queries
• Suppose, instead of equivalence queries, we can request the label of any example in X.
• We still want to run the Halving algorithm.
• How many label requests does it take to build an equivalence query?
Teaching Dimension (Hegedüs, 95)
Teaching Dimension for PAC
[Figure] Say V is the class of linear separators, and sample a set U from D. A specifying set uniquely identifies (at most) one labeling in V[U]; as an example, take the target f to be the colored region in the figure.
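A specifying set can be computed by brute force for a tiny class. The sketch below, an illustration not taken from the talk, searches for the smallest subset of points whose target labels are consistent with only one labeling in V[U]; labelings are represented as tuples and all names are assumptions.

```python
from itertools import combinations

# Brute-force sketch of a minimal specifying set: the smallest subset S
# of U such that the target's labels on S are consistent with at most
# one labeling in V[U].

def specifying_set(labelings, target):
    """labelings: set of tuples (the patterns V[U]); target: one of them."""
    n = len(target)
    for size in range(n + 1):
        for S in combinations(range(n), size):
            consistent = [p for p in labelings
                          if all(p[i] == target[i] for i in S)]
            if len(consistent) <= 1:
                return S            # minimal specifying set found
    return tuple(range(n))

# thresholds on 5 points: the labelings are 00000, 00001, ..., 11111
patterns = {tuple(int(i >= t) for i in range(5)) for t in range(6)}
target = tuple(int(i >= 2) for i in range(5))   # 00111
S = specifying_set(patterns, target)
print("specifying set indices:", S)
```

For thresholds, two adjacent points around the decision boundary suffice, so the specifying set has size 2 here.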
XTD and Label Complexity
Conjecture: a bound of this form is valid, even with no knowledge of the noise rate (i.e., for agnostic learning).
Outline
• Active learning with label requests
• Disagreement Coefficient (Hanneke, ICML 2007)
• Teaching Dimension (Hanneke, COLT 2007)
• Class-conditional queries
• Arbitrary sample-based queries
What about other types of queries?
• Ask the question you want answered. For example, consider multiclass image classification: perhaps learning would be easier if only the algorithm had an image of a car.
[Figure] A label query shows the expert one image and asks "What's this a picture of?" (Horse / Planet / Person / Car)
Class-Conditional Queries
• Ask the question you want answered. For example, consider multiclass image classification: perhaps learning would be easier if only the algorithm had an image of a car.
[Figure] A class-conditional query shows the expert a batch of images and asks "Click on a picture of a car, if there is one." This can be done for each class individually (except perhaps the "other" class).
Class-Conditional Queries
• A concrete example: Conjunctions (without noise).
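One way this plays out for noise-free conjunctions can be sketched as follows. The specific protocol here, "point me to a positive example in this pool, if there is one," is one natural reading of a class-conditional query, and the pool construction and all names are illustrative assumptions.

```python
import random

# A sketch of learning a monotone conjunction with class-conditional
# queries, in the noise-free setting. Each query asks the expert for a
# positive example from the remaining pool; the conjunction is the set
# of variables equal to 1 in every positive example seen so far.

N = 10
TARGET = {1, 4, 7}   # target conjunction: x1 AND x4 AND x7

def label(x):
    return all(x[i] for i in TARGET)

def positive_query(pool):
    """Class-conditional query: the expert returns some positive
    example from the pool, or None if the pool has no positives."""
    for x in pool:
        if label(x):
            return x
    return None

rng = random.Random(1)
pool = [tuple(rng.randint(0, 1) for _ in range(N)) for _ in range(2000)]

relevant = set(range(N))
remaining = list(pool)
queries = 0
while True:
    x = positive_query(remaining)
    queries += 1
    if x is None:
        break
    # keep only variables set to 1 in every positive seen so far
    relevant &= {i for i in range(N) if x[i]}
    # drop examples the current hypothesis already labels positive
    remaining = [z for z in remaining
                 if not all(z[i] for i in relevant)]

print("learned conjunction:", sorted(relevant), "in", queries, "queries")
```

Each returned positive removes at least one spurious variable, so the number of queries is bounded by the number of variables rather than by the number of examples.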
Outline
• Active learning with label requests
• Disagreement Coefficient (Hanneke, ICML 2007)
• Teaching Dimension (Hanneke, COLT 2007)
• Class-conditional queries
• Arbitrary sample-based queries
Arbitrary Example-based Queries
• Suppose we let the algorithm ask any question it wants about the data labels.
Cost Complexity
Open Problems for Label Queries
• The value of having more unlabeled data? (Especially for agnostic learning.)
• "Optimal" agnostic active learning algorithm?
Open Problems
• Unknown cost functions: e.g., maybe examples near the separator are more expensive to label.
• Other types of queries: e.g., "give me a rule/explanation you used to decide the label of this example."
Definition of GIC
• Say the teacher gets drunk and doesn't necessarily answer accurately, but she manages to scribble her answers to every question on a piece of paper.
• We have a spy who steals the paper and photocopies it.
• The spy tells us exactly which questions to ask so that, at minimum cost, there is at most one concept in C consistent with the answers.
• Define GIC(C, c) as the worst-case cost of this game.