Active Learning
Lecture 26
Maria Florina Balcan
Active Learning

Data Source → unlabeled examples → Learning Algorithm. The algorithm sends the expert/oracle a request for the label of an example and receives a label for that example, over and over; at the end, the algorithm outputs a classifier.
• The learner can choose which specific examples get labeled.
• It works harder, in order to use fewer labeled examples.
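A minimal sketch of this query loop (the choose_query, fit_classifier, and oracle names are illustrative placeholders, not part of the lecture):

```python
# Sketch of the pool-based active learning protocol from the slide:
# the learner repeatedly picks an example, requests its label from the
# oracle, and finally outputs a classifier.

def active_learning_loop(unlabeled_pool, oracle, choose_query, fit_classifier, budget):
    labeled = []                          # (example, label) pairs collected so far
    pool = list(unlabeled_pool)           # examples we may still query
    for _ in range(budget):               # each pass is one label request
        if not pool:
            break
        x = choose_query(pool, labeled)   # learner picks the example to label
        pool.remove(x)
        y = oracle(x)                     # "A Label for that Example"
        labeled.append((x, y))
    return fit_classifier(labeled)        # algorithm outputs a classifier
```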
What Makes a Good Algorithm?
• Guaranteed to output a relatively good classifier for most learning problems.
• Doesn't make too many label requests.
• Chooses its label requests carefully, to get informative labels.
Can It Really Do Better Than Passive?
• YES! (sometimes)
• We often need far fewer labels for active learning than for passive learning.
• This is predicted by theory and has been observed in practice.
Can adaptive querying help? [CAL92, Dasgupta04]

• Threshold functions on the real line: h_w(x) = 1(x ≥ w), C = {h_w : w ∈ R}.

Active Algorithm: sample O(1/ε) unlabeled examples; do binary search on them.
• Binary search needs just O(log 1/ε) labels.

Passive supervised: Ω(1/ε) labels to find an ε-accurate threshold.
Active: only O(log 1/ε) labels. Exponential improvement.
Other interesting results as well.
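A sketch of the binary-search learner for this threshold class (the oracle callable stands in for the expert; names are illustrative):

```python
import random

def learn_threshold(pool, oracle):
    """Binary-search active learner for C = {h_w : h_w(x) = 1(x >= w)}.
    Issues O(log |pool|) label requests instead of labeling all
    |pool| = O(1/eps) draws, as a passive learner would."""
    xs = sorted(pool)
    lo, hi = -1, len(xs)              # invariant: xs[i] has label 0 for i <= lo,
                                      # label 1 for i >= hi
    while hi - lo > 1:
        mid = (lo + hi) // 2          # each query halves the uncertain stretch
        if oracle(xs[mid]) == 1:      # x >= w: boundary is at or left of mid
            hi = mid
        else:                         # x < w: boundary is right of mid
            lo = mid
    # Any threshold in (xs[lo], xs[hi]] is consistent with every queried label.
    return xs[hi] if hi < len(xs) else xs[-1] + 1.0

# Example: ~1000 unlabeled draws, true threshold 0.3 -> about 10 label requests.
pool = [random.random() for _ in range(1000)]
w_hat = learn_threshold(pool, oracle=lambda x: int(x >= 0.3))
```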
Active Learning might not help [Dasgupta04]

In general, the number of queries needed depends on C and also on D.

• C = {linear separators in R^1}: active learning reduces sample complexity substantially.
• C = {linear separators in R^2}: there are some target hypotheses for which no improvement can be achieved, no matter how benign the input distribution. In this case, learning to accuracy ε requires Ω(1/ε) labels.

[Figure: hypotheses h_0, h_1, h_2, h_3 on the circle]
Examples where Active Learning helps

In general, the number of queries needed depends on C and also on D.

• C = {linear separators in R^1}: active learning reduces sample complexity substantially, no matter what the input distribution is.
• C = homogeneous linear separators in R^d, D = uniform distribution over the unit sphere: need only O(d log 1/ε) labels to find a hypothesis with error rate < ε.
  • Freund et al., '97
  • Dasgupta, Kalai, Monteleoni, COLT 2005
  • Balcan-Broder-Zhang, COLT 2007
Region of uncertainty [CAL92]

• Current version space: the part of C consistent with the labels so far.
• "Region of uncertainty" = the part of the data space about which there is still some uncertainty (i.e., disagreement within the version space).
• Example: data lies on a circle in R^2 and hypotheses are homogeneous linear separators.

[Figure: current version space; region of uncertainty in data space]
Region of uncertainty [CAL92]

Algorithm: pick a few points at random from the current region of uncertainty and query their labels.

[Figure: current version space; region of uncertainty]
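A sketch of this query rule for the 1-D threshold class from earlier, where the version space is an interval (lo, hi] for w and the region of uncertainty is exactly the set of unlabeled points inside it (names are illustrative):

```python
import random

def cal_thresholds(pool, oracle, num_rounds=50, queries_per_round=2):
    """CAL-style sampling for thresholds h_w(x) = 1(x >= w)."""
    lo, hi = float("-inf"), float("inf")           # version space for w: (lo, hi]
    for _ in range(num_rounds):
        uncertain = [x for x in pool if lo < x < hi]   # region of uncertainty
        if not uncertain:
            break                                  # all remaining labels are implied
        k = min(queries_per_round, len(uncertain))
        for x in random.sample(uncertain, k):      # query a few random points there
            if oracle(x) == 1:                     # label 1: x >= w, so w <= x
                hi = min(hi, x)
            else:                                  # label 0: x < w, so w > x
                lo = max(lo, x)
    lo = lo if lo > float("-inf") else min(pool)
    hi = hi if hi < float("inf") else max(pool)
    return (lo + hi) / 2                           # any w in (lo, hi] is consistent
```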
Region of uncertainty [CAL92]

As labels arrive, the version space shrinks to the hypotheses still consistent with them, which in turn yields a new, smaller region of uncertainty in the data space.

[Figure: new version space; new region of uncertainty in data space]
Region of uncertainty [CAL92], Guarantees

Algorithm: pick a few points at random from the current region of uncertainty and query their labels.

[Balcan, Beygelzimer, Langford, ICML'06] analyze a version of this algorithm which is robust to noise:
• C = linear separators on the line, low noise: exponential improvement.
• C = homogeneous linear separators in R^d, D = uniform distribution over the unit sphere:
  • low noise: need only O(d^2 log 1/ε) labels to find a hypothesis with error rate < ε;
  • realizable case: O(d^{3/2} log 1/ε) labels;
  • passive supervised: O(d/ε) labels.
Margin Based Active-Learning Algorithm [Balcan-Broder-Zhang, COLT 07]

Use O(d) examples to find w_1 of error ≤ 1/8.
iterate k = 2, …, log(1/ε):
  • rejection-sample m_k samples x from D satisfying |w_{k-1} · x| ≤ γ_k;
  • label them;
  • find w_k ∈ B(w_{k-1}, 1/2^k) consistent with all these examples.
end iterate

[Figure: w_k, w_{k+1}, w*, band of width γ_k]
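A schematic rendering of this iteration (the schedules gamma_k and m_k and the constrained consistency step find_consistent are abstracted as caller-supplied callables; a sketch under those assumptions, not the paper's implementation):

```python
import numpy as np

def margin_based_al(w1, sample_from_D, oracle, num_rounds, m_k, gamma_k, find_consistent):
    """Margin-based active learning in the style of Balcan-Broder-Zhang '07.
    w1 is assumed to already have error <= 1/8 (found from O(d) labeled draws)."""
    w = np.asarray(w1, dtype=float)
    w /= np.linalg.norm(w)
    for k in range(2, num_rounds + 1):
        band = []
        while len(band) < m_k(k):               # rejection-sample from D, ...
            x = sample_from_D()
            if abs(w @ x) <= gamma_k(k):        # ... keeping only x in the band
                band.append((x, oracle(x)))     # label requests happen only here
        # Find w_k in the ball B(w_{k-1}, 1/2^k) consistent with the labeled
        # band; this constrained step is supplied by the caller.
        w = find_consistent(band, center=w, radius=2.0 ** (-k))
    return w
```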
Margin Based Active-Learning, Realizable Case

Theorem: P_X is uniform over S^d. If the band widths γ_k and sample sizes m_k are set as in the algorithm, then after s = log(1/ε) iterations, w_s has error ≤ ε.

[Facts 1-3: geometric properties of the uniform distribution over S^d, relating the angle between unit vectors u and v to the probability mass of their disagreement region inside and outside a margin band]
BBZ'07, Proof Idea

iterate k = 2, …, log(1/ε):
  • rejection-sample m_k samples x from D satisfying |w_{k-1} · x| ≤ γ_k;
  • ask for labels and find w_k ∈ B(w_{k-1}, 1/2^k) consistent with all these examples.
end iterate

Assume w_k has error ≤ ε. We are done if we can show that w_{k+1} has error ≤ ε/2 while using only O(d log(1/ε)) labels in round k.

[Figure: w_k, w_{k+1}, w*, band of width γ_k]
BBZ'07, Proof Idea (continued)

Key Point 1: under the uniform distribution assumption, for the chosen band width γ_k, the error that w_{k+1} makes outside the band |w_k · x| ≤ γ_k is at most ε/4.

Key Point 2: so it is enough to ensure that the error of w_{k+1} inside the band is also at most ε/4, and we can do so using only O(d log(1/ε)) labels in round k.
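The two key points combine in a short error decomposition over the band of w_k; a sketch in LaTeX, with ε denoting the error of w_k:

```latex
% err(w_{k+1}) splits over the margin band of w_k: the labeled sample
% controls the inside term, the choice of gamma_k controls the outside term.
\begin{align*}
\operatorname{err}(w_{k+1})
 &= \Pr_x\bigl[w_{k+1}\ \text{errs},\ |w_k\cdot x| \le \gamma_k\bigr]
  + \Pr_x\bigl[w_{k+1}\ \text{errs},\ |w_k\cdot x| > \gamma_k\bigr] \\
 &\le \underbrace{\Pr_x\bigl[w_{k+1}\ \text{errs},\ |w_k\cdot x| \le \gamma_k\bigr]}_{
        \le\, \varepsilon/4 \text{ via } O(d\log(1/\varepsilon)) \text{ labels in the band}}
  + \underbrace{\Pr_x\bigl[w_{k+1}\ \text{errs},\ |w_k\cdot x| > \gamma_k\bigr]}_{
        \le\, \varepsilon/4 \text{ by the choice of } \gamma_k}
  \;\le\; \varepsilon/2 .
\end{align*}
```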