1 / 19

Active Learning

Active Learning. Lecture 26th. Maria Florina Balcan. Active Learning. Data Source. Expert / Oracle. Unlabeled examples. Learning Algorithm. Request for the Label of an Example. A Label for that Example. Request for the Label of an Example. A Label for that Example.

Download Presentation

Active Learning

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Active Learning Lecture 26th Maria Florina Balcan Maria-Florina Balcan

  2. Active Learning Data Source Expert / Oracle Unlabeled examples Learning Algorithm Request for the Label of an Example A Label for that Example Request for the Label of an Example A Label for that Example . . . Algorithm outputs a classifier • The learner can choose specific examples to be labeled. • He works harder, to use fewer labeled examples.

  3. What Makes a Good Algorithm? • Guaranteed to output a relatively good classifier for most learning problems. • Doesn’t make too many label requests. • Choose the label requests carefully, to get informative labels. Maria-Florina Balcan

  4. Can It Really Do Better Than Passive? • YES! (sometimes) • We often need far fewer labels for active learning than for passive. • This is predicted by theory and has been observed in practice. Maria-Florina Balcan

  5. - + w Can adaptive querying help? [CAL92, Dasgupta04] hw(x) = 1(x ¸ w),C = {hw: w 2 R} • Threshold fns on the real line: Active Algorithm • Sample with 1/unlabeledexamples; do binary search. + - - • Binary search – need just O(log 1/) labels. Passive supervised: (1/) labels to find an -accurate threshold. Active: only O(log 1/) labels. Exponential improvement. Other interesting results as well.

  6. Active Learning might not help [Dasgupta04] In general,number of queries needed depends on C and also on D. h3 C = {linear separators in R1}: active learning reduces sample complexitysubstantially. h2 C = {linear separators in R2}: there are some target hyp. for which no improvement can be achieved! - no matter how benign the input distr. h1 h0 In this case: learning to accuracy  requires 1/ labels… Maria-Florina Balcan

  7. Examples where Active Learning helps In general,number of queries needed depends on C and also on D. • C = {linear separators in R1}: active learning reduces sample complexitysubstantially no matter what is the input distribution. • C - homogeneous linear separators in Rd, D - uniform distribution over unit sphere: • need only d log 1/ labels to find a hypothesis with error rate < . • Dasgupta, Kalai, Monteleoni, COLT 2005 • Freund et al., ’97. • Balcan-Broder-Zhang, COLT 07 Maria-Florina Balcan

  8. Region of uncertainty [CAL92] • Current version space: part of C consistent with labels so far. • “Region of uncertainty” = part of data space about which there is still some uncertainty (i.e. disagreement within version space) • Example: data lies on circle in R2 and hypotheses are homogeneouslinear separators. current version space + + region of uncertainty in data space Maria-Florina Balcan

  9. current version space region of uncertainy Region of uncertainty [CAL92] Algorithm: Pick a few points at random from the current region of uncertainty and query their labels. Maria-Florina Balcan

  10. current version space + + region of uncertainty in data space Region of uncertainty [CAL92] • Current version space: part of C consistent with labels so far. • “Region of uncertainty” = part of data space about which there is still some uncertainty (i.e. disagreement within version space) Maria-Florina Balcan

  11. Region of uncertainty [CAL92] • Current version space: part of C consistent with labels so far. • “Region of uncertainty” = part of data space about which there is still some uncertainty (i.e. disagreement within version space) new version space + + New region of uncertainty in data space Maria-Florina Balcan

  12. Region of uncertainty [CAL92], Guarantees Algorithm: Pick a few points at random from the current region of uncertainty and query their labels. [Balcan, Beygelzimer, Langford, ICML’06] Analyze a version of this alg. which is robust to noise. • C- linear separators on the line, low noise, exponential • improvement. • C - homogeneous linear separators in Rd, D -uniform distribution over unit sphere. • low noise, need only d2 log 1/ labels to find a hypothesis with error rate < . • realizable case, d3/2 log 1/ labels. • supervised -- d/ labels. Maria-Florina Balcan

  13. wk+1 wk w* γk Margin Based Active-Learning Algorithm [Balcan-Broder-Zhang, COLT 07] Use O(d) examples to find w1 of error 1/8. • iteratek=2, … , log(1/) • rejection sample mk samples x from D • satisfying |wk-1T¢ x| ·k ; • label them; • find wk2 B(wk-1, 1/2k )consistent with all these examples. • end iterate Maria-Florina Balcan

  14. u (u,v) v v Margin Based Active-Learning, Realizable Case Theorem PX is uniform over Sd. If and then after iterations ws has error ·. Fact 1 Fact 2  Maria-Florina Balcan

  15. u (u,v) v v u v  Margin Based Active-Learning, Realizable Case Theorem PX is uniform over Sd. If and then after iterations ws has error ·. Fact 1 Fact 3 If and Maria-Florina Balcan

  16. wk+1 wk w* γk BBZ’07, Proof Idea • iteratek=2, … , log(1/) • Rejection sample mk samples x from D • satisfying |wk-1T¢ x| ·k ; • ask for labels and find wk2 B(wk-1, 1/2k ) • consistent with all these examples. • end iterate Assume wkhas error·. We are done if 9k s.t. wk+1 has error ·/2 and only need O(d log( 1/)) labels in round k. Maria-Florina Balcan

  17. wk+1 wk w* γk BBZ’07, Proof Idea • iteratek=2, … , log(1/) • Rejection sample mk samples x from D • satisfying |wk-1T¢ x| ·k ; • ask for labels and find wk2 B(wk-1, 1/2k ) • consistent with all these examples. • end iterate Assume wkhas error·. We are done if 9k s.t. wk+1 has error ·/2 and only need O(d log( 1/)) labels in round k. Maria-Florina Balcan

  18. wk+1 wk w* γk BBZ’07, Proof Idea • iteratek=2, … , log(1/) • Rejection sample mk samples x from D • satisfying |wk-1T¢ x| ·k ; • ask for labels and find wk2 B(wk-1, 1/2k ) • consistent with all these examples. • end iterate Assume wkhas error·. We are done if 9k s.t. wk+1 has error ·/2 and only need O(d log( 1/)) labels in round k. Key Point Under the uniform distr. assumption for we have · /4 Maria-Florina Balcan

  19. wk+1 wk w* γk BBZ’07, Proof Idea Key Point Under the uniform distr. assumption for we have · /4 Key Point So, it’s enough to ensure that We can do so by only using O(d log( 1/)) labels in round k. Maria-Florina Balcan

More Related