
Active Perspectives on Computational Learning and Testing


Presentation Transcript


  1. Active Perspectives on Computational Learning and Testing Liu Yang Slide 1

  2. Interaction • An interactive protocol: have the algorithm interact with an oracle/experts • Care about - Query complexity - Computational efficiency • Question: how much better can a learner do with interaction, versus getting its data in one shot? (a minimal oracle interface is sketched below) Slide 2
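To make the protocol concrete, here is a minimal sketch (not from the talk) of a labeling oracle with query counting; the class names are illustrative, and `request_label` in the later sketches stands for a call to such an oracle's `query` method.

```python
# Illustrative sketch only: an oracle answering label queries for an unknown
# target h*, wrapped so that query complexity (# of interactions) is recorded
# separately from the learner's own computation.

class LabelOracle:
    """Answers label requests for an unknown target function h*."""
    def __init__(self, h_star):
        self._h_star = h_star

    def query(self, x):
        return self._h_star(x)

class CountingOracle(LabelOracle):
    """Same oracle, but records the number of label requests made."""
    def __init__(self, h_star):
        super().__init__(h_star)
        self.num_queries = 0

    def query(self, x):
        self.num_queries += 1
        return super().query(x)
```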

  3. Notation • Instance space X = {0, 1}^n • Concept space C: a collection of functions h: X -> {-1, 1} • Distribution D over X • Unknown target function h*: the true labeling function (realizable case: h* in C) • err(h) = Pr_{x~D}[h(x) ≠ h*(x)] Slide 3

  4. “Active” Means Label Requests • Label request: have a pool of unlabeled examples; pick any x and receive h*(x); repeat. • An algorithm achieves query complexity S(ε, δ, h*) for (C, D) if it outputs h_n after ≤ n label requests and, for every h* in C, ε > 0, δ > 0, and n ≥ S(ε, δ, h*), P[err(h_n) ≤ ε] ≥ 1 − δ. • Motivation: labeled data is expensive to get. • Using label requests, we can do - Active learning: find h with small err(h) - Active testing: decide whether h* is in C or far from C (a toy pool-based example is sketched below) Slide 4
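As a toy illustration of how label requests can beat one-shot data (my own example, not one from the slides): for 1-D threshold functions, binary search over a sorted unlabeled pool finds the decision boundary with O(log m) label requests, where m is the pool size, whereas passive learning needs on the order of 1/ε random labels. Here `request_label` stands for the oracle's `query` method above.

```python
import numpy as np

# Illustrative pool-based label-request example for 1-D thresholds:
# binary search over the sorted pool locates the +/- boundary with
# O(log m) label requests instead of labeling the whole pool.

def active_learn_threshold(pool, request_label):
    """pool: 1-D unlabeled points; request_label(x) returns h*(x) in {-1, +1}."""
    xs = np.sort(np.asarray(pool))
    lo, hi = 0, len(xs) - 1
    if request_label(xs[hi]) == -1:      # largest point negative: everything negative
        return lambda x: -1
    if request_label(xs[lo]) == +1:      # smallest point positive: everything positive
        return lambda x: +1
    while hi - lo > 1:                   # invariant: xs[lo] labeled -, xs[hi] labeled +
        mid = (lo + hi) // 2
        if request_label(xs[mid]) == +1:
            hi = mid
        else:
            lo = mid
    threshold = (xs[lo] + xs[hi]) / 2
    return lambda x: +1 if x >= threshold else -1
```

For a pool of m points this makes at most 2 + log2(m) label requests; the quality of the returned classifier of course depends on how densely the pool covers D.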

  5. This Talk: Thesis Outline • Bayesian Active Learning Using Arbitrary Binary Valued Queries (Published) • Active Property Testing (Major results submitted) • Self-Verifying Bayesian Active Learning (Published) • A Theory of Transfer Learning (Accepted) • Learning with General Types of Query (In progress) • Active Learning with a Drifting Distribution (Submitted) Slide 5

  6. Outline • Active Property Testing (Submitted) • Self-Verifying Bayesian Active Learning (Published) • Transfer Learning (Accepted) • Learning with General Types of Query (In progress) Slide 6

  7. Property Testing • What is property testing for? - Quickly tell whether we have the right function class - Estimate the complexity of a function without actually learning it • Question: can you do this with fewer queries than learning? • Yes! e.g. unions of d intervals ( ----++++----+++++++++-----++---+++-------- ): testing helps! - Testing tells how big d needs to be for the class to be close to the target - # labels: active testing needs O(1), passive testing needs Θ(√d), active learning needs Θ(d) Slide 7

  8. Active Property Testing • Passive testing: no control over which examples get labeled • Membership queries: unrealistic to query the function at arbitrary points • An active query asks for labels only on examples that actually occur in the environment (an unlabeled pool drawn from D) • Question: does active testing still get a significant benefit in label requests over passive testing? Slide 8

  9. Property Tester • Accepts w.p. ≥ 2/3 if h* in C • Rejects w.p. ≥ 2/3 if d(h*, C) = min_{g in C} Pr_{x~D}[h*(x) ≠ g(x)] ≥ ε Slide 9

  10. Testing Unions of d Intervals ( ----++++----+++++++++-----++---+++-------- ) • Theorem: testing unions of ≤ d intervals in the active testing model uses only O(1) queries, independent of d. • Noise sensitivity := Pr[two nearby points are labeled differently] • Proof idea: - all unions of d intervals have low noise sensitivity - all functions that are far from this class have noticeably larger noise sensitivity - we introduce a tester that estimates the noise sensitivity of the input function (a rough sketch follows) Slide 10
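The following is a rough sketch of that proof idea, not the paper's actual tester: estimate the noise sensitivity of the labeling over [0, 1] by requesting labels at pairs of nearby points, and reject when the estimate is too large. The perturbation size `delta`, sample size `m`, and threshold `tau` below are placeholders (the real constants come from the analysis), and the real active tester must also work from a pool of unlabeled draws rather than arbitrarily perturbed points.

```python
import numpy as np

# Sketch of a noise-sensitivity tester for unions of <= d intervals on [0, 1]
# under the uniform distribution. Unions of few intervals rarely change label
# across a tiny perturbation; functions far from that class do so noticeably
# more often.

def estimate_noise_sensitivity(request_label, delta, m, rng=None):
    """Estimate Pr[label(x) != label(y)] for y a small perturbation of x."""
    rng = rng or np.random.default_rng()
    x = rng.uniform(0.0, 1.0, size=m)
    y = np.clip(x + rng.uniform(-delta, delta, size=m), 0.0, 1.0)
    disagreements = [request_label(a) != request_label(b) for a, b in zip(x, y)]
    return float(np.mean(disagreements))

def test_union_of_intervals(request_label, d, eps, m=1000):
    delta = eps / (32 * d)        # illustrative choice of noise scale
    tau = 0.1                     # illustrative acceptance threshold
    ns = estimate_noise_sensitivity(request_label, delta, m)
    return "accept" if ns <= tau else "reject"
```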

  11. Summary of Results ★★ Active testing (a constant number of queries) is much better than passive testing and active learning for unions of intervals, the cluster assumption, and the margin assumption ★ Testing is easier than learning for linear threshold functions (LTFs) ✪ For dictator functions (single-variable functions), active testing gives no help Slide 11

  12. Outline • Active Property Testing (Submitted) • Self-Verifying Bayesian Active Learning (Published) • Transfer Learning (Accepted) • Learning with General Types of Query (In progress) Slide 12

  13. Self-Verifying Bayesian Active Learning • Self-verifying (a special type of stopping criterion): - given ε, the algorithm adaptively decides its number of queries, then halts - it has the property that E[err] < ε when it halts • Question: can you do this with E[# queries] = o(1/ε)? (passive learning needs Θ(1/ε) labels) Slide 13

  14. Example: Intervals • Suppose D is uniform on [0, 1] [Figure: the interval [0, 1] with − labels outside the target interval and + labels inside] Slide 14

  15. 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 } } } } } } } } } } } } } } } } } } } } } } } } } } } } } } } } Example: Intervals Verification Lower Bound alg somehow arrives at h, er(h) < ε; how to verify hε close to h* ? h* ε close to h every h’ ε outsidegreen ball is not h* - Everything on red circle is outside green ball; so have to verify those are not h* - Suppose h* is empty interval, then er(h) < ε => p(h=+) < ε. C h h* ε 2ε • In particular, intervals of width 2 are on red circle . • Need one label in each interval to verify it is not h* • So need (1/) labels to verify the target isn’t one of these. 0 1 h* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Suppose h* is empty interval, D is uniform on [0,1] Slide 15 Slide 16

  16. Learning with a Prior • Suppose we know a distribution from which the target is sampled; call it the prior Slide 16

  17. Interval Example with a Prior ( - - - - - |+++++++| - - - - - - ) • Algorithm: query random points until the first + is found, then binary-search to find the end-points. Halt upon reaching a pre-specified, prior-based query budget N. Output the posterior's Bayes classifier (a sketch of this strategy follows). • Let the budget N be high enough that E[err] < ε: - N = o(1/ε) suffices for E[err | w* > 0] < ε: each interval of width w > 0 can be learned with o(1/ε) queries; by the dominated convergence theorem, averaging these o(1/ε) sample complexities over the prior still gives o(1/ε), so a budget N = o(1/ε) makes E[err | w* > 0] < ε. - N = o(1/ε) suffices for E[err | w* = 0] < ε: if P(w* = 0) > 0, then after L = O(log(1/ε)) queries, with probability > 1 − ε most of the posterior mass is on the empty interval, so the posterior's Bayes classifier has error rate 0. Slide 17
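A minimal sketch of this strategy, assuming D is uniform on [0, 1] and the target is a single interval [a, b] (possibly empty). The budget, the tolerance, and the choice to return the estimated interval itself (rather than the full posterior's Bayes classifier) are simplifications of mine.

```python
import numpy as np

# Sketch: hunt for a positive point at random, then binary-search both
# end-points, all under a fixed query budget N. If no positive point is
# found within the budget, fall back to the all-negative classifier.

def learn_interval(request_label, budget, rng=None, tol=1e-6):
    rng = rng or np.random.default_rng()
    queries = 0
    x_plus = None
    while queries < budget:                       # phase 1: look for a + point
        x = rng.uniform(0.0, 1.0)
        queries += 1
        if request_label(x) == +1:
            x_plus = x
            break
    if x_plus is None:                            # budget spent: guess "empty interval"
        return lambda x: -1

    def search_endpoint(lo, hi, positive_on_left):
        """Binary search for the +/- boundary inside [lo, hi]."""
        nonlocal queries
        while hi - lo > tol and queries < budget:
            mid = (lo + hi) / 2
            queries += 1
            is_pos = request_label(mid) == +1
            if is_pos == positive_on_left:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    left = search_endpoint(0.0, x_plus, positive_on_left=False)   # boundary a
    right = search_endpoint(x_plus, 1.0, positive_on_left=True)   # boundary b
    return lambda x: +1 if left <= x <= right else -1
```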

  18. Can Do o(1/ε) for Any VC Class • Theorem: with the prior, E[# queries] = o(1/ε) is achievable. • There are methods that find a good classifier in o(1/ε) queries (though they aren't self-verifying) [see TrueSampleComplexityAL08] • We need to set a stopping criterion for those algorithms • The stopping criterion we use: let the algorithm run until it has made a certain number of queries (set the budget just large enough that E[err] < ε) Slide 18

  19. Outline • Active Property Testing (Submitted) • Self-Verifying Bayesian Active Learning (Published) • Transfer Learning (Accepted) • Learning with General Types of Query (In progress) Slide 19

  20. Model of Transfer Learning • Motivation: learners are often not too altruistic [Figure: two-layer sampling model] - Layer 1: the targets h1*, h2*, …, hT* of tasks 1, …, T are drawn i.i.d. from an (unknown) prior - Layer 2: for each task t, the data (xt1, yt1), …, (xtk, ytk) are drawn i.i.d. and labeled by ht* - As more tasks are seen, the learner obtains a better estimate of the prior Slide 20

  21. Insights • Using a good estimate of the prior is almost as good as using the true prior - We only need a VC-dimension number of additional points from each task to get a good estimate of the prior - We've seen that the self-verifying algorithm, when given the true prior, has guarantees on the error and the number of queries - As the number of tasks → ∞, calling the self-verifying algorithm with the estimated prior outputs a classifier as good as if it had the true prior Slide 21

  22. Main Result • We design an algorithm using this insight - It uses at most a VC-dimension number of additional labeled points per task (vs. learning with the known true prior) - Estimate the joint distribution on (x1, y1), …, (xd, yd) - Invert it to get an estimate of the prior • Running this algorithm is asymptotically just as good as having direct knowledge of the prior (Bayesian) - [HKS] showed that passive learning saves constant factors in the Θ(1/ε) per-task sample complexity (replacing VC-dimension with a prior-dependent complexity measure) - We showed that access to the prior can improve the active sample complexity, sometimes from Θ(1/ε) to o(1/ε) Slide 22

  23. Estimating the Prior • Insight: identifiability of priors from the d-dimensional joint distribution - The distribution of the full sequence (x1, y1), (x2, y2), … uniquely identifies the prior - The set of joint distributions on (x1, y1), …, (xk, yk) for 1 ≤ k < ∞ identifies the distribution of the full sequence (x1, y1), (x2, y2), … - For any k > d = VC-dimension, the distribution on (x1, y1), …, (xk, yk) can be expressed in terms of the distribution of (x1, y1), …, (xd, yd) • How to do it when d = 1? e.g. thresholds - For two points x1 < x2: Pr(+, −) = 0, Pr(+, +) = Pr(x1 = +), Pr(−, −) = Pr(x2 = −), Pr(−, +) = Pr(x2 = +) − Pr(x1 = +) - For any k > 1 points, we can reduce directly from k to 1: the probability of the full pattern P(−−−−−(−+)+++++) equals the probability that the sign change falls between those two adjacent points, P((−, +)), which in turn equals Pr(second point = +) − Pr(first point = +) (a numerical check of the two-point identities follows) Slide 23
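A small numerical sanity check of the two-point threshold identities (my own illustration; the Beta prior over thresholds and the specific points are arbitrary choices):

```python
import numpy as np

# With thresholds h_t(x) = +1 iff x >= t drawn from a prior, and x1 < x2,
# the pair (h(x1), h(x2)) can never be (+, -), and the remaining pair
# probabilities are determined by the two marginals Pr[h(x1)=+], Pr[h(x2)=+].

rng = np.random.default_rng(0)
thresholds = rng.beta(2, 5, size=200_000)        # illustrative prior over t
x1, x2 = 0.3, 0.7                                # any fixed pair with x1 < x2

y1 = np.where(x1 >= thresholds, 1, -1)           # labels of x1 under each sampled target
y2 = np.where(x2 >= thresholds, 1, -1)           # labels of x2

p1_plus = np.mean(y1 == 1)                       # Pr[h(x1) = +]
p2_plus = np.mean(y2 == 1)                       # Pr[h(x2) = +]

print("Pr(+,-) =", np.mean((y1 == 1) & (y2 == -1)), "(should be 0)")
print("Pr(+,+) =", np.mean((y1 == 1) & (y2 == 1)), "vs Pr[h(x1)=+] =", p1_plus)
print("Pr(-,-) =", np.mean((y1 == -1) & (y2 == -1)), "vs Pr[h(x2)=-] =", 1 - p2_plus)
print("Pr(-,+) =", np.mean((y1 == -1) & (y2 == 1)),
      "vs Pr[h(x2)=+] - Pr[h(x1)=+] =", p2_plus - p1_plus)
```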

  24. Outline • Active Property Testing (Submitted) • Self-Verifying Bayesian Active Learning (Published) • Transfer Learning (Accepted) • Learning with General Types of Query (In progress) - Learning DNF formulas - Learning Voronoi Slide 24

  25. Learning with General Queries • Construct problem-specific queries that allow efficient learning of problems for which no efficient PAC-learning algorithm is known. Slide 25

  26. Learning DNF Formulas • n is the number of variables; a poly-sized DNF has n^O(1) terms, e.g. f = (x1∧x2) ∨ (x1∧x4) • A natural form of knowledge representation [Valiant 1984]; a great challenge for over 20 years • PAC-learning DNF formulas appears to be very hard: the fastest known algorithm [KS01] runs in time exp(n^{1/3} log^2 n) • If the algorithm is forced to output a hypothesis that is itself a DNF, the problem is NP-hard (a concrete evaluation of the example formula follows) Slide 26
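To make the running example concrete (the representation below is just an illustration, not an algorithm from the talk), here is one way to encode and evaluate f = (x1∧x2) ∨ (x1∧x4):

```python
# A DNF formula is an OR of terms, each term an AND of literals. Each term is
# given as a list of (0-based variable index, required value) pairs, so x1 is
# index 0, x2 is index 1, and so on.

def eval_dnf(terms, assignment):
    """True iff some term has all of its literals satisfied by the assignment."""
    return any(all(assignment[i] == val for i, val in term) for term in terms)

f = [[(0, True), (1, True)],      # x1 AND x2
     [(0, True), (3, True)]]      # x1 AND x4

print(eval_dnf(f, [True, True, False, False]))   # True  (first term satisfied)
print(eval_dnf(f, [True, False, False, False]))  # False (no term satisfied)
```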

  27. Thanks ! Slide 31
