
Presentation Transcript


  1. On Agnostic Boosting and Parity Learning. Adam Tauman Kalai, Georgia Tech; Yishay Mansour, Google and Tel-Aviv University; Elad Verbin, Tsinghua

  2. Defs • Agnostic Learning = learning with adversarial noise • Boosting = turning a weak learner into a strong learner • Parities = parities of subsets of the bits: f:{0,1}^n → {0,1}, e.g. f(x) = x1 ⊕ x3 ⊕ x7. Outline: • Agnostic boosting: turning a weak agnostic learner into a strong agnostic learner • A 2^O(n/log n)-time algorithm for agnostically learning parities over any distribution
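To make the parity definition above concrete, here is a minimal Python sketch (not from the slides); the index set {1, 3, 7} is just the example given in the definition.

```python
# Parity of the subset {1, 3, 7} of the bits, i.e. f(x) = x1 XOR x3 XOR x7.
def parity(x, S=(1, 3, 7)):
    """x: list of 0/1 bits; returns the XOR of x over the (1-indexed) positions in S."""
    return sum(x[i - 1] for i in S) % 2

print(parity([1, 0, 1, 1, 0, 0, 1, 0]))  # x1=1, x3=1, x7=1  ->  1
```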

  3. Agnostic boosting: Weak learner (for any noise rate < ½, produces a better-than-trivial hypothesis) → Agnostic Booster (runs the weak learner as a black box) → Strong learner (produces an almost-optimal hypothesis)

  4. Learning with Noise It’s, like, a really hard model!!! * up to well-studied open problems (i.e. we know where we’re stuck)

  5. Agnostic Learning: some known results

  6. Agnostic Learning: some known results Due to hardness, or lack of tools??? Agnostic boosting: strong tool, makes it easier to design algorithms.

  7. Why care about agnostic learning? • More relevant in practice • Impossibility results might be useful for building cryptosystems

  8. Noisy learning: f:{0,1}^n → {0,1} from class F; the algorithm gets samples <x, f(x)> where x is drawn from distribution D. • No noise: the learning algorithm should approximate f up to error ε. • Random noise: an η-fraction of the labels are flipped at random; the learning algorithm should still approximate f up to error ε. • Adversarial (≈ agnostic) noise: the adversary is allowed to corrupt an η-fraction of f, yielding g; the learning algorithm should approximate g up to error η + ε.
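For illustration only, a small Python sketch of the three sample oracles on this slide; the target parity, the noise rate eta, and the corrupted function g are made-up placeholders rather than anything from the paper.

```python
import random

def parity(x, S=(1, 3, 7)):
    return sum(x[i - 1] for i in S) % 2

def sample_no_noise(n=10):
    # No noise: the label is exactly f(x).
    x = [random.randint(0, 1) for _ in range(n)]
    return x, parity(x)

def sample_random_noise(eta=0.1, n=10):
    # Random noise: each label is flipped independently with probability eta;
    # the goal is still to approximate f up to error epsilon.
    x, y = sample_no_noise(n)
    return x, y ^ (random.random() < eta)

def sample_agnostic(g, n=10):
    # Adversarial (agnostic) noise: samples are labeled by an arbitrary g that may
    # disagree with every parity on up to an eta-fraction of inputs; the goal is
    # to approximate g up to error eta + epsilon.
    x = [random.randint(0, 1) for _ in range(n)]
    return x, g(x)
```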

  9. Agnostic learning (geometric view) [diagram: the class F as a region; g lies outside it; f is the closest element of F to g, at distance opt; the blue ball around g has radius opt + ε]. PROPER LEARNING. Parameters: F, a metric. Input: oracle for g. Goal: return some element of the blue ball.

  10. Agnostic boosting definition. Weak learner: given samples from g, with x drawn from D, w.h.p. outputs h with err_D(g, h) ≤ ½ − ε^100, provided opt ≤ ½ − ε.

  11. Agnostic boosting. Agnostic Booster: given samples from g, w.h.p. outputs h' with err_D(g, h') ≤ opt + ε. Inside, the weak learner: w.h.p. outputs h with err_D(g, h) ≤ ½ − ε^100 whenever opt ≤ ½ − ε. The booster runs the weak learner poly(1/ε^100) times.

  12. Agnostic boosting. Agnostic Booster: given samples from g, w.h.p. outputs h' with err_D(g, h') ≤ opt + α + ε. Inside, an (α, γ)-weak learner: w.h.p. outputs h with err_D(g, h) ≤ ½ − γ whenever opt ≤ ½ − α. The booster runs the weak learner poly(1/γ, 1/ε) times.
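Purely as an illustration of the contract on slides 10-12, a Python-style interface sketch; the class name, method names, and parameter names below are assumptions made for exposition, not an API from the paper.

```python
class WeakAgnosticLearner:
    """(alpha, gamma)-weak agnostic learner: on samples <x, g(x)> with x ~ D,
    whenever opt <= 1/2 - alpha it returns (w.h.p.) a hypothesis h with
    err_D(g, h) <= 1/2 - gamma."""
    def learn(self, examples):
        raise NotImplementedError

def agnostic_boost(weak_learner, sample_oracle, alpha, gamma, eps):
    """Contract of the booster (body intentionally omitted): it calls the weak
    learner poly(1/gamma, 1/eps) times as a black box on modified distributions,
    and returns (w.h.p.) h' with err_D(g, h') <= opt + alpha + eps."""
    raise NotImplementedError
```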

  13. Agnostic boosting: Weak learner (for any noise rate < ½, produces a better-than-trivial hypothesis) → Agnostic Booster → Strong learner (produces an almost-optimal hypothesis)

  14. “Approximation Booster” Analogy: a poly-time MAX-3-SAT algorithm that, when opt = 7/8 + ε, produces a solution with value 7/8 + ε^100 → [booster] → an algorithm for MAX-3-SAT that produces a solution with value opt − ε, with running time poly(n, 1/ε).

  15. Gap [diagram: the interval from 0 to 1 with ½ marked]. No hardness gap close to ½ → (booster) → no gap anywhere (additive PTAS).

  16. Agnostic boosting • New Analysis for Mansour-McAllester booster. • uses branching programs; nodes are weak hypotheses • Previous Agnostic Boosting: • Ben-David+Long+Mansour, and Gavinsky, defined agnostic boosting differently. • Their result cannot be used for our application

  17. Booster [diagram: the branching program starts as a single node h1; the edges h1(x)=0 and h1(x)=1 lead to the two leaves, labeled 1 and 0].

  18. Booster: Split step [diagram: each edge of h1 induces a different distribution; on each induced distribution a new weak hypothesis is trained (h2 on one side, h2' on the other), splitting the leaf into new 0/1 leaves; at each leaf we choose the “better” option].

  19. Booster: Split step [diagram: the branching program after the splits, with nodes h1, h2, h3 and leaves labeled 0 and 1].

  20. Booster: Split step [diagram: further splits; the program now contains nodes h1, h2, h3, h4, …].

  21. Booster: Merge step [diagram: the same program; two nodes whose behavior is “similar” are marked to be merged].

  22. Booster: Merge step [diagram: the program after the merge, with nodes h1, h2, h3, h4].

  23. Booster: Another split step [diagram: a new weak hypothesis h5 is added by splitting another leaf, …].

  24. Booster: final result [diagram: the final branching program; every internal node is a weak hypothesis and the leaves are labeled 0 and 1].
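A schematic Python sketch of the split/merge construction in slides 17-24; the leaf representation, the “similar” test (comparing label fractions), and the fixed number of rounds are illustrative guesses, not the exact booster analyzed in the paper.

```python
def frac_ones(leaf):
    """Fraction of 1-labels among the (x, y) samples reaching a leaf."""
    return sum(y for (_, y) in leaf) / max(1, len(leaf))

def build_branching_program(weak_learner, samples, rounds=10, merge_tol=0.05):
    """Schematic booster: internal nodes of the branching program are weak hypotheses.
    Each round splits every leaf with a weak hypothesis trained on that leaf's induced
    distribution, then merges leaves whose label statistics are "similar"."""
    leaves = [samples]          # each leaf = the list of (x, y) samples reaching it
    levels = []                 # one list of weak hypotheses per round (schematic)
    for _ in range(rounds):
        # Split step: run the weak learner on each leaf's induced distribution.
        level, new_leaves = [], []
        for leaf in leaves:
            h = weak_learner(leaf)
            level.append(h)
            new_leaves.append([(x, y) for (x, y) in leaf if h(x) == 0])
            new_leaves.append([(x, y) for (x, y) in leaf if h(x) == 1])
        # Merge step: collapse leaves with similar 1-label fractions, keeping the
        # width of the branching program (and hence the sample needs) small.
        new_leaves.sort(key=frac_ones)
        merged = []
        for leaf in new_leaves:
            if merged and abs(frac_ones(merged[-1]) - frac_ones(leaf)) <= merge_tol:
                merged[-1] = merged[-1] + leaf
            else:
                merged.append(leaf)
        leaves = merged
        levels.append(level)
    # Final result: each leaf outputs the majority label of the samples that reach it.
    leaf_labels = [1 if frac_ones(leaf) >= 0.5 else 0 for leaf in leaves]
    return levels, leaf_labels
```

Note that this sketch only tracks per-leaf statistics; in the real construction the merged program must also route fresh points through the nodes, and the slides' "choose the better option" test decides whether a split is kept at all.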

  25. Agnostically learning parities

  26. Application: Parity with Noise. Theorem: for every ε, we have a weak learner* that, for noise rate ½ − ε, produces a hypothesis which is wrong on at most a ½ − (2ε)^(n^0.001)/2 fraction of the space; running time 2^O(n/log n). (* non-proper learner: the hypothesis is a circuit with 2^O(n/log n) gates.) Feldman et al. give a black-box reduction to the random-noise case; we give a direct result.

  27. Corollary: Learners for many classes (without noise) • Can learn, without noise, any class with a “guaranteed correlated parity”, in time 2^O(n/log n) • e.g. DNF; any others? • A weak parity learner that runs in 2^O(n^0.32) time would beat the best algorithm known for learning DNF • Good evidence that parity with noise is hard → efficient cryptosystems [Hopper-Blum, Blum-Furst et al., and many others]?

  28. Idea of the weak agnostic parity learner. Main idea: 1. Take a learner which resists random noise (BKW). 2. Add randomness to its behavior, until you get a weak agnostic learner. “Between two evils, I pick the one I haven’t tried before” – Mae West. “Between two evils, I pick uniformly at random” – CS folklore

  29. Summary Problem: It is difficult but perhaps possible to design agnostic learning algorithms. Proposed Solution: Agnostic Boosting. Contributions: • Right(er) definition for weak agnostic learner • Agnostic boosting • Learning Parity with noise in hardest noise model • Entertaining STOC ’08 participants

  30. Open Problems • Find other applications for Agnostic Boosting • Improve PwN algorithms • Get a proper learner for parity with noise • Reduce PwN with agnostic noise to PwN with random noise • Get evidence that PwN is hard • Prove that if parity with noise is easy then FACTORING is easy. $128 reward!

  31. May the parity be with you! The end.

  32. Sketch of weak parity learner

  33. Weak parity learner • Sample labeled points from the distribution; sample an unlabeled x; let’s guess f(x) • Bucket according to the last 2n/log n bits [diagram: within each bucket, pairs of samples are added (“+”) and the sums go to the next round]

  34. Weak parity learner. LAST ROUND: • √n vectors with sum = 0 give a guess for f(x) [diagram: three groups of vectors, each summing to 0]

  35. Weak parity learner. LAST ROUND: • √n vectors with sum = 0 give a guess for f(x) • By symmetry, the probability of a mistake = %mistakes • Claim: %mistakes ≤ ½ − (2ε)^(n^0.001)/2 (by Cauchy-Schwarz) [diagram: three groups of vectors, each summing to 0]
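A toy Python sketch, assuming standard BKW-style bucketing, of the rounds described on slides 33-35; the block size and the driver loop are simplified placeholders, and the final √n-vectors-summing-to-zero step is only described in a comment rather than implemented.

```python
from collections import defaultdict

def bkw_round(samples, lo, hi):
    """One bucketing round: group samples by coordinates lo..hi-1, then XOR pairs
    inside each bucket so those coordinates become zero in the combined vectors
    (the labels are XORed along with them)."""
    buckets = defaultdict(list)
    for x, y in samples:                      # x is a list of n bits, y its noisy label
        buckets[tuple(x[lo:hi])].append((x, y))
    out = []
    for bucket in buckets.values():
        for (x1, y1), (x2, y2) in zip(bucket[::2], bucket[1::2]):
            out.append(([a ^ b for a, b in zip(x1, x2)], y1 ^ y2))
    return out

def bucket_all_rounds(samples, n, block):
    """Zero out trailing coordinates block by block; with block about 2n/log n this
    takes about (log n)/2 rounds, as on slide 33. The last round (not implemented
    here) looks for ~sqrt(n) processed vectors whose sum is 0 and XORs their labels
    to obtain the guess for f(x), as on slides 34-35."""
    hi = n
    while hi > block:
        samples = bkw_round(samples, hi - block, hi)
        hi -= block
    return samples
```

Each round halves the number of usable samples and roughly squares the bias, which is why the slide's bucket size 2n/log n keeps the total cost at 2^O(n/log n).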

  36. Intuition behind two main parts

  37. Intuition behind Boosting

  38. Intuition behind Boosting [diagram: reweight the examples; decrease the weight of points the current hypothesis classifies correctly, increase the weight of its mistakes]

  39. Intuition behind Boosting • Run, reweight, run, reweight, … . Take the majority of the hypotheses. • Algorithmic & efficient Yao-von Neumann minimax principle [diagram: weights on the examples are decreased or increased after each round; leaves labeled 0 and 1]
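A bare-bones Python sketch of the “run, reweight, take majority” loop on this slide, using a generic exponential (AdaBoost-style) reweighting rule; this is a standard illustration of the intuition, not the specific reweighting scheme analyzed in the paper.

```python
import math

def boost_by_reweighting(weak_learner, samples, rounds=20):
    """Run-reweight loop: after each round, increase the weight of examples the new
    hypothesis got wrong and decrease the weight of the rest; the final prediction
    is a weighted majority vote over all the weak hypotheses."""
    m = len(samples)
    weights = [1.0 / m] * m
    hypotheses = []
    for _ in range(rounds):
        h = weak_learner(samples, weights)          # weak learner sees the reweighted data
        err = sum(w for (x, y), w in zip(samples, weights) if h(x) != y)
        err = min(max(err, 1e-9), 1 - 1e-9)         # keep the log below well defined
        alpha = 0.5 * math.log((1 - err) / err)     # vote strength of this hypothesis
        hypotheses.append((alpha, h))
        # Reweight: mistakes go up, correctly classified examples go down, then renormalize.
        weights = [w * math.exp(alpha if h(x) != y else -alpha)
                   for (x, y), w in zip(samples, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]

    def predict(x):
        vote = sum(a * (1 if h(x) == 1 else -1) for a, h in hypotheses)
        return 1 if vote >= 0 else 0

    return predict
```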
