Agnostically learning halfspaces (FOCS 2005)
Agnostic learning [Kearns, Schapire & Sellie]
Set X, class F of functions f: X → {0,1}.
Arbitrary distribution D over (x, y) ∈ X × {0,1}; opt = P[f*(x) ≠ y], where f* = argmin_{f∈F} P[f(x) ≠ y].
Efficient agnostic learner: from poly(1/ε) samples, outputs (w.h.p.) h: X → {0,1} with P[h(x) ≠ y] ≤ opt + ε (toy sketch below).

Agnostic learning (dimension n)
Set X_n ⊆ R^n, class F_n of functions f: X_n → {0,1}; f* = argmin_{f∈F_n} P[f(x) ≠ y] over an arbitrary distribution on (x, y) ∈ X_n × {0,1}.
Efficient agnostic learner: from poly(n, 1/ε) samples, outputs (w.h.p.) h: X_n → {0,1} with P[h(x) ≠ y] ≤ opt + ε.
(In the PAC model, P[f*(x) ≠ y] = 0.)
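Not from the paper, just to make the definition concrete: for a finite class F, empirical risk minimization over poly(1/ε) samples already meets the agnostic guarantee. A minimal Python sketch (the class F and the data source are illustrative placeholders):

```python
import math
import random

def agnostic_erm(F, draw, eps, delta):
    """Generic agnostic learner for a finite class F (functions X -> {0,1}).
    With m = O((1/eps^2) * log(|F|/delta)) samples, the returned hypothesis h
    satisfies P[h(x) != y] <= opt + eps w.p. >= 1 - delta for an arbitrary
    joint distribution over (x, y), where opt = min_{f in F} P[f(x) != y]."""
    m = math.ceil((2.0 / eps**2) * math.log(2 * len(F) / delta))
    sample = [draw() for _ in range(m)]
    return min(F, key=lambda f: sum(f(x) != y for x, y in sample))

# Toy usage: F = the three single-variable functions f_i(x) = x_i on {0,1}^3.
F = [lambda x, i=i: x[i] for i in range(3)]
def draw():                                   # arbitrary noisy source (illustrative)
    x = tuple(random.randint(0, 1) for _ in range(3))
    y = x[0] if random.random() < 0.9 else 1 - x[0]
    return x, y

h = agnostic_erm(F, draw, eps=0.1, delta=0.05)
```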
Agnostic learning of halfspaces
F_n = { f(x) = I(w·x ≥ θ) | w ∈ R^n, θ ∈ R }.
Goal: h: R^n → {0,1} with P[h(x) ≠ y] ≤ opt + ε, where opt = min_{f∈F_n} P[f(x) ≠ y] = P[f*(x) ≠ y]. Equivalently, f* is the "truth" and the labels carry adversarial noise.
• Special case, disjunctions: e.g. f(x) = x1 ∨ x3 = I(x1 + x3 ≥ 1).
• Efficiently agnostically learning disjunctions ⇒ PAC-learning DNF.
• Proper agnostic learning is NP-hard.
• PAC learning halfspaces (no noise) is solved by linear programming (sketch below).
• PAC learning halfspaces with independent/random noise is also solved by earlier work.
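The linear-programming bullet above is the classical noise-free case: a consistent halfspace can be recovered by LP feasibility. A minimal numpy/scipy sketch (my own encoding; it assumes the sample really is separable with some gap on the negative side, so the unit-gap rescaling is feasible):

```python
import numpy as np
from scipy.optimize import linprog

def pac_halfspace_lp(X, y):
    """Given labeled examples consistent with some halfspace I(w.x >= theta),
    recover such a (w, theta) by LP feasibility.  The strict inequality on the
    negative examples is encoded as a gap of 1, which is without loss of
    generality for strictly separable data (rescale w and theta)."""
    m, n = X.shape
    A_ub, b_ub = [], []                 # constraints A_ub @ [w, theta] <= b_ub
    for xi, yi in zip(X, y):
        if yi == 1:                     # w.xi - theta >= 0  <=>  -w.xi + theta <= 0
            A_ub.append(np.append(-xi, 1.0)); b_ub.append(0.0)
        else:                           # w.xi - theta <= -1
            A_ub.append(np.append(xi, -1.0)); b_ub.append(-1.0)
    res = linprog(c=np.zeros(n + 1), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * (n + 1))
    w, theta = res.x[:n], res.x[n]
    return lambda x: int(np.dot(w, x) >= theta)
```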
Theorem 1: Our algorithm (w.h.p.) outputs h: R^n → {0,1} with P[h(x) ≠ y] ≤ opt + ε, in time n^{O(ε^-4)}, i.e. poly(n) for every constant ε > 0, as long as D draws x ∈ R^n from:
• a log-concave distribution, e.g. uniform over a convex set, exponential e^{-|x|}, normal
• uniform over {-1,1}^n or S^{n-1} = { x ∈ R^n : |x| = 1 }
• …
Algorithms (both run in time n^{O(d)})

1. L1 polynomial regression algorithm (sketch below)
• Given: d > 0 and (x1, y1), …, (xm, ym) ∈ R^n × {0,1}.
• Find a multivariate polynomial p of degree ≤ d minimizing Σ_i |p(x_i) − y_i|  (≈ minimize_{deg(p)≤d} E[|p(x) − y|]).
• Pick θ ∈ [0,1] at random; output h(x) = I(p(x) ≥ θ).
• Lemma: the algorithm's error ≤ opt + min_{deg(q)≤d} E[|f*(x) − q(x)|].

2. Low-degree Fourier algorithm of [Linial, Mansour & Nisan] (requires x uniform over {-1,1}^n)
• Choose p(x) = Σ_{|S|≤d} ĉ_S · Π_{i∈S} x_i, where ĉ_S = E[y · Π_{i∈S} x_i]  (≈ minimize_{deg(p)≤d} E[(p(x) − y)²]).
• Output h(x) = I(p(x) ≥ ½).
• Lemma: the algorithm's error ≤ 8·(opt + min_{deg(q)≤d} E[(f*(x) − q(x))²]).
• Lemma of [Kearns, Schapire & Sellie]: the algorithm's error ≤ ½ − (½ − opt)² + ε.
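A minimal Python sketch of algorithm 1 (my own encoding, not the paper's code): expand into all monomials of degree ≤ d, solve the L1 fit as a linear program with slack variables, then threshold at a random θ. With N = O(n^d) monomials the LP has the advertised n^{O(d)} size.

```python
import random
from itertools import combinations_with_replacement
import numpy as np
from scipy.optimize import linprog

def monomial_features(X, d):
    """All monomials of total degree <= d (with repetition, e.g. x1^2 * x3)."""
    n = X.shape[1]
    idx = [c for k in range(d + 1) for c in combinations_with_replacement(range(n), k)]
    Phi = np.array([[np.prod(row[list(c)]) for c in idx] for row in X])
    return Phi, idx

def l1_poly_regression(X, y, d):
    """Find a deg-<=d polynomial p minimizing sum_i |p(x_i) - y_i| (an LP),
    pick theta ~ Uniform[0,1], and output h(x) = I(p(x) >= theta)."""
    Phi, idx = monomial_features(np.asarray(X, float), d)
    m, N = Phi.shape
    # LP variables: polynomial coefficients c (N of them) and slacks t (m of them);
    # minimize sum(t) subject to -t_i <= (Phi c - y)_i <= t_i.
    cost = np.concatenate([np.zeros(N), np.ones(m)])
    A_ub = np.block([[ Phi, -np.eye(m)],
                     [-Phi, -np.eye(m)]])
    b_ub = np.concatenate([np.asarray(y, float), -np.asarray(y, float)])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * N + [(0, None)] * m)
    coeffs, theta = res.x[:N], random.random()
    def h(x):
        phi = np.array([np.prod(np.asarray(x, float)[list(c)]) for c in idx])
        return int(phi @ coeffs >= theta)
    return h
```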
Useful properties of log-concave distributions: any one-dimensional projection is log-concave, …
⇒ the approximation degree for halfspaces is dimension-free: a univariate q(x) ≈ I(x ≥ 0) of degree d yields q(w·x) ≈ I(w·x ≥ 0), still of degree d (the figures showed degree d = 10).
"Hey, I've used Hermite (pronounced air-meet) polynomials many times."
Approximating I(x ≥ θ) (one dimension)
• Bound min_{deg(q)≤d} E[(q(x) − I(x ≥ θ))²].
• Continuous distributions: orthogonal polynomials under ⟨f, g⟩ = E[f(x)g(x)]
• Normal: Hermite polynomials (numerical sketch below)
• Log-concave (the density e^{−|x|}/2 suffices): new polynomials
• Uniform on the sphere: Gegenbauer polynomials
• Uniform on the hypercube: Fourier basis
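For the normal case the expansion is easy to compute numerically. A small numpy/scipy sketch of the best degree-d L2 approximation of the step I(x ≥ θ) under x ~ N(0,1), using probabilists' Hermite polynomials; d = 10 mirrors the degree quoted on the previous slide:

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as H      # probabilists' Hermite He_k
from scipy.integrate import quad
from scipy.stats import norm

def hermite_step_approx(theta, d):
    """Degree-d truncation of the Hermite expansion of I(x >= theta):
    q = sum_k c_k He_k,  with  c_k = E[I(x >= theta) He_k(x)] / k!
    (E[He_k(x)^2] = k! under the standard normal).  This truncation is the
    minimizer of E[(q(x) - I(x >= theta))^2] over polynomials of deg <= d."""
    coeffs = []
    for k in range(d + 1):
        basis_k = [0.0] * k + [1.0]              # coefficient vector of He_k
        val, _ = quad(lambda x: H.hermeval(x, basis_k) * norm.pdf(x), theta, np.inf)
        coeffs.append(val / math.factorial(k))
    return lambda x: H.hermeval(x, coeffs)

q = hermite_step_approx(theta=0.0, d=10)
xs = np.random.randn(100_000)
l2_error = np.mean((q(xs) - (xs >= 0.0)) ** 2)   # small for moderate d
```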
Theorem 2: conjunctions/disjunctions (e.g., x1 ∧ x11 ∧ x17)
• For an arbitrary distribution D over {0,1}^n × {0,1}, the polynomial regression algorithm with d = O(n^{1/2} log(1/ε)) (time n^{O(d)}) outputs h with P[h(x) ≠ y] ≤ opt + ε.
• Follows from the previous lemmas plus known low-degree approximation bounds for conjunctions.
How far can we get in poly(n, 1/ε) time? Assume D draws x uniformly from S^{n-1} = { x ∈ R^n : |x| = 1 }.
• Perceptron algorithm: error ≤ O(√n)·opt + ε.
• We show: a simple averaging algorithm from prior work achieves error ≤ O(log(1/opt))·opt + ε (sketch below).
• Assuming (x, y) is drawn as (1 − η)·(x, f*(x)) + η·(arbitrary (x, y)): we get error ≤ O(n^{1/4} log(n/ε))·η + ε, using Rankin's second bound.
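The "simple averaging algorithm" is not spelled out above; the usual averaging learner for origin-centered halfspaces takes the mean of the examples, signed by their labels, as the weight vector. A hedged sketch of that reading (my interpretation, not code from the paper):

```python
import numpy as np

def averaging_halfspace(X, y):
    """Averaging learner for halfspaces through the origin: set
    w_hat = mean_i (2*y_i - 1) * x_i and predict h(x) = I(w_hat . x >= 0).
    Under x uniform on the sphere S^{n-1}, w_hat concentrates around a
    positive multiple of the target normal vector, which is the setting
    the error bounds on this slide refer to."""
    signs = 2.0 * np.asarray(y, float) - 1.0     # {0,1} labels -> {-1,+1}
    w_hat = (signs[:, None] * np.asarray(X, float)).mean(axis=0)
    return lambda x: int(np.dot(w_hat, x) >= 0.0)
```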
Half-space conclusions & future work
• L1 polynomial regression: a natural extension of Fourier learning
• Works for non-uniform/arbitrary distributions
• Tolerates agnostic noise
• Works on both continuous and discrete problems
• Future work
• Work on all distributions (not just log-concave / uniform on {-1,1}^n)
• opt + ε with a poly(n, 1/ε) algorithm (we have poly(n) for fixed ε, and trivially poly(1/ε) for fixed n)
• Other interesting classes of functions