Reductions to the Noisy Parity Problem
Vitaly Feldman (Harvard), Parikshit Gopalan (UW), Subhash Khot (Georgia Tech), Ashok K. Ponnuswami (Georgia Tech)
aka New Results on Learning Parities, Halfspaces, Monomials, Mahjongg etc.
Uniform Distribution Learning
Examples: (x, f(x)), where x ← {0,1}^n uniformly and f: {0,1}^n → {+1,-1}.
Goal: Learn the function f in poly(n) time.
Uniform Distribution Learning
Examples: (x, f(x))
• Goal: Learn the function f in poly(n) time.
• Information-theoretically impossible without assumptions on f.
• We will assume f has nice structure, such as:
  • Parity: f(x) = (-1)^{a·x}
  • Halfspace: f(x) = sgn(w·x)
  • k-junta: f depends on only k coordinates x_{i_1}, …, x_{i_k}
  • Decision Tree
  • DNF
Uniform Distribution Learning
Examples: (x, f(x))
• Goal: Learn the function f in poly(n) time.
  • Parity: n^{O(1)} (Gaussian elimination; sketch below)
  • Halfspace: n^{O(1)} (LP)
  • k-junta: n^{0.7k} [MOS]
  • Decision Tree: n^{log n} (Fourier)
  • DNF: n^{log n} (Fourier)
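A minimal sketch of the parity row in the table above, assuming the target is f(x) = (-1)^{a·x} for a hidden a ∈ {0,1}^n and that enough examples are available for the x's to span {0,1}^n: each labelled example gives one linear equation a·x = y over GF(2), and Gaussian elimination solves the system. The function and variable names below are illustrative, not from the paper or the slides.

import numpy as np

def learn_parity(examples):
    """Recover the hidden vector a from noise-free examples (x, (-1)^(a.x))
    by Gaussian elimination over GF(2).  Assumes the x's span {0,1}^n;
    n + O(1) uniformly random examples suffice with high probability."""
    X = np.array([x for x, _ in examples], dtype=np.uint8)
    # Label +1 means a.x = 0 (mod 2); label -1 means a.x = 1 (mod 2).
    y = np.array([0 if label == 1 else 1 for _, label in examples], dtype=np.uint8)
    m, n = X.shape
    A = np.concatenate([X, y[:, None]], axis=1)     # augmented matrix over GF(2)
    row = 0
    for col in range(n):
        pivot = next((r for r in range(row, m) if A[r, col]), None)
        if pivot is None:
            continue                                # free coordinate: defaults to 0
        A[[row, pivot]] = A[[pivot, row]]           # swap the pivot row into place
        for r in range(m):
            if r != row and A[r, col]:
                A[r] ^= A[row]                      # row addition = XOR over GF(2)
        row += 1
    a = np.zeros(n, dtype=np.uint8)
    for r in range(row):
        lead = int(np.argmax(A[r, :n]))             # leading 1 of this pivot row
        a[lead] = A[r, n]
    return a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20
    a = rng.integers(0, 2, n)
    xs = rng.integers(0, 2, (2 * n, n))
    examples = [(x, int((-1) ** (x @ a % 2))) for x in xs]
    print(np.array_equal(learn_parity(examples), a))   # expect True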
Uniform Distribution Learning with Random Noise
Examples: (x, (-1)^e · f(x)), where x ← {0,1}^n uniformly, f: {0,1}^n → {+1,-1}, and the noise bit e = 1 w.p. η, e = 0 w.p. 1 − η.
Goal: Learn the function f in poly(n) time.
Uniform Distribution Learning with Random Noise
Examples: (x, (-1)^e · f(x))
• Goal: Learn the function f in poly(n) time.
  • Parity: Noisy Parity problem
  • Halfspace: n^{O(1)} [BFKV]
  • k-junta: n^k (Fourier)
  • Decision Tree: n^{log n} (Fourier)
  • DNF: n^{log n} (Fourier)
The Noisy Parity Problem
Examples: (x, (-1)^e · f(x))
• Coding theory view: decoding a random linear code from random noise.
• Best known algorithm: 2^{n/log n}, Blum-Kalai-Wasserman [BKW].
• Believed to be hard.
• Variant: noisy parity of size k, i.e. the parity depends on at most k variables. Brute force runs in time O(n^k) (sketch below).
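A sketch of the O(n^k) brute force for noisy parities of size k mentioned above: enumerate every parity on at most k variables and keep the one whose empirical correlation with the noisy labels is largest in absolute value; for the true parity that correlation concentrates around 1 − 2η, so with enough samples it wins. The interface below is an illustrative assumption, not the algorithm from the paper.

import itertools
import numpy as np

def brute_force_noisy_parity(xs, labels, k):
    """Exhaustive search over all parities on at most k of the n variables.
    xs: m-by-n 0/1 array of examples; labels: length-m array of +-1 values.
    Returns the index set whose parity best correlates with the labels."""
    xs = np.asarray(xs, dtype=np.uint8)
    labels = np.asarray(labels, dtype=float)
    n = xs.shape[1]
    best_set, best_corr = None, -1.0
    for size in range(k + 1):
        for subset in itertools.combinations(range(n), size):
            bits = xs[:, list(subset)].sum(axis=1) % 2   # parity of the chosen bits
            chi = 1 - 2 * bits.astype(int)               # map {0,1} -> {+1,-1}
            corr = float(np.mean(chi * labels))          # empirical correlation
            if abs(corr) > best_corr:
                best_set, best_corr = subset, abs(corr)
    return best_set, best_corr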
Agnostic Learning under the Uniform Distribution
Examples: (x, g(x))
• g(x) is a {-1,+1}-valued random variable with Pr_x[g(x) ≠ f(x)] ≤ η.
• Goal: Get an approximation to g that is as good as f.
Agnostic Learning under the Uniform Distribution
Examples: (x, g(x))
• Goal: Get an approximation to g that is as good as f.
• If the function f is a:
  • Parity: 2^{n/log n} [FGKP]
  • Halfspace: n^{O(1)} [KKMS]
  • k-junta: n^k [KKMS]
  • Decision Tree: n^{log n} [KKMS]
  • DNF: n^{log n} [KKMS]
Agnostic Learning of Parities
Examples: (x, g(x))
• Given g which has a large Fourier coefficient, find it.
• Coding theory view: decoding a random linear code with adversarial noise.
• If membership queries were allowed:
  • Hadamard list decoding [GL, KM].
  • Basis of algorithms for Decision Trees [KM] and DNF [Jackson].
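In Fourier terms, each parity χ_S(x) = (-1)^{Σ_{i∈S} x_i} has coefficient ĝ(S) = E_x[g(x)·χ_S(x)], and agnostically learning parities amounts to finding a set S with |ĝ(S)| large. For any single fixed S the coefficient is easy to estimate from random examples; a minimal sketch with illustrative names:

import numpy as np

def estimate_fourier_coefficient(samples, S):
    """Estimate g_hat(S) = E_x[ g(x) * chi_S(x) ] from random examples.
    samples: iterable of (x, g(x)) pairs with x a 0/1 vector, g(x) in {-1,+1}.
    A Chernoff bound gives accuracy eps with O(log(1/delta) / eps^2) samples."""
    S = list(S)
    total, count = 0.0, 0
    for x, gx in samples:
        chi = 1 - 2 * (int(np.sum(np.asarray(x)[S])) % 2)   # chi_S(x) in {-1,+1}
        total += gx * chi
        count += 1
    return total / count

The hard part is that there are 2^n candidate sets S. With membership queries, Hadamard list decoding [GL, KM] finds all the large coefficients; the reduction in this talk shows that a noisy parity algorithm can play that role when only random examples are available.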
Reductions between problems and models
[Diagram: three learning models and their example oracles — Noise-free: (x, f(x)); Random noise: (x, (-1)^e · f(x)); Agnostic: (x, g(x)).]
Reductions to Noisy Parity
• Theorem [FGKP]: Learning Juntas, Decision Trees and DNFs reduces to learning noisy parities of size k.
Uniform Distribution Learning
Examples: (x, f(x))
• Goal: Learn the function f in poly(n) time.
  • Parity: n^{O(1)} (Gaussian elimination)
  • Halfspace: n^{O(1)} (LP)
  • k-junta: n^{0.7k} [MOS]
  • Decision Tree: n^{log n} (Fourier)
  • DNF: n^{log n} (Fourier)
Reductions to Noisy Parity
• Theorem [FGKP]: Learning Juntas, Decision Trees and DNFs reduces to learning noisy parities of size k.
• Evidence in favor of noisy parity being hard?
• The reduction holds even with random classification noise.
Uniform Distribution Learning with Random Noise
Examples: (x, (-1)^e · f(x))
• Goal: Learn the function f in poly(n) time.
  • Parity: Noisy Parity problem
  • Halfspace: n^{O(1)} [BFKV]
  • k-junta: n^k (Fourier)
  • Decision Tree: n^{log n} (Fourier)
  • DNF: n^{log n} (Fourier)
Reductions to Noisy Parity
• Theorem [FGKP]: Agnostically learning parities with error rate η reduces to learning noisy parities with noise rate η.
• Combined with BKW, this gives a 2^{n/log n} agnostic learning algorithm for parities.
• Main Idea: A noisy parity algorithm can help find large Fourier coefficients from random examples.
Reductions between problems and models
[Diagram: the noise-free (x, f(x)), random-noise (x, (-1)^e · f(x)), and agnostic (x, g(x)) oracles, all viewed as instances of a Probabilistic Oracle.]
Probabilistic Oracles
Given h: {0,1}^n → [-1,1], the oracle for h returns pairs (x, b) with x ← {0,1}^n uniformly, b ∈ {-1,+1}, and E[b | x] = h(x).
Simulating Noise-free Oracles
Let f: {0,1}^n → {-1,1} and take h = f. Then E[b | x] = f(x) ∈ {-1,1}, hence b = f(x) always.
Simulating Random Noise
Given f: {0,1}^n → {-1,1} and noise rate η = 0.1, let h(x) = 0.8·f(x). Then E[b | x] = 0.8·f(x), hence b = f(x) w.p. 0.9 and b = -f(x) w.p. 0.1.
Simulating Adversarial Noise
Given g(x), a {-1,1}-valued random variable with Pr_x[g(x) ≠ f(x)] = η, let h(x) = E[g(x)] (expectation over the randomness of g at each x). The oracle for g is equivalent to the probabilistic oracle for h, and the bound on the error rate implies E_x[|h(x) − f(x)|] ≤ 2η.
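The three simulations above are instances of one sampling rule: draw x uniformly and output b = +1 with probability (1 + h(x))/2, so that E[b | x] = h(x). A minimal sketch, with an illustrative function name:

import random

def probabilistic_oracle(h, n):
    """Draw one example (x, b): x uniform over {0,1}^n and b in {-1,+1}
    with E[b | x] = h(x), for any h mapping 0/1 vectors into [-1, 1]."""
    x = [random.randint(0, 1) for _ in range(n)]
    b = 1 if random.random() < (1 + h(x)) / 2 else -1
    return x, b

# Noise-free oracle for f:        take h = f, so b = f(x) always.
# Random noise at rate eta = 0.1: take h(x) = 0.8 * f(x), so b = f(x) w.p. 0.9
#                                 and b = -f(x) w.p. 0.1.
# Adversarial noise:              take h(x) = E[g(x)]; the error-rate bound
#                                 gives E_x[|h(x) - f(x)|] <= 2 * eta.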
Reductions between problems and models
[Diagram: the noise-free (x, f(x)), random-noise (x, (-1)^e · f(x)), and agnostic (x, g(x)) oracles, all viewed as instances of a Probabilistic Oracle.]