An Efficient Membership-Query Algorithm for Learning DNF with Respect to the Uniform Distribution • Jeffrey C. Jackson • Presented by: Eitan Yaakobi, Tamar Aizikowitz
Presentation Outline • Introduction • Algorithms We Use • Estimating Expected Values • Hypothesis Boosting • Finding Weak-Approximating Parity Functions • Learning DNF with Respect to Uniform • Existence of Weak-Approximating Parity Functions for Every f, D • Nonuniform Weak DNF Learning • Strongly Learning DNF
Introduction • DNF is weakly learnable with respect to the uniform distribution, as shown by Kushilevitz and Mansour. • We show that DNF is weakly learnable with respect to a certain class of nonuniform distributions. • We then use a method based on Freund’s boosting algorithm to produce a strong learner with respect to the uniform distribution.
Algorithms We Use • Our learning algorithm makes use of several previous algorithms. • What follows is a short reminder of each.
Estimating Expected Values • The AMEAN Algorithm: • Efficiently estimates the expectation of a random variable. • Based on Hoeffding’s inequality: • Let X_1, …, X_m be independent random variables such that X_i ∈ [a,b] and E[X_i] = μ. Then: Pr[ |(1/m)·Σ_i X_i − μ| ≥ λ ] ≤ 2e^(−2λ²m/(b−a)²)
The AMEAN Algorithm • Input: • random X ∈ [a,b] • b − a • λ, δ > 0 • Output: • μ’ such that Pr[ |E[X] − μ’| ≤ λ ] ≥ 1 − δ • Running time: • O((b−a)²·log(δ⁻¹) / λ²)
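The sample-size calculation is immediate from Hoeffding’s bound. Below is a minimal Python sketch of an AMEAN-style estimator; the function name amean and the sample_x sampling interface are illustrative assumptions, not the paper’s notation.

```python
import math

def amean(sample_x, a, b, lam, delta):
    """Estimate E[X] to within +/-lam with probability >= 1 - delta.

    sample_x: zero-argument function returning one independent draw of X in [a, b].
    """
    # Hoeffding: Pr[|mean - mu| >= lam] <= 2*exp(-2*lam^2*m / (b-a)^2),
    # so m = ceil((b-a)^2 * ln(2/delta) / (2*lam^2)) samples suffice.
    m = math.ceil((b - a) ** 2 * math.log(2.0 / delta) / (2.0 * lam ** 2))
    return sum(sample_x() for _ in range(m)) / m
```

For example, amean(lambda: random.random(), 0, 1, 0.05, 0.01) estimates the mean of a uniform [0,1] draw to within 0.05 with probability 0.99.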
Hypothesis Boosting • Our algorithm is based on boosting weak hypotheses into a final strong hypothesis. • We use a boosting method very similar to Freund’s boosting algorithm. • We refer to Freund’s original algorithm as F1.
The F1 Boosting Algorithm • Input: • positive ε, δ, and γ • a (½ − γ)-approximate PAC learner for the representation class • EX(f,D) for some f in the class and any distribution D • Output: • an ε-approximation for f with respect to D, with probability at least 1 − δ • Running time: • polynomial in n, s, γ⁻¹, ε⁻¹, and log(δ⁻¹)
The Idea Behind F1 (1) • The algorithm generates a series of weak hypotheses h_i. • h_0 is a weak approximator for f with respect to the distribution D. • Each subsequent h_i is a weak approximator for f with respect to a modified distribution D_i.
The Idea Behind F1 (2) • Each distribution D_i focuses weight on those areas where slightly more than half of the hypotheses generated so far were incorrect. • The final hypothesis h is a majority vote over all the h_i’s.
The Idea Behind F1 (3) • If a sufficient number of weak hypotheses is generated, then h will be an ε-approximator for f with respect to the distribution D. • Freund showed that ½·γ⁻²·ln(ε⁻¹) weak hypotheses suffice. A schematic sketch of this loop follows.
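As a concrete illustration, here is a schematic Python sketch of a majority-vote booster in the spirit of F1. The reweighting below is a simplified stand-in that up-weights points where only about half of the hypotheses so far are correct (the idea on the previous slide), not Freund’s exact binomial weighting; all names are illustrative.

```python
import math

def f1_boost_sketch(weak_learner, xs, f, gamma, eps):
    """Majority-vote boosting skeleton: xs is a sample of the domain,
    f the (+/-1 valued) target, and weak_learner(weights) returns h
    with weighted error <= 1/2 - gamma."""
    k = math.ceil(0.5 * gamma ** -2 * math.log(1.0 / eps))  # (1/2)g^-2 ln(1/eps) rounds
    hyps = []
    for _ in range(k):
        # margin(x) = (#correct - #incorrect) votes among hypotheses so far
        margins = [sum(1 if h(x) == f(x) else -1 for h in hyps) for x in xs]
        # concentrate weight where the vote is tied or slightly wrong
        raw = [1.0 if m <= 0 else 2.0 ** -m for m in margins]
        total = sum(raw)
        weights = [w / total for w in raw]
        hyps.append(weak_learner(weights))
    # final hypothesis: majority vote over all weak hypotheses
    return lambda x: 1 if sum(h(x) for h in hyps) >= 0 else -1
```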
Finding Weak-Approximating Parity Functions • In order to use the boosting algorithm, we need to be able to generate weak approximators for our DNF f with respect to the distributions D_i. • Our algorithm is based on the Weak Parity algorithm (WP) of Kushilevitz and Mansour.
The WP Algorithm • Finds the large Fourier coefficients of a Boolean function f on {0,1}ⁿ using a membership oracle for f.
The WP’ Algorithm (1) • Our learning algorithm will need to find the large coefficients of a non-Boolean function. • The basic WP algorithm can be extended to the WP’ algorithm which works for non-Boolean f as well. • WP’ gives us a weak approximator for a non-Boolean f with respect to the uniform distribution.
The WP’ Algorithm (2) • Input: • MEM(f) for f: {0,1}ⁿ → ℝ • θ, δ > 0, n, L(f) • Output: • With probability at least 1 − δ, WP’ outputs a set S such that for all A: |f̂(A)| ≥ θ ⇒ A ∈ S • Running time: • polynomial in n, θ⁻¹, log(δ⁻¹), and L(f)
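To make the interface concrete, here is an exponential-time stand-in for WP’ in Python: it simply enumerates all coefficients, whereas the real WP’ finds the large ones in polynomial time via the Kushilevitz-Mansour recursive search. The function names are mine.

```python
from itertools import product

def fourier_coefficient(mem, A, n):
    """hat{g}(A) = E_x[g(x) * chi_A(x)] over uniform x, with chi_A(x) = (-1)^(A.x)."""
    total = 0.0
    for x in product([0, 1], repeat=n):
        chi = (-1) ** sum(a & b for a, b in zip(A, x))
        total += mem(x) * chi
    return total / 2 ** n

def wp_prime_bruteforce(mem, n, theta):
    """Return every A with |hat{g}(A)| >= theta (brute force; small n only)."""
    return [A for A in product([0, 1], repeat=n)
            if abs(fourier_coefficient(mem, A, n)) >= theta]
```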
Learning DNF with Respect to Uniform • We now show the main result: DNF is learnable with respect to uniform. • We begin by showing that for every DNF f and distribution D there exists a parity function that weakly approximates f with respect to D. • We use this to produce an algorithm for weakly learning DNF with respect to certain nonuniform distributions. • Finally, we show that this weak learner can be boosted into a strong learner with respect to the uniform distribution.
Existence of Weak-Approximating Parity Functions for Every f, D (1) • For every DNF f and every distribution D there exists a parity function that weakly approximates f with respect to D. • The more difficult case is when E_D[f] ≈ 0.
Existence of Weak-Approximating Parity Functions for Every f, D (2) • Let f be a DNF such that E_D[f] ≈ 0. • Let s be the number of terms in f. • Let T(x) be the {−1,+1}-valued function equivalent to the term of f best correlated with f with respect to D.
Existence of Weak-Approximating Parity Functions for Every f, D (3) • Conditioning on the value of f: Pr_D[T(x)=f(x)] = Pr_D[T=f | f=1]·Pr_D[f=1] + Pr_D[T=f | f=−1]·Pr_D[f=−1]
Existence of Weak-Approximating Parity Functions for Every f, D (4) • T is a term of f, so when f(x) = −1 (false) every term, T included, is false: Pr_D[T(x)=f(x) | f(x) = −1] = 1 • There are s terms in f, and T is the best correlated with f: Pr_D[T(x)=f(x) | f(x) = 1] ≥ 1/s • Since E_D[f] ≈ 0: Pr_D[T(x)=f(x)] ≥ ½(1 + 1/s) • Hence E_D[fT] = 2·Pr_D[T(x)=f(x)] − 1 ≥ 1/s
Existence of Weak-Approximating Parity Functions for Every f, D (5) • T can be represented using the Fourier transform. • Define the Fourier expansion T = Σ_A T̂(A)·χ_A; since T is a single term, its Fourier L₁-norm Σ_A |T̂(A)| is at most 3.
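Putting the pieces together yields the existence claim; the chain below is a compact restatement of the argument on these slides (the constant 3 bounds the L₁-norm of a single ±1-valued term):

```latex
\[
\frac{1}{s} \;\le\; \mathbf{E}_D[fT]
  \;=\; \sum_A \hat{T}(A)\,\mathbf{E}_D[f\chi_A]
  \;\le\; \Big(\sum_A |\hat{T}(A)|\Big)\,\max_A \big|\mathbf{E}_D[f\chi_A]\big|
  \;\le\; 3\,\max_A \big|\mathbf{E}_D[f\chi_A]\big|
\]
```

So some parity χ_A (or its negation) satisfies E_D[fχ_A] ≥ 1/(3s) = Ω(1/s).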
Nonuniform Weak DNF Learning (1) • We have shown that for every DNF f and every distribution D there exists a parity function that is a weak approximator for f with respect to D. • How can we find such a parity function? • We want an algorithm that, when given a threshold θ and a distribution D, finds a parity χ_A such that, say: |E_D[f·χ_A]| ≥ θ
Nonuniform Weak DNF Learning (2) • Define g(x) = 2ⁿ·f(x)·D(x); then for every A, ĝ(A) = E_U[g·χ_A] = E_D[f·χ_A]. • We have thus reduced the problem of finding a well-correlated parity to finding a large Fourier coefficient of g. • g is not Boolean, therefore we use WP’. • Invocation: WP’(n, MEM(g), θ, δ, L(g)) where MEM(g)(x) = 2ⁿ·MEM(f)(x)·D(x)
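The membership oracle for g is just a wrapper around MEM(f) and the oracle for D; a one-liner in Python (names assumed):

```python
def mem_g(mem_f, dist, n):
    """Membership oracle for g(x) = 2^n * f(x) * D(x).

    Its Fourier coefficients are exactly the correlations we want:
    hat{g}(A) = E_D[f * chi_A]."""
    return lambda x: (2 ** n) * mem_f(x) * dist(x)
```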
The WDNF Algorithm (1) • We define a new algorithm: Weak DNF (WDNF). • WDNF finds the large Fourier coefficients of g(x) = 2ⁿ·f(x)·D(x), thereby finding a parity that is well correlated with f with respect to the distribution D. • WDNF makes use of the WP’ algorithm for finding the Fourier coefficients of the non-Boolean g.
The WDNF Algorithm (2) • Proof of existence: • Let g(x) = 2ⁿ·f(x)·D(x) • Output with prob. 1 − δ: a parity χ_A (possibly negated) with ĝ(A) = E_D[f·χ_A] = Ω(s⁻¹) • Running time: polynomial in n, s, log(δ⁻¹), and L(2ⁿD)
The WDNF Algorithm (3) • Input: • EX(f,D) • MEM(f) • D • δ > 0 • Output: • With probability at least 1 − δ: a parity function h (possibly negated) s.t. E_D[fh] = Ω(s⁻¹) • Running time: • polynomial in n, s, log(δ⁻¹), and L(2ⁿD)
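Combining the helpers sketched earlier gives a toy end-to-end WDNF; the threshold 1/(3s) comes from the existence argument above, and all names are illustrative (the real algorithm uses WP’, not brute-force enumeration):

```python
def wdnf_sketch(mem_f, dist, n, s):
    """Return a (possibly negated) parity h with E_D[f * h] = Omega(1/s)."""
    g = mem_g(mem_f, dist, n)            # g = 2^n * f * D
    theta = 1.0 / (3 * s)                # correlation guaranteed by the lemma
    candidates = wp_prime_bruteforce(g, n, theta)   # non-empty for s-term DNF
    A = max(candidates, key=lambda B: abs(fourier_coefficient(g, B, n)))
    sign = 1 if fourier_coefficient(g, A, n) > 0 else -1   # negate if needed
    return lambda x: sign * (-1) ** sum(a & b for a, b in zip(A, x))
```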
The WDNF Algorithm (4) • WDNF is polynomial in L(g) = L(2ⁿD). • If D(x) is at most poly(n, s, ε⁻¹, δ⁻¹)/2ⁿ for all x, then WDNF runs polynomially in the normal parameters. • Such a D is referred to as polynomially-near uniform. • WDNF weakly learns DNF with respect to any polynomially-near uniform distribution D.
Strongly Learning DNF • We define the Harmonic Sieve algorithm (HS). • HS is an application of the F1 boosting algorithm to the WDNF weak learner. • The main difference between HS and F1 is the need to supply WDNF with an oracle for the distribution D_i at each stage of boosting.
The HS Algorithm (1) • Input: • EX(f,D) • MEM(f) • D • s • ε, δ > 0 • Output: • With probability 1 − δ: h s.t. h is an ε-approximator of f with respect to D. • Running time: • polynomial in n, s, ε⁻¹, log(δ⁻¹), and L(2ⁿD)
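A skeleton of HS, again schematic: it boosts wdnf_sketch with the simplified F1-style loop, handing the weak learner an approximate oracle for D_i at each stage. The simulate_dist helper is sketched after the “Simulating D_i” slides below, and sample_xs stands for points drawn from EX(f,D); the constants are illustrative, not the paper’s.

```python
import math

def harmonic_sieve_sketch(mem_f, dist, sample_xs, n, s, eps):
    """Boost the WDNF weak learner into an eps-approximator of f."""
    gamma = 1.0 / (6 * s)   # correlation Omega(1/s) means error <= 1/2 - Omega(1/s)
    k = math.ceil(0.5 * gamma ** -2 * math.log(1.0 / eps))   # boosting rounds
    hyps = []
    for _ in range(k):
        # approximate oracle D_i' for the current boosting distribution
        dist_i = simulate_dist(dist, hyps, mem_f, sample_xs)
        hyps.append(wdnf_sketch(mem_f, dist_i, n, s))
    return lambda x: 1 if sum(h(x) for h in hyps) >= 0 else -1
```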
The HS Algorithm (2) • For WDNF to work, and work efficiently, two requirements must be met: • An oracle for the distribution must be provided to the learner. • The distribution must be polynomially-near uniform. • We show how to simulate an approximate oracle D_i’ that can be provided to the weak learner instead of an exact one. • We then show that the distributions D_i are in fact polynomially-near uniform.
Simulating D_i (1) • Define: D_i(x) = α_i(x)·D(x) / E_D[α_i], where α_i(x) is the weight the booster assigns to x based on how many of h_0, …, h_{i−1} are correct on x. • To provide an exact oracle we would need to compute the denominator E_D[α_i], which could potentially take an exponentially long time. • Instead, we will estimate the value of E_D[α_i] using AMEAN.
Simulating D_i (2) • AMEAN gives, with high probability, an estimate Ê of E_D[α_i] that is accurate to within a constant multiplicative factor. • Define the simulated oracle: D_i’(x) = α_i(x)·D(x) / Ê • Then D_i’ = c_i·D_i, where c_i = E_D[α_i]/Ê is a constant close to 1.
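A sketch of the simulation in Python, reusing the margin-based α from the F1 sketch (a stand-in for Freund’s exact weighting) and a plain sample average in place of AMEAN; sample_xs again stands for points drawn from EX(f,D):

```python
def simulate_dist(dist, hyps, mem_f, sample_xs):
    """Approximate oracle D_i'(x) = alpha_i(x) * D(x) / (estimate of E_D[alpha_i])."""
    def alpha(x):
        # booster's weight: concentrate where the vote so far is tied or wrong
        m = sum(1 if h(x) == mem_f(x) else -1 for h in hyps)
        return 1.0 if m <= 0 else 2.0 ** -m
    # AMEAN-style estimate of the normalizer E_D[alpha_i] from points x ~ D
    est = sum(alpha(x) for x in sample_xs) / len(sample_xs)
    return lambda x: alpha(x) * dist(x) / est   # = c_i * D_i(x)
```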
Implications of Using D_i’ • Note that: g_i’ = 2ⁿ·f·D_i’ = 2ⁿ·f·c_i·D_i = c_i·g_i • Multiplying the distribution oracle by a constant is like multiplying all the coefficients of g_i by the same constant. • The relative sizes of the coefficients stay the same. • WDNF will still be able to find the large coefficients. • The running time is not adversely affected.
Bound on Distributions D_i • It can be shown that for each i: L(2ⁿD_i) ≤ poly(ε⁻¹)·L(2ⁿD) • Thus D_i is bounded by a polynomial in L(D) and ε⁻¹. • If D is polynomially-near uniform, then each D_i is also polynomially-near uniform. • Hence HS strongly learns DNF with respect to the uniform distribution.
Summary • DNF can be weakly learned with respect to polynomially-near uniform distributions using the WDNF algorithm. • The HS algorithm strongly learns DNF with respect to the uniform distribution by boosting the WDNF weak learner.