Extremal properties of polynomial threshold functions Ryan O’Donnell (MIT / IAS) Rocco Servedio (Columbia)
Representing boolean functions
Complexity theory studies dozens of different representations for boolean functions:
• circuits: boolean, algebraic, threshold; formulas, low-depth variants
• decision trees
• branching programs
• switching networks
• polynomials over various fields
• monotone span programs
• contact networks
Extremal bounds
For each representation, one can ask: what is the "size" of the hardest boolean function, or of a random function?
Often a fairly easy problem: upper bound by a "trivial" construction, lower bound by counting. E.g., for circuit size:
• [Lupanov-58] Every function has a circuit of size $(1+o(1))\,2^n/n$.
• [Shannon-49] Almost every function requires circuits of size $2^n/n$.
Polynomial Threshold Functions
Let $f : \{+1,-1\}^n \to \{+1,-1\}$ be a boolean function, and let $p : \mathbb{R}^n \to \mathbb{R}$ be a multilinear polynomial. We say that $p$ is a polynomial threshold function (PTF) for $f$, or that $p$ sign-represents $f$, if $f(x) = \mathrm{sgn}(p(x))$ for all $x \in \{+1,-1\}^n$.
• See the excellent survey "Slicing the hypercube" [Saks-93].
• PTFs correspond to the circuit class Threshold-Of-Parities.
PTF examples
(The AND and OR examples read $-1$ as "true" and $+1$ as "false".)
• AND: $x_1 + x_2 + \cdots + x_n + (n-1)$
• OR: $x_1 + x_2 + \cdots + x_n - (n-1)$
• Majority: $x_1 + x_2 + \cdots + x_n$
• Parity: $x_1 x_2 \cdots x_n$
• $(x_1 \oplus x_2) \vee x_3$: $100\,x_1 x_2 + x_3 - 100$
There are two main size measures for PTFs:
• degree: number of variables in the biggest monomial (between 0 and $n$)
• density: number of monomials (between 1 and $2^n$)
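The examples on this slide can be checked by brute force for small $n$. The following is a minimal sketch, assuming the encoding $-1$ = "true", $+1$ = "false" (inferred from the AND and OR polynomials) and reading the last example as $(x_1 \oplus x_2) \vee x_3$; the helper names `sign_represents` and `num_true` are illustrative, not from the talk.

```python
from itertools import product

TRUE, FALSE = -1, +1  # assumed encoding: -1 is "true", +1 is "false"

def sgn(v):
    """Sign of a number; 0 maps to 0, so a zero value never matches f(x)."""
    return (v > 0) - (v < 0)

def sign_represents(p, f, n):
    """Check f(x) == sgn(p(x)) at every point of the cube {+1,-1}^n."""
    return all(sgn(p(x)) == f(x) for x in product((+1, -1), repeat=n))

def num_true(x):
    return sum(1 for xi in x if xi == TRUE)

def parity_poly(x):
    out = 1
    for xi in x:
        out *= xi
    return out

n = 5  # odd, so Majority has no ties

checks = {
    "AND": sign_represents(lambda x: sum(x) + (n - 1),
                           lambda x: TRUE if num_true(x) == n else FALSE, n),
    "OR": sign_represents(lambda x: sum(x) - (n - 1),
                          lambda x: TRUE if num_true(x) >= 1 else FALSE, n),
    "Majority": sign_represents(lambda x: sum(x),
                                lambda x: TRUE if num_true(x) > n / 2 else FALSE, n),
    "Parity": sign_represents(parity_poly,
                              lambda x: TRUE if num_true(x) % 2 == 1 else FALSE, n),
    "(x1 XOR x2) OR x3": sign_represents(
        lambda x: 100 * x[0] * x[1] + x[2] - 100,
        lambda x: TRUE if (x[0] != x[1]) or x[2] == TRUE else FALSE, 3),
}
print(checks)
```

In this sketch the AND, OR and Majority polynomials have degree 1, while the Parity polynomial has degree $n$ and density 1.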
Why PTFs?
• natural algebraic model of complexity
• degree upper bounds: machine learning algorithms [Klivans-S-01, O-S-03]; PP closed under intersection [Beigel-Reingold-Spielman-95]
• simultaneous degree/density lower bounds: oracle separations (e.g., $\mathrm{P}^{\mathrm{NP}^A} \neq \mathrm{PP}^A$, [Beigel-94])
• degree lower bounds: quantum decision tree lower bounds
The PTF extremal problem
Also, the PTF extremal problem is interesting!
• Are there functions that require PTF degree $n$?
• Do most functions have PTF degree $\ll n$?
• Does every function have PTF density somewhat smaller than $2^n$?
• Are there functions that require PTF density close to $2^n$?
Results in this talk
In this talk I will discuss two of our results:
• Degree upper bound: Almost every boolean function has PTF degree at most $n/2 + O(\sqrt{n \log n})$.
• Density upper bound: Every boolean function has PTF density at most $(1 - 1/O(n))\,2^n$.
Results not in this talk
Def: We say $p$ is a weak PTF for $f$ if, for all $x \in \{+1,-1\}^n$, either $p(x) = 0$ or $\mathrm{sgn}(p(x)) = f(x)$. (Also, $p$ is not allowed to be identically 0!)
Saks asked whether almost all functions require weak PTF density $(\tfrac{1}{2} - \varepsilon)\,2^n$. In fact, we show every function has weak PTF density $o(1)\,2^n$ (Ramsey theory).
We show a couple of other bounds…
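To make the weak-PTF definition concrete, here is a small illustrative check. It assumes the encoding $-1$ = "true" and uses a hypothetical helper `is_weak_ptf`. For a multilinear polynomial, being nonzero somewhere on the cube is the same as not being the zero polynomial, so the sketch tests the former.

```python
from itertools import product

TRUE, FALSE = -1, +1  # assumed encoding: -1 is "true", +1 is "false"

def sgn(v):
    return (v > 0) - (v < 0)

def is_weak_ptf(p, f, n):
    """p is a weak PTF for f: p is not identically 0, and wherever
    p(x) != 0 its sign agrees with f(x)."""
    cube = list(product((+1, -1), repeat=n))
    nonzero_somewhere = any(p(x) != 0 for x in cube)
    agrees = all(p(x) == 0 or sgn(p(x)) == f(x) for x in cube)
    return nonzero_somewhere and agrees

def and2(x):
    return TRUE if x == (TRUE, TRUE) else FALSE

def or2(x):
    return FALSE if x == (FALSE, FALSE) else TRUE

def xor2(x):
    return TRUE if x[0] != x[1] else FALSE

def p(x):
    # Two monomials; vanishes on the two "mixed" inputs.
    return x[0] + x[1]

weak_for_and = is_weak_ptf(p, and2, 2)
weak_for_or = is_weak_ptf(p, or2, 2)
weak_for_xor = is_weak_ptf(p, xor2, 2)
print(weak_for_and, weak_for_or, weak_for_xor)
```

The same polynomial $x_1 + x_2$ is a weak PTF for both AND and OR on two variables, since it is zero exactly where they differ. It is not a weak PTF for XOR.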
Degree bounds: previous results
• [Minsky-Papert-68]: Parity and its negation require PTF degree $n$. [Aspnes-Beigel-Furst-Rudich-94] show these are the only such functions.
• [Wang-Williams-91], [ABFR-94]: Conjecture: almost every function has PTF degree $\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$.
• Lower bound of $n/2$ shown by a counting argument [Anthony-92], [Alon-93] based on a result of [Cover-65].
Progress on the upper bound
Towards the upper bound:
• [Razborov-Rudich-94] showed almost every function has PTF degree at most $0.95\,n$.
• [Alon-93] observed that the work of [Gotsman-89] implies a PTF degree upper bound of $0.89\,n$.
We show the conjecture is true up to lower-order terms:
Thm: Almost every function has PTF degree at most $n/2 + O(\sqrt{n \log n})$.
Fourier detour
It's known that any function $f : \{+1,-1\}^n \to \mathbb{R}$ can be exactly represented as
$f(x) = \sum_{S \subseteq [n]} \hat{f}(S)\, x_S$,
where the $\hat{f}(S)$'s are real constants and the monomial $x_S$ is $\prod_{i \in S} x_i$. This is known as the Fourier representation.
Parseval's identity: $\sum_{S \subseteq [n]} \hat{f}(S)^2 = \sum_{x \in \{+1,-1\}^n} f(x)^2 / 2^n$.
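The Fourier representation and Parseval's identity can be verified numerically on a small example. This is a sketch under the standard convention $\hat{f}(S) = 2^{-n} \sum_x f(x)\, x_S$ (an assumption, since the slide does not spell out the formula for the coefficients); `fourier_coefficients` is an illustrative name.

```python
from itertools import product, combinations
from math import prod, isclose

def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} with f_hat(S) = 2^-n * sum_x f(x) * prod_{i in S} x_i."""
    cube = list(product((+1, -1), repeat=n))
    coeffs = {}
    for k in range(n + 1):
        for S in combinations(range(n), k):
            coeffs[S] = sum(f(x) * prod(x[i] for i in S) for x in cube) / 2**n
    return coeffs

n = 3

def majority(x):
    return 1 if sum(x) > 0 else -1

coeffs = fourier_coefficients(majority, n)
cube = list(product((+1, -1), repeat=n))

# The expansion reproduces f exactly at every point of the cube.
reconstructs = all(
    isclose(sum(c * prod(x[i] for i in S) for S, c in coeffs.items()), majority(x))
    for x in cube
)

# Parseval: sum of squared coefficients equals the average of f(x)^2.
lhs = sum(c * c for c in coeffs.values())
rhs = sum(majority(x) ** 2 for x in cube) / 2**n
print(coeffs, reconstructs, lhs, rhs)
```

For a boolean-valued $f$ the right-hand side is always 1, which is the form of Parseval used later for the density bound.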
Our degree upper bound
We actually show a stronger fact:
Thm: Let $\mathcal{S}$ be any collection of $(1-1/n)\,2^n$ monomials. Then almost every function has a PTF over these monomials.
Cor: Almost every function has a PTF of degree $n/2 + O(\sqrt{n \log n})$.
Proof: For each $z \in \{+1,-1\}^n$, let $\delta_z : \{+1,-1\}^n \to \mathbb{R}$ be the "Dirac delta function": $\delta_z(z) = 2^n$, and $\delta_z(x) = 0$ for $x \neq z$.
Proof sketch continued
Random $\pm 2^n$ functions are made by forming $\sum_{z \in \{+1,-1\}^n} f(z)\, \delta_z(x)$, where the $f(z)$'s are coin tosses.
The function $\delta_z(x)$ has a simple Fourier representation: $\delta_z(x) = \sum_{S \subseteq [n]} z_S\, x_S$.
Suppose we "approximate" each $\delta_z$ by deleting the summands outside $\mathcal{S}$: $\delta'_z(x) = \sum_{S \in \mathcal{S}} z_S\, x_S$.
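The expansion of the delta function follows from a standard product identity. It is written out here as a sketch, with $z_S = \prod_{i \in S} z_i$ assumed by analogy with $x_S$.

```latex
\[
\sum_{S \subseteq [n]} z_S\, x_S
  \;=\; \sum_{S \subseteq [n]} \prod_{i \in S} z_i x_i
  \;=\; \prod_{i=1}^{n} \bigl(1 + z_i x_i\bigr)
  \;=\;
  \begin{cases}
    2^n & \text{if } x = z,\\[2pt]
    0   & \text{if } x \neq z,
  \end{cases}
\]
```

The last step holds because each factor $1 + z_i x_i$ is 2 when $x_i = z_i$ and 0 otherwise. The same expansion shows that deleting the summands outside a collection of monomials changes the value at $x \neq z$ from 0 to minus a sum of $\pm 1$ terms, one per deleted monomial.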
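[Figure: two plots over $\{+1,-1\}^n$. First plot: $\delta_z(\cdot)$, equal to $2^n$ at $z$ and 0 elsewhere, with the expansion $\delta_z(x) = +1 + x_1 - x_2 + x_3 - x_1x_2 + x_1x_3 - x_2x_3 + \cdots$. Second plot: $\delta'_z(\cdot)$, equal to $|\mathcal{S}|$ at $z$ and within $\pm(1/n)\,2^n$ elsewhere, with the expansion repeated.]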
Proof sketch continued
We want to show that for any particular $x$, with very high probability, $\sum_{z \in \{+1,-1\}^n} f(z)\,\delta_z(x)$ and $\sum_{z \in \{+1,-1\}^n} f(z)\,\delta'_z(x)$ have the same sign. (Then union bound over $x$.)
Taking the $z = x$ summand starts the sum off with $|\mathcal{S}|\, f(x) = (1-1/n)\,2^n f(x)$: good shape so far.
You get noise terms for all other $z$. But…!
Key point: These are summed with random $\pm$ signs, so they get "dampened".
Proof sketch completed
To show that a random $\pm$ sum of quantities $\{\delta'_z(x) : z \neq x\}$ is small w.h.p., the key is to show (a) the numbers are bounded, and (b) the sum of squares (variance) is small.
Both come easily: each number is at most $(1/n)\,2^n$ in absolute value, and the sum of the squares is easily calculated exactly using Parseval's identity: independently of $x$, it equals $(1/n - 1/n^2)\,2^{2n}$, so SD $\approx (1/\sqrt{n})\,2^n$.
Hence, by Hoeffding, |error| $< 0.5 \cdot 2^n$ with very high probability.
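The two facts used here, the bound on each noise term and the exact sum of squares, can be confirmed by brute force on a tiny instance. The sketch below assumes $n = 4$, so that $(1-1/n)\,2^n = 12$ is an integer, and picks the kept monomials at random with a fixed seed. The names `kept` and `delta_truncated` are illustrative. It checks only the arithmetic, not the probabilistic conclusion.

```python
from itertools import product, combinations
from math import prod
import random

n = 4
cube = list(product((+1, -1), repeat=n))
all_monomials = [S for k in range(n + 1) for S in combinations(range(n), k)]

# Hypothetical choice of the collection: any (1 - 1/n) 2^n = 12 monomials.
rng = random.Random(0)
kept = rng.sample(all_monomials, 2**n - 2**n // n)

def chi(S, x):
    """The monomial x_S = prod_{i in S} x_i."""
    return prod(x[i] for i in S)

def delta_truncated(z, x):
    """delta'_z(x): the delta expansion restricted to the kept monomials."""
    return sum(chi(S, z) * chi(S, x) for S in kept)

results = []
for x in cube:
    noise = [delta_truncated(z, x) for z in cube if z != x]
    results.append((
        delta_truncated(x, x),        # signal term, should be |kept|
        max(abs(v) for v in noise),   # should be at most 2^n / n
        sum(v * v for v in noise),    # should be (1/n - 1/n^2) 2^(2n)
    ))
print(results[0])
```

With $n = 4$ the predicted values are a signal of 12, noise terms of magnitude at most 4, and a sum of squares of $(1/4 - 1/16) \cdot 256 = 48$ at every $x$.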
Density bounds: previous results
• [Gotsman-89] showed that every boolean function has PTF density at most $2^n - 2^{n/2}$.
• [Saks-93] observed that [Cover-65] implies that almost every boolean function requires PTF density at least $0.11 \cdot 2^n$.
• Our thm: Every boolean function has PTF density at most $(1 - 1/O(n))\,2^n$.
• We get to omit a $1/O(n)$ fraction of monomials, compared to [G89]'s $1/2^{n/2}$.
Proof sketch: density upper bound
Let $f : \{+1,-1\}^n \to \{+1,-1\}$ be any boolean function. Let $L_1(f) = \sum_{S \subseteq [n]} |\hat{f}(S)|$.
Since $\sum_{S \subseteq [n]} \hat{f}(S)^2 = 1$ (Parseval), by Cauchy-Schwarz, $L_1(f) \le 2^{n/2}$.
[Bruck-Smolensky-92] shows that $f$ always has a PTF of density $2n \cdot L_1(f)^2$.
So we're already done unless, say, $L_1(f) \ge (1/n)\,2^{n/2}$.
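A quick numerical look at $L_1(f)$ shows both ends of the range. This is an illustrative sketch: the coefficient formula $\hat{f}(S) = 2^{-n}\sum_x f(x)\,x_S$ and the encoding $-1$ = "true" are assumptions, the inner-product function is chosen here as an example and is not from the talk, and the density bound is computed as $2n \cdot L_1(f)^2$ as quoted on the slide.

```python
from itertools import product, combinations
from math import prod, isclose

def fourier_coefficients(f, n):
    cube = list(product((+1, -1), repeat=n))
    return {
        S: sum(f(x) * prod(x[i] for i in S) for x in cube) / 2**n
        for k in range(n + 1)
        for S in combinations(range(n), k)
    }

def l1_norm(f, n):
    """L1(f): sum of absolute values of the Fourier coefficients."""
    return sum(abs(c) for c in fourier_coefficients(f, n).values())

n = 4

def parity(x):
    return prod(x)

def inner_product(x):
    # (x1 AND x2) XOR (x3 AND x4), with -1 read as "true".
    a = x[0] == -1 and x[1] == -1
    b = x[2] == -1 and x[3] == -1
    return -1 if a != b else +1

cauchy_schwarz_bound = 2 ** (n / 2)
l1_parity = l1_norm(parity, n)
l1_ip = l1_norm(inner_product, n)

# Density bound as quoted on the slide: 2n * L1(f)^2.
bound_parity = 2 * n * l1_parity**2
bound_ip = 2 * n * l1_ip**2
print(l1_parity, l1_ip, cauchy_schwarz_bound, bound_parity, bound_ip)
```

Parity has a single nonzero coefficient, so its $L_1$ norm is 1 and the quoted bound is small. The inner-product function meets the Cauchy-Schwarz bound $2^{n/2}$ exactly, and for it the quoted bound exceeds $2^n$, which is the regime the rest of the proof sketch has to handle.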
Proof sketch continued
If $L_1(f)$ is very close to its upper bound, $2^{n/2}$, then its coefficients must be very "spread out": a handful may be "large," but almost all must be close to $2^{-n/2}$ in magnitude.
Recall: $f(x) = \sum_{S \subseteq [n]} \hat{f}(S)\, x_S$.
Let $L$ be the set of coefficients that are "small."
Fix $x$. We show that if you omit a random selection of $(1/O(n))\,2^n$ terms from $L$, the sum of what you omit is smaller than 1 in magnitude with probability $1 - 2^{-2n}$.
Proof sketch: completed
$f(x) = \sum_{S \subseteq [n]} \hat{f}(S)\, x_S$
We're adding up $N \approx 2^n$ numbers, $\hat{f}(S)\, x_S$ for $S \in L$. Each is not much more than $\pm 2^{-n/2} = \pm 1/\sqrt{N}$.
Their mean is very small, around $\pm \log(N)/N$: had we summed over all $S$ we would have gotten $f(x) = \pm 1$, and we omitted few terms.
Hence (Hoeffding) if we sum a random subset of size $N/\log(N)$, the result has magnitude at least 1 with probability at most $1/N^2$.
Open problems
For the problem of degree, the conjecture of Wang & Williams and ABFR is still open: is the PTF degree of almost every function as low as $n/2$?
For the problem of density, we're not even sure where the right answer lies: somewhere between $0.11 \cdot 2^n$ and $(1 - 1/O(n))\,2^n$.
Our conjecture: Almost every function has PTF density $0.5 \cdot 2^n$.