
New degree bounds for polynomials with prescribed signs

Explore the problem of finding the lowest degree polynomial that matches prescribed signs on disjoint regions in multi-dimensional space. Learn about Polynomial Threshold Functions (PTFs) and their applications in complexity theory, machine learning, and quantum decision trees. Discover upper and lower bounds for PTFs and how they relate to Boolean functions and algebraic complexity models.


Presentation Transcript


  1. New degree bounds for polynomials with prescribed signs. Ryan O’Donnell (MIT), Rocco Servedio (Harvard/Columbia)

  2. Polynomials with prescribed signs. Suppose m disjoint regions $R_1, \dots, R_m$ are given in $\mathbb{R}^n$, along with associated signs $\sigma_1, \dots, \sigma_m$. What is the lowest-degree polynomial $p : \mathbb{R}^n \to \mathbb{R}$ that has the prescribed signs on the regions? In one dimension the problem is trivial: if the regions are intervals, a degree equal to the number of sign alternations is both necessary and sufficient. In two or more dimensions…??
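The one-dimensional claim can be made concrete. The sketch below uses the hypothetical helpers `sign_poly_1d` and `evaluate` and assumes the intervals are given sorted and pairwise disjoint. It places one root in each gap where the prescribed sign flips, so the degree equals the number of sign alternations. The intermediate value theorem shows that no polynomial with fewer real roots can work.

```python
from typing import List, Tuple


def sign_poly_1d(intervals: List[Tuple[float, float]], signs: List[int]):
    """Build p(x) = lead * prod(r - x) with the prescribed signs.

    intervals: disjoint closed intervals [a, b], sorted left to right.
    signs: +1 or -1 for each interval.
    Returns (roots, lead); deg p = len(roots) = number of sign alternations.
    """
    roots = []
    for (_, b0), (a1, _), s0, s1 in zip(intervals, intervals[1:], signs, signs[1:]):
        if s0 != s1:
            roots.append((b0 + a1) / 2)  # one root in the gap where the sign flips
    return roots, signs[0]


def evaluate(roots: List[float], lead: int, x: float) -> float:
    # Each factor (r - x) is positive left of r, so p has sign `lead` on the
    # leftmost interval and flips sign exactly once at each root.
    value = float(lead)
    for r in roots:
        value *= r - x
    return value


intervals = [(0, 1), (2, 3), (4, 5), (6, 7)]
signs = [+1, -1, -1, +1]
roots, lead = sign_poly_1d(intervals, signs)
print(roots)  # [1.5, 5.5]: degree 2, matching the two alternations
```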

  3. Polynomial threshold functions. A special case: let $f : \{0,1\}^n \to \{+1,-1\}$ be a Boolean function, and let $p : \mathbb{R}^n \to \mathbb{R}$ be a polynomial. We say that $p$ is a polynomial threshold function (PTF) for $f$, or that $p$ sign-represents $f$, if $f(x) = \operatorname{sgn}(p(x))$ for all $x \in \{0,1\}^n$. We are concerned with finding the lowest-degree PTF for $f$.

  4. Polynomial threshold functions. For example: • $x_1 + x_2 + \dots + x_n - \tfrac12$ is a degree-1 PTF for OR • $x_1 + x_2 + \dots + x_n - (n - \tfrac12)$ is a degree-1 PTF for AND • $x_1 + x_2 + \dots + x_n - n/2$ is a degree-1 PTF for MAJ (n odd) • $(1-2x_1)(1-2x_2)\cdots(1-2x_n)$ is a degree-n PTF for PARITY. Every n-bit Boolean function has a PTF (indeed, an exact representation) of degree ≤ n. (Consider: $\dots + f(1101)\,x_1 x_2 (1-x_3) x_4 + \dots$)
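The examples above can be checked by brute force over the cube. The sketch below is a small illustration with hypothetical function names. It maps "true" to +1. Note that the parity product equals +1 exactly on inputs with an even number of 1s, so it represents PARITY up to a global sign convention. `exact_rep` is the multilinear interpolation suggested by the term $f(1101)\,x_1 x_2 (1-x_3) x_4$.

```python
from itertools import product


def ptf_or(x):
    return sum(x) - 0.5


def ptf_and(x):
    return sum(x) - (len(x) - 0.5)


def ptf_maj(x):
    return sum(x) - len(x) / 2  # never 0 when n is odd


def ptf_parity(x):
    out = 1
    for xi in x:
        out *= 1 - 2 * xi  # +1 if xi == 0, -1 if xi == 1
    return out


def exact_rep(f, n):
    """Multilinear interpolation: p(x) = sum_a f(a) * prod_i (x_i if a_i else 1 - x_i)."""
    def p(x):
        total = 0
        for a in product((0, 1), repeat=n):
            term = f(a)
            for ai, xi in zip(a, x):
                term *= xi if ai else 1 - xi  # this factor is 1 exactly when xi == ai
            total += term
        return total
    return p


def sgn(v):
    return (v > 0) - (v < 0)
```

On a 0/1 input, every term of `exact_rep` vanishes except the one with `a == x`. The result is therefore an exact representation of degree at most n.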

  5. Polynomial threshold functions. What are PTFs good for? • a natural algebraic model of complexity • upper bounds ⇒ machine learning: given a class of functions C, if every function in C has a PTF of degree d, then C can be learned in time $n^{O(d)}$ • used to prove that PP is closed under intersection • lower bounds ⇒ oracle separations • a slightly stricter model is related to quantum decision tree complexity
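The learning application rests on one observation. A degree-d PTF is a linear threshold function over the $\sum_{k \le d}\binom{n}{k} = n^{O(d)}$ multilinear monomial features. The sketch below illustrates this with the classical perceptron run over those features. This is our choice of illustration, not necessarily the algorithm behind the $n^{O(d)}$ claim. A worst-case polynomial-time guarantee would typically come from linear programming instead, since the perceptron's running time depends on the margin. `monomial_features`, `perceptron`, and `predict` are hypothetical names.

```python
from itertools import combinations, product


def monomial_features(x, d):
    """Values of all multilinear monomials of degree <= d at x (constant 1 first)."""
    feats = []
    for k in range(d + 1):
        for subset in combinations(range(len(x)), k):
            value = 1
            for i in subset:
                value *= x[i]
            feats.append(value)
    return feats


def perceptron(samples, d, max_epochs=1000):
    """Run the perceptron on degree-<=d monomial features until a clean pass."""
    dim = len(monomial_features(samples[0][0], d))
    w = [0] * dim
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            phi = monomial_features(x, d)
            if y * sum(wi * fi for wi, fi in zip(w, phi)) <= 0:  # wrong or undecided
                w = [wi + y * fi for wi, fi in zip(w, phi)]
                mistakes += 1
        if mistakes == 0:
            break
    return w


def predict(w, x, d):
    return 1 if sum(wi * fi for wi, fi in zip(w, monomial_features(x, d))) > 0 else -1


# Target: (-1)^(x[0] + x[1]) on 4 bits, which has the degree-2 PTF (1 - 2x[0])(1 - 2x[1]).
samples = [(x, (-1) ** (x[0] + x[1])) for x in product((0, 1), repeat=4)]
w = perceptron(samples, d=2)
```

Because the target has a degree-2 PTF, the data are linearly separable in feature space, and the perceptron convergence theorem guarantees a finite number of updates.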

  6. Prior work: lower bounds. Minsky & Papert, Perceptrons, 1968: • artificial intelligence perspective • proved three major lower bounds: • PARITY requires PTF degree n • a certain DNF formula, “one-in-a-box” (the $n^{1/3}$-way OR of $n^{2/3}$-way ANDs), requires PTF degree $n^{1/3}$ • $\mathrm{MAJ}(x_1,\dots,x_n) \wedge \mathrm{MAJ}(y_1,\dots,y_n)$ requires superconstant PTF degree. No new, essentially different lower bounds are known.

  7. Prior work: upper bounds. • [BRS95] also considered AND-MAJ$_n$; they showed it has PTF degree O(log n) and used this to show that PP is closed under intersection • [KS01] showed that every DNF formula on n variables with s terms has a PTF of degree $O(n^{1/3} \log s)$; they used this to get a subexponential-time learning algorithm for DNF formulas, which is the fastest known

  8. Our results. Upper bound: every Boolean function given by an AND/OR/NOT formula of size s and depth d has a PTF of degree $\sqrt{s}\,\log^{O(d)} s$ (note that degree s is trivial) ⇒ a subexponential-time learning algorithm for, say, linear-size formulas of superconstant depth, the first such algorithm known. Lower bound: a new technique ⇒ AND-MAJ$_n$ requires PTF degree $\Omega(\log n / \log\log n)$.

  9. Talk outline. Plan for the talk: 1. Prove the $\sqrt{s}\,\log^{O(d)} s$ PTF upper bound for formulas. 2. Prove the $\Omega(\log n / \log\log n)$ PTF lower bound for AND-MAJ$_n$.

  10. Boolean formulas. • A formula is a tree whose gates are ANDs or ORs of unbounded fan-in • leaves are labeled with literals • size is the number of leaves • depth is the length of the longest root-to-leaf path. [Figure: an example formula with AND and OR gates over literals x1, …, x13.]
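The definitions of size and depth can be pinned down with a small tree type. The sketch below is a hypothetical representation (`Leaf`, `Gate`, 0-based variable indices), not anything from the talk. It counts depth as the number of gates on the longest root-to-leaf path, which is one reasonable reading of the definition.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    var: int               # index i of the variable x_i (0-based here)
    negated: bool = False  # True for the literal NOT x_i


@dataclass
class Gate:
    op: str                                    # "AND" or "OR", unbounded fan-in
    children: List["Node"] = field(default_factory=list)


Node = Union[Leaf, Gate]


def size(node: Node) -> int:
    """Size = number of leaves."""
    if isinstance(node, Leaf):
        return 1
    return sum(size(c) for c in node.children)


def depth(node: Node) -> int:
    """Depth = number of gates on the longest root-to-leaf path."""
    if isinstance(node, Leaf):
        return 0
    return 1 + max(depth(c) for c in node.children)


def evaluate(node: Node, x: List[int]) -> int:
    """Evaluate the formula on a 0/1 input vector x."""
    if isinstance(node, Leaf):
        return 1 - x[node.var] if node.negated else x[node.var]
    values = [evaluate(c, x) for c in node.children]
    return int(all(values)) if node.op == "AND" else int(any(values))


# OR(AND(x0, x1), AND(x2, NOT x3)): size 4, depth 2
F = Gate("OR", [Gate("AND", [Leaf(0), Leaf(1)]), Gate("AND", [Leaf(2), Leaf(3, negated=True)])])
```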

  11. PTFs for Boolean formulas. (In this section we always use {0,1}.) Idea: replace every gate with a low-degree polynomial that simulates the gate. AND$(v_1,\dots,v_k)$? • $v_1 + \dots + v_k - (k-1)$ • $\big[(v_1 + \dots + v_k)/k\big]^{k \log(1/\varepsilon)}$
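The two candidate AND polynomials can be compared numerically. In the sketch below (hypothetical names), we take the logarithm to be natural and round the exponent up to an integer. The linear candidate is exact on 0/1 inputs, but its "false" values range down to −(k−1). The power candidate keeps every "false" value in [0, ε], because $(1 - 1/k)^{k\ln(1/\varepsilon)} \le \varepsilon$. It pays degree about $k \ln(1/\varepsilon)$ to do so.

```python
import math
from itertools import product


def and_linear(v):
    """First candidate: v_1 + ... + v_k - (k - 1)."""
    return sum(v) - (len(v) - 1)


def and_power(v, eps):
    """Second candidate: ((v_1 + ... + v_k) / k) ** r with r = ceil(k * ln(1/eps))."""
    k = len(v)
    r = math.ceil(k * math.log(1 / eps))
    return (sum(v) / k) ** r, r


k, eps = 5, 0.01
for v in product((0, 1), repeat=k):
    value, r = and_power(v, eps)
    # All-ones gives exactly 1; otherwise the base is at most 1 - 1/k, and
    # (1 - 1/k) ** r <= exp(-r / k) <= eps.
    assert (value == 1.0) if all(v) else (0 <= value <= eps)
print(r, and_linear((0,) * k))  # 24 -4
```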

  12. A better amplifying polynomial. We want to amplify the gap between 1 − 1/k and 1. Raising to the k-th power works, but costs a lot of degree. We want a low-degree polynomial that keeps values in [0, 1 − 1/k] between 0 and 1 but amplifies the point 1 to, say, 2. Equivalently, we want a polynomial that is bounded on [0, 1] and has the largest possible derivative at 1.

  13. Chebyshev polynomials. This is an old problem of analysis, solved by the Chebyshev polynomials of the first kind. These are a family of orthogonal polynomials $(C_r)_{r \in \mathbb{N}}$ with the properties $\deg(C_r) = r$, $C_r([-1,1]) \subseteq [-1,1]$, $C_r'(1) = r^2$, and $C_r(1 + 1/r^2) \ge 2$. On $[-1,1]$, $C_r(x) = \cos(r \arccos x)$.
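The listed properties are easy to check numerically using the standard three-term recurrence $C_0 = 1$, $C_1 = x$, $C_{m+1} = 2x\,C_m - C_{m-1}$. The recurrence can be differentiated to track $C_r'$ as well. The sketch below (a hypothetical helper `chebyshev`) is a numerical sanity check, not a proof.

```python
import math


def chebyshev(r, x):
    """Return (C_r(x), C_r'(x)) using C_{m+1} = 2x C_m - C_{m-1}."""
    if r == 0:
        return 1.0, 0.0
    c_prev, c_cur = 1.0, float(x)   # C_0, C_1
    d_prev, d_cur = 0.0, 1.0        # C_0', C_1'
    for _ in range(r - 1):
        # Differentiate the recurrence: C_{m+1}' = 2 C_m + 2x C_m' - C_{m-1}'.
        c_prev, c_cur, d_prev, d_cur = (
            c_cur,
            2 * x * c_cur - c_prev,
            d_cur,
            2 * c_cur + 2 * x * d_cur - d_prev,
        )
    return c_cur, d_cur


print(chebyshev(3, 1.0))  # (1.0, 9.0): C_3(1) = 1 and C_3'(1) = 3**2
```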

  14. Chebyshev polynomials at gates. Chebyshev polynomials give us a square-root savings in degree. Imagine replacing AND$(v_1,\dots,v_k)$ with $C_{\sqrt{k}}\big((v_1 + \dots + v_k)/(k-1)\big)$ (*). The argument $(v_1 + \dots + v_k)/(k-1)$ is about $1 + 1/k$ if all the $v_i$ are roughly 1, and lies in [0,1] otherwise. Hence (*) is something like 2 when the AND is true, and between −1 and 1 otherwise. (This idea is originally from [KS01].)
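The gate construction (*) can be tried on exact 0/1 inputs. The sketch below (hypothetical names `chebyshev` and `cheb_and`) rounds $\sqrt{k}$ up to an integer r. Since $r \ge \sqrt{k}$, the true-case argument $k/(k-1) = 1 + 1/(k-1)$ is at least $1 + 1/r^2$, and $C_r$ is increasing on $[1,\infty)$, so the true case gives at least 2. Every false case puts the argument in [0, 1], where $|C_r| \le 1$.

```python
import math


def chebyshev(r, x):
    """C_r(x) via the three-term recurrence C_{m+1} = 2x C_m - C_{m-1}."""
    c_prev, c_cur = 1.0, float(x)
    if r == 0:
        return c_prev
    for _ in range(r - 1):
        c_prev, c_cur = c_cur, 2 * x * c_cur - c_prev
    return c_cur


def cheb_and(v):
    """Approximate AND(v_1..v_k) by C_r((v_1 + ... + v_k) / (k - 1)), r = ceil(sqrt(k))."""
    k = len(v)
    r = math.ceil(math.sqrt(k))  # round sqrt(k) up so that C_r is a genuine polynomial
    return chebyshev(r, sum(v) / (k - 1)), r


print(cheb_and([1] * 9))  # (2.3203125, 3): degree 3 instead of 9
```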

  15. Chebyshev polynomials at gates. In fact, we will replace each AND gate by an ε-accurate amplified version of $C_{\sqrt{k}}\big((v_1 + \dots + v_k)/(k-1)\big)$, of degree $\sqrt{k}\log(1/\varepsilon)$, and something similar for OR gates. Note that if the inputs are within ±ε of 0/1 values, so are the outputs. Further, if the $v_i$ all have degree at most d, the resulting polynomial has degree at most $d\sqrt{k}\log(1/\varepsilon)$.

  16. Almost done. By applying these polynomials at every gate, we can easily conclude: suppose F is a formula in which, along every path from the root to a leaf, the product of the fan-ins is at most t. Then we can sign-represent F with a polynomial of degree $\sqrt{t}\,\log^{O(d)} s$. (We need to take ε ≈ 1/s.) We are not quite done, because these fan-in products can be huge!
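The composition bound can be computed directly on a formula. The sketch below uses a hypothetical tuple representation: a literal is a string, and a gate is `(op, children)`. Each gate of fan-in k multiplies the largest child degree by $\lceil\sqrt{k}\rceil \cdot \log(1/\varepsilon)$, with $\log(1/\varepsilon)$ passed in as an integer parameter. The result is therefore at most the product of $\lceil\sqrt{k_i}\rceil$ along the worst path times $\log^{\text{depth}}(1/\varepsilon)$, which is roughly $\sqrt{t}\,\log^{d}(1/\varepsilon)$.

```python
import math


def max_fanin_product(node):
    """Largest product of fan-ins along a root-to-leaf path (a leaf contributes 1)."""
    if isinstance(node, str):        # a literal such as "x1"
        return 1
    _op, children = node
    return len(children) * max(max_fanin_product(c) for c in children)


def degree_bound(node, log_inv_eps):
    """Degree of the composed polynomial when a fan-in-k gate multiplies the largest
    child degree by ceil(sqrt(k)) * log_inv_eps; a literal has degree 1."""
    if isinstance(node, str):
        return 1
    _op, children = node
    gate_degree = math.ceil(math.sqrt(len(children))) * log_inv_eps
    return gate_degree * max(degree_bound(c, log_inv_eps) for c in children)


F = ("AND", [("OR", ["x1", "x2", "x3", "x4"]), "x5", ("OR", ["x6", "x7"])])
print(max_fanin_product(F), degree_bound(F, 3))  # 12 36
```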

  17. Bounding fan-in products. [Figure: a chain of AND and OR gates, each with about n/100 leaves (x1 … x_{n/100}, x_{n/100} … x_{2n/100}, …).] Only n variables (leaves) are used, but one path has fan-in product $(n/100)^{100}$.
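The blow-up can be reproduced with a hypothetical `caterpillar` construction, reusing the tuple representation of literals as strings and gates as `(op, children)`. The chain shape and the alternation of AND and OR gates are our reading of the figure. Each of 100 gates has fan-in n/100, and one child of each gate is the next gate. The formula therefore uses fewer than n leaves, yet the path down the chain has fan-in product $(n/100)^{100}$.

```python
def caterpillar(levels, width):
    """Chain of `levels` gates, each of fan-in `width`: every gate but the last has
    width - 1 fresh leaves plus the next gate as children; the last has width leaves.
    Gate types alternate AND/OR (types do not affect the counts below)."""
    counter = iter(range(1, levels * width + 1))

    def build(level):
        op = "AND" if level % 2 == 0 else "OR"
        if level == levels - 1:
            return (op, [f"x{next(counter)}" for _ in range(width)])
        leaves = [f"x{next(counter)}" for _ in range(width - 1)]
        return (op, leaves + [build(level + 1)])

    return build(0)


def size(node):
    if isinstance(node, str):
        return 1
    return sum(size(c) for c in node[1])


def max_fanin_product(node):
    if isinstance(node, str):
        return 1
    return len(node[1]) * max(max_fanin_product(c) for c in node[1])


n = 1000
F = caterpillar(levels=100, width=n // 100)
print(size(F), max_fanin_product(F) == (n // 100) ** 100)  # 901 True
```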

  18. Solution: bucketing. The trick is to partition each gate into about log s gates of the same type, each of which has subformulas of similar size: bucket j collects the subformulas with sizes $2^j \le s_i < 2^{j+1}$, ranging from $1 \le s_i < 2$ up to $s/2 \le s_i < s$.
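One bucketing step can be sketched as follows, again with the hypothetical tuple representation. A gate's children are grouped by $\lfloor \log_2 \text{size} \rfloor$, and each group becomes a sub-gate of the same type. This leaves the computed function unchanged, since an AND of ANDs is an AND and an OR of ORs is an OR. The talk applies this at every gate; this block only shows a single gate. Every child in bucket j has size at least $2^j$, so bucket j has at most $s/2^j$ children.

```python
def size(node):
    if isinstance(node, str):          # a literal
        return 1
    return sum(size(c) for c in node[1])


def bucket_gate(node):
    """Replace a gate by a gate of the same type over 'bucket' gates of that type.
    Bucket j holds the children whose sizes lie in [2**j, 2**(j+1)); because
    AND(AND(...), AND(...)) = AND(...), the function computed is unchanged."""
    op, children = node
    buckets = {}
    for child in children:
        j = size(child).bit_length() - 1    # floor(log2(size))
        buckets.setdefault(j, []).append(child)
    return (op, [(op, group) for _j, group in sorted(buckets.items())])


leaves = [f"x{i}" for i in range(1, 13)]
G = ("AND", [leaves[0], leaves[1], ("OR", leaves[2:4]), ("OR", leaves[4:7]), ("OR", leaves[7:12])])
B = bucket_gate(G)
print([len(group) for _op, group in B[1]])  # [2, 2, 1]
```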

  19. Conclusion of upper bound. Now it is easy to see that a gate whose subformula has depth d and size s has maximum root-to-leaf fan-in product $O(s \log^d s)$. Proof, by induction: the AND bucket with subformula sizes in $[2^j, 2^{j+1})$ has fan-in at most $s/2^j$, since each of its children has size at least $2^j$. Hence if we first modify our formulas in this way and then apply the Chebyshev construction, we get PTFs of degree $\sqrt{s}\,\log^{O(d)} s$, as desired.

  20. Talk outline. Plan for the talk: 1. Prove the $\sqrt{s}\,\log^{O(d)} s$ PTF upper bound for formulas. 2. Prove the $\Omega(\log n / \log\log n)$ PTF lower bound for AND-MAJ$_n$.

  21. Lower bound for AND-MAJ$_n$. Recall the AND-MAJ$_n$ function: $(x_1,\dots,x_n, y_1,\dots,y_n) \mapsto \mathrm{MAJ}(x_1,\dots,x_n) \wedge \mathrm{MAJ}(y_1,\dots,y_n)$. Minsky and Papert (1968) showed that any PTF for it requires superconstant, ω(1), degree. Beigel, Reingold, and Spielman (1995) exhibited a PTF of degree O(log n). We give a new lower bound of $\Omega(\log n / \log\log n)$.

  22. The two-dimensional problem. Minsky and Papert observed that the problem of PTFs for AND-MAJ$_n$ is equivalent to a much simpler polynomial sign-prescription problem, the M-intersector problem: • a bivariate polynomial on $\mathbb{R}^2$ • regions: all odd lattice points $(x, y)$ with $|x|, |y| \le M$ • upper-right points positive, all others negative
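The M-intersector condition is finite, so it can be checked by enumeration. The sketch below (hypothetical names) checks three polynomials. The degree-1 polynomial x + y − 1 works for M = 1 but fails for M = 3, since $p(3,-1) = 1 > 0$. The polynomial xy + 2x + 2y − 3 is a degree-2 example we found by hand for M = 3; it is not from the talk.

```python
def odd_coords(M):
    """Odd integers v with |v| <= M."""
    return [v for v in range(-M, M + 1) if v % 2 != 0]


def target(x, y):
    return 1 if x > 0 and y > 0 else -1


def sgn(v):
    return (v > 0) - (v < 0)


def is_intersector(p, M):
    """True iff p is positive on the upper-right odd lattice points and negative on the rest."""
    return all(sgn(p(x, y)) == target(x, y) for x in odd_coords(M) for y in odd_coords(M))


print(is_intersector(lambda x, y: x + y - 1, 1))                  # True
print(is_intersector(lambda x, y: x + y - 1, 3))                  # False: p(3, -1) = 1 > 0
print(is_intersector(lambda x, y: x * y + 2 * x + 2 * y - 3, 3))  # True
```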

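  23. Proof of equivalence. Switch to {+1,−1} in input and output, and take n odd, so that $\sum_i x_i$ and $\sum_i y_i$ are odd integers in [−n, n]. (Intersector ⇒ PTF) Suppose p is an n-intersector. Then $p(\sum_i x_i, \sum_i y_i)$ is a PTF for AND-MAJ$_n$ of the same degree. (PTF ⇒ intersector) Suppose p is a PTF for AND-MAJ$_n$. Consider $q(x_1,\dots,x_n, y_1,\dots,y_n) = \sum_{\pi,\pi' \in S_n} p(x_{\pi(1)},\dots,x_{\pi(n)}, y_{\pi'(1)},\dots,y_{\pi'(n)})$. AND-MAJ$_n$ is unchanged by permuting the x's and the y's separately, so every term has the correct sign and q is also a PTF for AND-MAJ$_n$. But q is symmetric in the x's and in the y's, so on $\{+1,-1\}^{2n}$ it depends only on their sums: $q = q(\sum_i x_i, \sum_i y_i)$, a polynomial of no larger degree.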
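The "intersector ⇒ PTF" direction can be checked exhaustively for n = 3. The sketch below substitutes the two sums into a 3-intersector and compares signs with AND-MAJ$_3$ on all 64 inputs. The polynomial `p3` is our own hand-found example, not one from the talk, and the helper names are hypothetical.

```python
from itertools import product


def p3(s, t):
    """A degree-2 3-intersector found by hand (our example, not from the talk)."""
    return s * t + 2 * s + 2 * t - 3


def and_maj(x, y):
    """AND-MAJ_n in the {+1, -1} convention: +1 iff both majorities are +1."""
    return 1 if sum(x) > 0 and sum(y) > 0 else -1


def sgn(v):
    return (v > 0) - (v < 0)


def is_ptf_via_sums(p, n):
    """Check that (x, y) -> p(sum x, sum y) sign-represents AND-MAJ_n on {+1,-1}^(2n)."""
    cube = list(product((1, -1), repeat=n))
    return all(sgn(p(sum(x), sum(y))) == and_maj(x, y) for x in cube for y in cube)


print(is_ptf_via_sums(p3, 3))  # True: a degree-2 PTF for AND-MAJ_3
```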

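  24. The M-intersector problem. Consider the more general (continuous) sign-prescription problem in which the open upper-right quadrant must be positive and the other three open quadrants negative. No polynomial can have these signs! Proof: Assume p is a nonzero polynomial of minimal degree with these signs. By continuity, p must be 0 on the positive y half-axis, which separates the + quadrant from a − quadrant, and hence on the whole y-axis. By Bezout, x | p. Dividing through by x flips the signs on the left half-plane; the result has smaller degree and solves (essentially) the same problem, up to a rotation and a global sign change.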

  25. Reproving Minsky–Papert. This can be used to reprove Minsky and Papert’s superconstant lower bound. Suppose there were a fixed d such that an M-intersector of degree d exists for every M. Rescale each one to the unit square, normalize its coefficients, and take M → ∞. By compactness and continuity, there is a limiting nonzero degree-d polynomial whose signs are as on the previous slide, a contradiction.

  26. The relaxed case. [BRS95] constructed a bivariate polynomial of degree O(log M) for the sign pattern shown. We now describe how to obtain a lower bound of $\Omega(\log M / \log\log M)$ for the M-intersector problem. We show that for any d, there is a subset of lattice points with coordinates at most $d^{O(d)}$ on which these signs cannot be achieved in degree d. [Figure: the lattice points up to M, with the upper-right quadrant labeled + and the other three quadrants labeled −.]

  27. A constructive solution. It is possible to show PTF lower bounds constructively. Let Z denote the set of odd lattice points, and let f denote the function that is +1 in the upper-right quadrant and −1 elsewhere. Suppose we could find a probability distribution w on Z under which every monomial $x^i y^j$ with $0 \le i + j \le d$ has zero correlation with f.

  28. A constructive solution. That is, suppose we have $w : Z \to \mathbb{R}_{\ge 0}$ with $\sum_{z \in Z} w(z) = 1$ such that $\sum_{(x,y) \in Z} f(x,y)\, x^i y^j\, w(x,y) = 0$ for all monomials $x^i y^j$ of degree at most d. Suppose also that w = 0 on points with coordinates exceeding M. We claim this implies that no M-intersector of degree d exists.

  29. Proof of constructive method. Proof: Suppose p were an M-intersector of degree d. On the one hand, by linearity of expectation, $\mathbb{E}_w[f(x,y)\,p(x,y)] = 0$, since f is uncorrelated with every monomial of degree ≤ d. On the other hand, $f(x,y)\,p(x,y) > 0$ on all lattice points bounded by M, and w gives all of its probability mass to these points, so $\mathbb{E}_w[f p] > 0$, a contradiction. Intriguingly, the much stronger converse (no such distribution ⇒ a PTF exists) is true, by LP duality.
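The argument can be seen on a tiny instance with exact rational arithmetic. For d = 1 we take the support set described later in the talk with h = 5 (our choice of h). The weights below are ones we computed by solving the constraints by hand. Under them, f is uncorrelated with 1, x, and y. Every linear p therefore has $\mathbb{E}_w[fp] = 0$ and must get the sign wrong (or vanish) on some support point. The names `POINTS`, `WEIGHTS`, and `expect_fp` are hypothetical.

```python
from fractions import Fraction

# Support {((-1)^l h^k, (-1)^k h^l) : k + l <= 1} plus (-1, -1), with h = 5 (our choice);
# the weights were obtained by solving the linear constraints by hand.
POINTS = [(1, 1), (5, -1), (-1, 5), (-1, -1)]
WEIGHTS = [Fraction(1, 2), Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)]


def f(x, y):
    return 1 if x > 0 and y > 0 else -1


def expect_fp(p):
    """E_w[f(x, y) p(x, y)], computed exactly."""
    return sum(w * f(x, y) * p(x, y) for (x, y), w in zip(POINTS, WEIGHTS))


# f is uncorrelated with 1, x and y, hence with every polynomial of degree <= 1 ...
for mono in (lambda x, y: 1, lambda x, y: x, lambda x, y: y):
    assert expect_fp(mono) == 0


# ... so no degree-1 polynomial can have f's signs on all four support points.
def p(x, y):
    return x + y - 1


print(expect_fp(p), [f(x, y) * p(x, y) > 0 for x, y in POINTS])  # 0 [True, False, False, True]
```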

  30. Constructing the distribution. There are $D = (d+1)(d+2)/2$ constraints, namely the monomials we want f to be uncorrelated with. Suppose we pick just D + 1 points for our distribution to be supported on, $(x_1,y_1), \dots, (x_{D+1},y_{D+1})$. Then the condition that w is a probability distribution over these points under which every constraint monomial has zero correlation with f is a $(D+1) \times (D+1)$ linear system.

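  31. Constructing the distribution. The system has a first row of all 1s (total mass 1), then one row for each monomial $x^i y^j$; column k corresponds to the point $(x_k, y_k)$: $$\begin{pmatrix} 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ \cdots & \cdots & f(x_k,y_k)\,x_k^i\,y_k^j & \cdots \\ \vdots & \vdots & & \vdots \end{pmatrix} \begin{pmatrix} w(x_1,y_1) \\ w(x_2,y_2) \\ \vdots \\ w(x_{D+1},y_{D+1}) \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$ Our desire is that the solution be nonnegative.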
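The system can be built and solved exactly for any chosen support. The sketch below uses the hypothetical helpers `build_system` and `solve`, which performs exact Gauss–Jordan elimination over `fractions.Fraction`. It assumes the matrix is nonsingular and raises `StopIteration` otherwise. On a well-chosen support the solution is a genuine distribution. On the four points (±1, ±1) one weight comes out negative, which is consistent with x + y − 1 sign-representing f on those points.

```python
from fractions import Fraction


def f(x, y):
    return 1 if x > 0 and y > 0 else -1


def build_system(points, d):
    """Row 0: total mass 1. Then one row per monomial x^i y^j with i + j <= d."""
    rows = [[Fraction(1)] * len(points)]
    for i in range(d + 1):
        for j in range(d + 1 - i):
            rows.append([Fraction(f(x, y) * x**i * y**j) for x, y in points])
    rhs = [Fraction(1)] + [Fraction(0)] * (len(rows) - 1)
    return rows, rhs


def solve(A, b):
    """Exact Gauss-Jordan elimination; assumes A is square and nonsingular."""
    n = len(A)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


points = [(1, 1), (5, -1), (-1, 5), (-1, -1)]   # D + 1 = 4 points for d = 1
w = solve(*build_system(points, d=1))
print(w)  # [Fraction(1, 2), Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)]
```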

  32. Me thinking

  33. Rocco thinking

  34. Our solution. We now pull a rabbit out of our hat and name the exact set of points on which the distribution will be supported. Essentially, we want just the grid of points, but on a log scale. Let h be a large number, to be named later. Our points will be a subset of $\{(\pm h^i, \pm h^j) : 0 \le i + j \le d\}$.

  35. Our solution. The exact D + 1 points to consider are $\{((-1)^l h^k, (-1)^k h^l) : 0 \le k + l \le d\} \cup \{(-1,-1)\}$, where $h = d^{O(1)}$ is odd.

  36. Finishing the proof. We consider the linear system given by this choice of points. We need to show that its solution consists of nonnegative values. By Cramer’s rule, each solution weight is a ratio of two determinants. Each determinant is a polynomial in h. We calculate the highest-order terms, show that they dominate the polynomial (using the fact that h is large), and show that they have the same sign. (Details omitted!)
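The omitted details can at least be sanity-checked in small cases by running Cramer's rule exactly on the slide-35 support. The sketch below uses the hypothetical names `support_points`, `det`, and `cramer_weights`. The values h = 7 and h = 9 for d = 2 are our own choices, picked to show that h must be large enough. With h = 7, the weight on (−1, −1) comes out negative. With h = 9, all weights are positive. This is a numerical illustration only, not a substitute for the asymptotic argument.

```python
from fractions import Fraction


def f(x, y):
    return 1 if x > 0 and y > 0 else -1


def support_points(d, h):
    """{((-1)^l h^k, (-1)^k h^l) : 0 <= k + l <= d} together with (-1, -1)."""
    pts = [((-1) ** l * h**k, (-1) ** k * h**l)
           for k in range(d + 1) for l in range(d + 1 - k)]
    return pts + [(-1, -1)]


def det(A):
    """Exact determinant via Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in A]
    n, result = len(M), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result             # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return result


def cramer_weights(d, h):
    """Solve the (D+1) x (D+1) system by Cramer's rule: w_k = det(A_k) / det(A)."""
    pts = support_points(d, h)
    A = [[1] * len(pts)]
    for i in range(d + 1):
        for j in range(d + 1 - i):
            A.append([f(x, y) * x**i * y**j for x, y in pts])
    b = [1] + [0] * (len(A) - 1)
    det_A = det(A)
    weights = [det([row[:k] + [b[r]] + row[k + 1:] for r, row in enumerate(A)]) / det_A
               for k in range(len(pts))]
    return dict(zip(pts, weights))


for h in (7, 9):
    w = cramer_weights(2, h)
    print(h, min(w.values()) >= 0)   # prints "7 False" then "9 True"
```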

  37. Finishing the proof. Hence we have constructed a true probability distribution over the odd lattice points under which f has zero correlation with all monomials of degree at most d. The largest coordinate used is $d^{O(d)}$. This shows that $d^{O(d)}$-intersectors require PTF degree greater than d; i.e., M-intersectors require PTF degree $\Omega(\log M / \log\log M)$.
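The last step inverts $M = d^{O(d)}$. For illustration, assume the hidden constant is 1, i.e. $M = d^d$. The sketch below (hypothetical names) computes the largest d with $d^d \le M$ using exact integers and compares it with $\log M / \log\log M$. The ratio stays between 1 and 2 across these scales, as the $\Omega(\log M / \log\log M)$ bound predicts.

```python
import math


def largest_d(M):
    """Largest d >= 1 with d**d <= M, i.e. the degree ruled out when M = d^d."""
    d = 1
    while (d + 1) ** (d + 1) <= M:
        d += 1
    return d


def log_ratio(M):
    return math.log(M) / math.log(math.log(M))


for k in (6, 100, 1000):
    M = 10**k
    print(k, largest_d(M), round(log_ratio(M), 1))
```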

  38. Talk outline. Plan for the talk: 1. Prove the $\sqrt{s}\,\log^{O(d)} s$ PTF upper bound for formulas. 2. Prove the $\Omega(\log n / \log\log n)$ PTF lower bound for AND-MAJ$_n$.

  39. Open questions. • Does every Boolean formula of size s have a PTF of degree $O(\sqrt{s})$, independent of depth? • Minsky and Papert showed an $\Omega(n^{1/3})$ PTF lower bound for a certain depth-2 circuit. Can one show a significantly stronger lower bound for any constant-depth circuit? • Better lower or upper bounds for the intersection of two weighted thresholds? • Explore the polynomial sign-prescription problem further.
