Lower Bounds for Property Testing

Lower Bounds for Property Testing. Luca Trevisan U.C. Berkeley Joint work with Andrej Bogdanov and Kenji Obata. Sub-linear Time Algorithms. Want to design algorithms that run in less than linear time (and so cannot read entire input). Must be probabilistic and approximate

Presentation Transcript


  1. Lower Bounds for Property Testing. Luca Trevisan, U.C. Berkeley. Joint work with Andrej Bogdanov and Kenji Obata.

  2. Sub-linear Time Algorithms
     • Want to design algorithms that run in less than linear time (and so cannot read the entire input)
     • Must be probabilistic and approximate
     • For optimization problems:
       • Compute a numerical approximation of the optimum cost (and an implicit representation of an approximate solution?)
     • For decision problems:
       • What is approximation for decision problems?

  3. (Graph) Property Testing
     Testing a property P with accuracy ε in the adjacency matrix representation:
     • Given a graph G that has property P, accept with probability > 3/4
     • Given a graph G that is ε-far from property P, accept with probability < 1/4
     • ε-far = must change an ε fraction of the adjacency matrix to get property P (add/remove > εn² edges)

  4. Example [GGR, AK]
     Testing bipartiteness of a given graph G
     • Pick (1/ε) polylog(1/ε) vertices and check whether they induce a bipartite graph; if so accept, otherwise reject
     • If G is bipartite then the algorithm accepts with probability 1
     • If G is ε-far from bipartite, then whp the algorithm discovers an odd cycle (non-trivial to prove)
     • Running time: O((1/ε²) polylog(1/ε))
     • We will discuss a matching lower bound if time allows
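
The sample-and-check test on this slide can be sketched in Python as follows. This is a minimal illustration, not the analysis from [GGR, AK]: the function names, the `sample_size` parameter (which the caller would set to roughly (1/ε) polylog(1/ε)), and the list-of-lists adjacency matrix are all assumptions made for the example.

```python
import random
from collections import deque

def induced_subgraph_is_bipartite(adj, vertices):
    """2-color the subgraph induced by `vertices` using BFS; adj is a 0/1 matrix."""
    color = {}
    for start in vertices:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in vertices:
                if v != u and adj[u][v]:  # one adjacency-matrix query
                    if v not in color:
                        color[v] = 1 - color[u]
                        queue.append(v)
                    elif color[v] == color[u]:
                        return False  # found an odd cycle among the sample
    return True

def bipartiteness_tester(adj, sample_size, rng=random):
    """Accept iff a random vertex sample induces a bipartite subgraph."""
    n = len(adj)
    sample = rng.sample(range(n), min(sample_size, n))
    return induced_subgraph_is_bipartite(adj, sample)
```

Because an induced subgraph of a bipartite graph is bipartite, this sketch never rejects a bipartite input, which matches the one-sided behaviour stated on the slide. The number of matrix entries read is quadratic in the sample size.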

  5. Paleontologist’s approach

  6. Bounded Degree Graphs
     Testing a property P with accuracy ε in the adjacency lists representation:
     • Given a graph G that has property P, accept with probability > 3/4
     • Given a graph G that is ε-far from property P, accept with probability < 1/4
     • ε-far = must change an ε fraction of the adjacency list entries to get property P (add/remove > εdn edges)

  7. Bipartiteness [GR]
     Testing bipartiteness
     • Repeat polylog n times:
       • Start at a random vertex and take sqrt(n) random walks of length polylog n; if two of them combine to form an odd cycle reject, otherwise accept
     • Analysis:
       • in a graph where you need to remove a constant fraction of the edges to make it bipartite, the algorithm finds an odd cycle

  8. Matching Lower Bound [GR]
     • Define two distributions of graphs:
       • Gfar: a random Hamiltonian circuit, plus a random matching (whp 1/100-far from bipartite)
       • Gbip: a random Hamiltonian circuit, plus a random matching conditioned on making the graph bipartite
     • Gfar and Gbip are indistinguishable to algorithms of query complexity o(sqrt(n)).
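
A minimal Python sketch of how one might sample from the two distributions, for even n. The function names are hypothetical. The sampler for Gbip relies on an observation that is an assumption of this sketch rather than something stated on the slide: an even Hamiltonian cycle has a unique 2-coloring, so conditioning the matching on bipartiteness is the same as matching the two color classes to each other uniformly at random. Parallel edges between the cycle and the matching are not filtered out.

```python
import random
from collections import deque

def random_hamiltonian_cycle(n, rng):
    """Return a random vertex order and the n edges of the cycle it defines."""
    order = list(range(n))
    rng.shuffle(order)
    return order, [(order[i], order[(i + 1) % n]) for i in range(n)]

def sample_g_far(n, rng=random):
    """Random Hamiltonian cycle plus a uniformly random perfect matching."""
    if n % 2:
        raise ValueError("n must be even")
    _, cycle = random_hamiltonian_cycle(n, rng)
    verts = list(range(n))
    rng.shuffle(verts)
    matching = [(verts[i], verts[i + 1]) for i in range(0, n, 2)]
    return cycle + matching

def sample_g_bip(n, rng=random):
    """Same, but the matching only joins opposite sides of the cycle's 2-coloring."""
    if n % 2:
        raise ValueError("n must be even")
    order, cycle = random_hamiltonian_cycle(n, rng)
    left, right = order[0::2], order[1::2]  # alternate positions along the cycle
    rng.shuffle(right)
    return cycle + list(zip(left, right))

def is_bipartite(n, edges):
    """BFS 2-coloring over the whole graph (used here only to check samples)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [None] * n
    for s in range(n):
        if color[s] is not None:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False
    return True
```

Both samplers return 3-regular (multi)graphs with 3n/2 edges, so degree counts alone cannot tell the two distributions apart.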

  9. Sub-linear Time Approximation
     • Minimum spanning tree
       • Given a connected weighted graph of degree d with weights in the range {1,…,w}, can approximate the MST weight within (1+ε) in time about O(dw/ε²) [Chazelle, Rubinfeld, T]
     • Max SAT
       • Given a CNF where every variable occurs at most d times, can approximate the Max SAT optimum within .618, presumably also 2/3, in O(d) time [work in progress, hopefully will get 3/4 - δ]

  10. Sublinear Time Approximation
     • Problems restricted to dense instances:
       • Max CUT and other graph problems can be approximated within (1+ε) in graphs with at least αn² edges in time 2^poly(1/(εα)) [GGR]
       • Max 3SAT can be approximated within (1+ε) in instances with at least αn³ clauses in time 2^poly(1/(εα)), and similar results hold for other satisfiability problems [AFKK]

  11. General Goals
     • When looking for polynomial-time algorithms:
       • Several algorithmic techniques of general applicability
       • A general technique to “prove” impossibility (NP-completeness)
     • For sublinear-time algorithms:
       • General algorithmic techniques?
       • Impossibility results?

  12. Testing 3-Colorability
     • Easy in adjacency matrix representation
     • NP-hard in adjacency list representation
     • Only for small enough ε
     • Can find a 3-coloring good for 80% of the edges in a 3-colorable graph using SDP
     • NP-hard to find a 3-coloring good for a 98% (?) fraction of the edges
     • Non-tight, and conditional, lower bound for query complexity

  13. Other problems
     • The query complexity of the following problems is equivalent to the query complexity of testing 3-colorability:
       • Testing satisfiability of a 3SAT instance
         • Every variable occurs in O(1) clauses, “adjacency list” representation
       • Approximating max cut, vertex cover, independent set, . . ., in bounded-degree graphs
       • Approximating Max SAT, Max 2SAT, . . .
     • Lower bound of sqrt(n) for all problems
     • Nothing better except with complexity assumptions

  14. Our Results
     • For one-sided error algorithms:
       • Ω(n) query complexity to distinguish 3-colorable graphs from graphs that are (1/3 - δ)-far
       • Lower bound applies to testing problems that are solvable in polynomial time
     • For two-sided error algorithms:
       • For some ε, Ω(n) query complexity to distinguish 3-colorable graphs from graphs that are ε-far.

  15. Additional Results
     • Unconditionally, algorithms running in time o(n) cannot:
       • Approximate Max 3SAT better than 7/8
       • Approximate Max Cut in bounded-degree graphs better than 16/17
       • . . .
     • Håstad '97 proved that the above problems are NP-hard

  16. The 3-Coloring Lower Bound
     • Consider first one-sided error algorithms
     • It’s enough to find a graph G that is (1/3 - δ)-far from 3-colorable, but in which every subgraph of size < αn is 3-colorable
       • (for every δ there is an α such that . . .)
     • Then an algorithm of query complexity < αn either accepts G (which is wrong) or rejects some 3-colorable graph (which means the algorithm does not have one-sided error)

  17. The Graph
     • Pick a graph of degree O(1/δ²) at random (pick that many random matchings)
     • Then it is (1/3 - δ)-far whp
     • But, for some α, whp, every subgraph induced by k < αn vertices contains < 1.5k edges
     • In a minimal non-3-colorable graph, every vertex has degree at least 3
     • Every subgraph induced by < αn vertices is 3-colorable [Erdős]

  18. Explicit Construction
     • Can the previous construction be derandomized?
     • For constants d, ε, α, and for every sufficiently large n, we can explicitly construct a graph on n vertices, with max degree d, that is ε-far from 3-colorable and such that every subset of αn vertices induces a 3-colorable subgraph.

  19. Explicit Construction
     • We construct a 3SAT formula such that for constants k, ε’, α’
       • Every variable occurs k times
       • No assignment satisfies more than a 1 - ε’ fraction of the clauses
       • Every α’ fraction of the clauses is satisfiable
     • Then we use a (slightly new) reduction from 3SAT to 3-Coloring

  20. The Formula
     • Fix a degree-d expander graph G=(V,E) such that for every cut (S,V-S) at least min{|S|,|V-S|} edges cross the cut (d = 14 is enough)
     • Have two variables x_uv and x_vu for each edge (u,v)
     • For every vertex v have the (3SAT equivalent of) the constraint
       • Σ_u x_uv = 1 + Σ_w x_vw
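
To see why these constraints cannot all hold at once, add them up over every vertex: each variable appears once on the left and once on the right, so the sum would read T = |V| + T. The brute-force Python sketch below checks this on a tiny graph. It assumes the sums are taken over the integers with 0/1 variables (the flow reading of the constraint); the function name and the triangle used for illustration are hypothetical, and a triangle is of course not the degree-14 expander the slide asks for.

```python
from itertools import product

def max_satisfied_constraints(n, edges):
    """Try every 0/1 assignment to the variables x_uv, x_vu and return the
    largest number of vertex constraints (inflow = 1 + outflow) satisfied."""
    arcs = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    best = 0
    for bits in product((0, 1), repeat=len(arcs)):
        x = dict(zip(arcs, bits))
        satisfied = 0
        for v in range(n):
            inflow = sum(val for (a, b), val in x.items() if b == v)
            outflow = sum(val for (a, b), val in x.items() if a == v)
            if inflow == 1 + outflow:
                satisfied += 1
        best = max(best, satisfied)
    return best

triangle = [(0, 1), (1, 2), (0, 2)]
best = max_satisfied_constraints(3, triangle)
print(f"at most {best} of 3 constraints can hold simultaneously")
```

The enumeration is exponential in the number of edges, so it is only meant for toy graphs.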

  21. Structure of the Analysis
     • Impossible to satisfy more than a fraction 1/(d+1) of the constraints
     • Can always satisfy half of the constraints
       • define an auxiliary network
       • show that the auxiliary network has no small cut because of expansion
       • then there is a large flow
       • use the large flow to find an assignment for a subset of the constraints

  22. Flow Argument
     • Want to satisfy the constraints corresponding to vertices in C, with |C| < |V|/2
     • Construct a flow network with a new source s, a sink t obtained by collapsing V-C, and the vertices in C
     [Figure: flow network; labels: s, C, t, V-C]

  23. Flow Argument
     • Every cut has size at least |C|
     • There is a 0/1 flow of cost at least |C|
     • Interpreted as an assignment, it satisfies all constraints in C
     [Figure: cut in the flow network; labels: s, C-A, A, t, |C-A| edges, |A| edges]

  24. Two-Sided Error Algorithms
     • Need to define two distributions of graphs Gcol and Gfar such that
       • Graphs in Gcol are (almost) always 3-colorable
       • Graphs in Gfar are (almost) always far from 3-colorable
       • To an algorithm of bounded query complexity, Gcol and Gfar look (almost) the same

  25. Main Step
     • Define two distributions Dsat and Dfar of instances of E3LIN-2 (systems over GF(2) with 3 variables per equation)
       • Systems in Dsat are always satisfiable
       • Systems in Dfar are (almost) always (1/2 - δ)-far from satisfiable
       • To an algorithm of bounded query complexity, Dsat and Dfar look the same
     • We get Gcol and Gfar using a reduction from approximate E3LIN-2 to approximate 3-coloring

  26. E3LIN-2
     X1 + X3 + X10 = 0 mod 2
     X2 + X3 + X4 = 1 mod 2
     X1 + X2 + X9 = 0 mod 2
     . . .

  27. Main Building Block
     • We show that for every c there is an α such that there exists a left-hand side with
       • n variables, cn equations, 3 variables per equation, every variable occurs in 3c equations
       • every αn equations are linearly independent
     • Pick the left-hand side at random
       • repeat 3c times: pick at random a set of n/3 disjoint triples of variables
     • Explicit construction?

  28. Distributions
     • The left-hand side is always as before
     • In Dsat, we pick a random assignment to the variables, and set the right-hand side consistently
       • always satisfiable
     • In Dfar, we pick the right-hand side uniformly at random
       • With high probability, (1/2 - O(1/sqrt(c)))-far
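
A minimal Python sketch of sampling from Dsat and Dfar, using the random left-hand side described as "repeat 3c times: pick at random a set of n/3 disjoint triples". All function names are hypothetical, n is assumed to be divisible by 3, and the sketch does not check the linear-independence property, which only holds with high probability.

```python
import random

def random_left_hand_side(n, c, rng):
    """3c rounds; each round splits the n variables into n/3 disjoint triples.
    Yields c*n equations in which every variable occurs exactly 3c times."""
    if n % 3:
        raise ValueError("n must be divisible by 3")
    equations = []
    for _ in range(3 * c):
        perm = list(range(n))
        rng.shuffle(perm)
        equations.extend(tuple(perm[i:i + 3]) for i in range(0, n, 3))
    return equations

def sample_d_sat(equations, n, rng):
    """Right-hand side consistent with a hidden uniformly random assignment."""
    assignment = [rng.randrange(2) for _ in range(n)]
    rhs = [(assignment[i] + assignment[j] + assignment[k]) % 2
           for i, j, k in equations]
    return rhs, assignment

def sample_d_far(equations, rng):
    """Right-hand side chosen uniformly at random, independent of everything."""
    return [rng.randrange(2) for _ in equations]

def fraction_satisfied(equations, rhs, assignment):
    """Fraction of equations x_i + x_j + x_k = b (mod 2) that the assignment satisfies."""
    ok = sum((assignment[i] + assignment[j] + assignment[k]) % 2 == b
             for (i, j, k), b in zip(equations, rhs))
    return ok / len(equations)
```

In this sketch the two samplers share the same left-hand side and differ only in how the right-hand side bits are produced.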

  29. Indistinguishability
     • The two distributions differ only in the right-hand side
       • In Dfar, uniformly distributed
       • In Dsat, αn-wise independent
         • Linear independence implies statistical independence
     • Look the same to an algorithm that sees fewer than αn equations
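
The step "linear independence implies statistical independence" can be checked by brute force on a toy system. The Python sketch below enumerates every assignment to 6 variables and tallies the resulting right-hand sides; the function name and both example systems are made up for illustration.

```python
from collections import Counter
from itertools import product

def rhs_distribution(equations, n):
    """Tally the right-hand side vectors produced by all 2^n assignments."""
    counts = Counter()
    for assignment in product((0, 1), repeat=n):
        rhs = tuple(sum(assignment[i] for i in eq) % 2 for eq in equations)
        counts[rhs] += 1
    return counts

# Three linearly independent equations over GF(2): all 2^3 patterns are equally likely.
independent = [(0, 1, 2), (2, 3, 4), (0, 4, 5)]
# Four equations that sum to zero (each variable appears exactly twice):
# only half of the 2^4 patterns can ever occur.
dependent = [(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5)]

print(len(rhs_distribution(independent, 6)), "patterns from the independent system")
print(len(rhs_distribution(dependent, 6)), "patterns from the dependent system")
```

When the equations seen are independent, the Dsat right-hand sides are exactly uniform, just as in Dfar; a dependency among them is what would let an observer notice a consistent right-hand side.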

  30. Conclusion of the Argument
     • No algorithm of “query complexity” o(n) can distinguish satisfiable instances of E3LIN-2 from instances that are (1/2 - δ)-far from satisfiable
     • For some ε, no algorithm of query complexity o(n) can distinguish 3-colorable graphs from graphs that are ε-far from 3-colorable.
     • No algorithm of query complexity o(n) can approximate Max 3SAT better than 7/8 . . .

  31. Open Questions
     • Show that distinguishing 3-colorable graphs from (1/3 - δ)-far graphs requires query complexity Ω(n)
       • we can only prove it for one-sided error
     • Show that approximating Max SAT better than 3/4 and Max CUT better than 1/2 requires query complexity Ω(n)
       • we only know Ω(sqrt(n)) [implicit in GR]
       • would “explain” why we need SDP

  32. Back to Dense Graphs
     • Recall the Alon-Krivelevich bipartiteness test for the adjacency matrix representation:
       • pick (1/ε) polylog(1/ε) vertices and look at the induced subgraph
       • if we see an odd cycle reject, otherwise accept
     • Running time (1/ε²) polylog(1/ε)
     • We prove:
       • Ω(1/ε²) for non-adaptive algorithms
       • Ω(1/ε^1.5) for adaptive algorithms

  33. Two Distributions
     • Gfar: every edge exists with probability ε
       • whp it is ε/3-far from bipartite
     • Gbip: pick a random partition, then every edge that crosses the partition exists with probability 2ε
     • Thm1: they look the same to non-adaptive algorithms making o(1/ε²) queries
     • Thm2: they look the same to adaptive algorithms making o(1/ε^1.5) queries
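
A minimal Python sketch of the two dense-graph distributions, with hypothetical function names and ε assumed to be at most 1/2 so that 2ε is a valid probability. A given pair of vertices crosses a uniformly random partition with probability 1/2, so it becomes an edge with probability ε in both distributions; single queries therefore carry no information.

```python
import random
from itertools import combinations

def sample_dense_g_far(n, eps, rng):
    """Every pair of vertices is an edge independently with probability eps."""
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < eps]

def sample_dense_g_bip(n, eps, rng):
    """Random 2-partition; each crossing pair is an edge with probability 2*eps."""
    side = [rng.randrange(2) for _ in range(n)]
    edges = [(u, v) for u, v in combinations(range(n), 2)
             if side[u] != side[v] and rng.random() < 2 * eps]
    return edges, side
```

The second sampler also returns the hidden partition, purely so that a caller can verify that every sampled edge crosses it.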

  34. Proof of a Weaker Statement
     • Thm1 (weaker): a non-adaptive algorithm making q = o(1/ε²) queries in Gfar is unlikely to see an odd cycle
     • Proof:
       • a non-adaptive algorithm asks about some subgraph with q edges.
       • There are at most about q^(t/2) cycles of length t, and each one exists with probability ε^t, so the expected number of cycles of length t is about ε^t q^(t/2), exponentially small in t.
       • Summing over all t, it’s still unlikely that there is a cycle
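
The final summation is a geometric series in r = ε·sqrt(q), since ε^t q^(t/2) = r^t. The Python sketch below evaluates it over odd lengths t ≥ 3. It treats the slide's "about q^(t/2)" as exact with constant 1, which is an assumption of this illustration, and the function name is hypothetical.

```python
import math

def odd_cycle_union_bound(q, eps):
    """Sum of (eps * sqrt(q))**t over odd t >= 3, in closed form r^3 / (1 - r^2)."""
    r = eps * math.sqrt(q)
    if r >= 1:
        raise ValueError("bound is only meaningful when eps * sqrt(q) < 1")
    return r ** 3 / (1 - r ** 2)

# With q = o(1/eps^2) the ratio r tends to 0, and so does the bound.
for q in (10, 100, 1000):
    print(q, odd_cycle_union_bound(q, eps=0.01))
```

When q is of order 1/ε² the ratio r is a constant and the bound stops being small, which is consistent with the o(1/ε²) threshold in the theorem.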

  35. Proof of a Weaker Statement
     • Thm2 (weaker): an adaptive algorithm making q = o(1/ε^1.5) queries in Gfar is unlikely to see an odd cycle
     • Proof:
       • the algorithm sees an edge only once in 1/ε queries
       • the algorithm sees a cycle only after querying a pair that it already sees as connected
       • It takes 1/ε^0.5 edges to have 1/ε pairs of connected vertices
       • It takes 1/ε^1.5 queries to have that many edges

  36. Some more open questions
     • In the adjacency matrix representation, most interesting problems are solvable in constant (in ε) time
     • For some problems (e.g., testing triangle-freeness) the analysis uses Szemerédi’s regularity lemma, and the constant is hyper-exponential in ε
     • Lower bound is (1/ε)^(log 1/ε), and only for one-sided error
     • Alternative analysis / stronger lower bounds?
