Lower Bounds for Property Testing


  1. Lower Bounds for Property Testing • Luca Trevisan, UC Berkeley

  2. Sub-linear Time Algorithms • Want to design algorithms that run in less than linear time • Such algorithms cannot read the entire input • They must be probabilistic and approximate • For optimization problems: • compute a numerical approximation of the optimum cost (and an implicit representation of an approximate solution?) • For decision problems: • what does approximation mean?

  3. Graph Property Testing [GGR] • Testing a property P with accuracy ε • Given a graph G that has property P: accept with probability > 3/4 • Given a graph G that is ε-far from property P: accept with probability < 1/4 • ε-far = must change an ε-fraction of the representation of G to get property P • Intuition: the input (not the output) is approximate

  4. Different Representations • G is represented as an adjacency matrix • ε-far = must add/remove εn² edges • G has max degree d and is represented using adjacency lists • ε-far = must add/remove εdn edges • (Some extra subtleties in the bounded-degree case)

  5. Purpose of This Talk • Discuss algorithms and lower bounds for • Sub-linear time property testing for some basic graph properties • Sub-linear time approximation algorithms for some basic optimization problems (we’ll mostly discuss lower bounds)

  6. Motivations • Large data sets • web, Walmart, Amazon, phone calls, . . . • linear time can still be infeasible • Fine print: most research on property testing focuses on problems that have no connection to applications with large data sets • Goal for theory research • Develop general algorithmic techniques (like dynamic programming, local search, … for P) • Develop general techniques for impossibility results (like NP-completeness)

  7. Property Testing and Approximation in Adjacency Matrix Representation

  8. Bipartiteness Algorithm [GGR,AK] • Testing bipartiteness of a given graph G • Pick (1/ε)·polylog(1/ε) vertices and check whether they induce a bipartite graph; if so accept, otherwise reject • If G is bipartite then the algorithm accepts with probability 1 • If G is ε-far from bipartite, then whp the algorithm discovers an odd cycle (non-trivial to prove) • Running time: O((1/ε²)·polylog(1/ε))
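
The sampling test on this slide can be sketched roughly as follows. This is a minimal illustration, not the tuned algorithm of [GGR,AK]: the function names are made up for this example, and the sample size `(1/eps) * log2(2/eps)**2` is only a hypothetical stand-in for the slide's (1/ε)·polylog(1/ε).

```python
import math
import random
from collections import deque


def induced_subgraph_is_bipartite(adj, vertices):
    """Try to 2-color the subgraph induced by `vertices` using BFS.

    `adj` is an adjacency matrix; every adj[u][v] lookup is one query.
    Returns False exactly when the induced subgraph contains an odd cycle.
    """
    color = {}
    for start in vertices:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in vertices:
                if v != u and adj[u][v]:
                    if v not in color:
                        color[v] = 1 - color[u]
                        queue.append(v)
                    elif color[v] == color[u]:
                        return False  # odd cycle found
    return True


def bipartiteness_tester(adj, eps, rng):
    """Accept (True) iff a random vertex sample induces a bipartite graph."""
    n = len(adj)
    # Assumed sample size; the true polylog factor is not given on the slide.
    s = min(n, math.ceil((1 / eps) * math.log2(2 / eps) ** 2))
    sample = rng.sample(range(n), s)
    return induced_subgraph_is_bipartite(adj, sample)
```

Because every induced subgraph of a bipartite graph is bipartite, the sketch never rejects a bipartite input, which matches the one-sided error stated on the slide. It makes about s² matrix queries, in line with the quoted running time.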

  9. Lower Bounds [BT] • Ω(1/ε^1.5) for adaptive algorithms • Ω(1/ε²) for non-adaptive algorithms • The bounds apply to the ‘query complexity’ of the algorithm (and, a fortiori, to its running time)

  10. Proof for one-sided error case • Pick a random graph with edge-probability 3ε • whp it is ε-far from bipartite • Consider the view of a (possibly adaptive) algorithm that makes q ‘queries’ and finds an odd cycle whp • it sees Θ(εq) edges and O(ε²q²) pairs of connected vertices • a cycle can be discovered only by querying two vertices in the same connected component • it takes Ω(1/ε) such attempts • q = Ω(1/ε^1.5)
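
The arithmetic behind the final bullet can be spelled out as below. The constants c_1 and c_2 are placeholders for the constants hidden in the slide's O(·) and Ω(·); they do not appear in the original.

```latex
% Pairs of connected vertices available:  at most c_1 * eps^2 * q^2
% Attempts needed to close a cycle:       at least c_2 / eps
\[
  c_1\,\varepsilon^{2} q^{2} \;\ge\; \frac{c_2}{\varepsilon}
  \quad\Longrightarrow\quad
  q^{2} \;\ge\; \frac{c_2}{c_1}\,\varepsilon^{-3}
  \quad\Longrightarrow\quad
  q \;=\; \Omega\!\left(\varepsilon^{-1.5}\right).
\]
```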

  11. One-sided error non-adaptive • Pick a random graph with edge-probability 3ε • Consider the view of a non-adaptive algorithm that makes q ‘queries’ • Same as: • Start with a q-edge graph • Independently delete each edge with probability 1 − ε • If q = o(1/ε²) then the view is a forest with probability 1 − o(1) • Proof: there are at most O(q^(t/2)) cycles of length t
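
A sketch of how the cycle count gives the forest claim, under two assumptions not stated on the slide: the hidden constant has the form C^{t/2} for a fixed C, and each queried edge survives with probability ε, as in the slide's deletion step.

```latex
% N_t = number of length-t cycles in the q-edge query graph, N_t <= (C q)^{t/2}.
% A fixed cycle of length t survives the deletions with probability eps^t.
\[
  \mathbb{E}[\#\text{surviving cycles}]
  \;\le\; \sum_{t \ge 3} (Cq)^{t/2}\,\varepsilon^{t}
  \;=\; \sum_{t \ge 3} x^{t}
  \;=\; \frac{x^{3}}{1-x},
  \qquad x = \varepsilon\sqrt{Cq}.
\]
% If q = o(1/eps^2) then x -> 0, so the expectation is o(1) and, by Markov's
% inequality, the view contains no cycle with probability 1 - o(1).
```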

  12. Two-Sided Error • Two distributions: • G_far: random graph with edge probability 3ε • G_bip: first pick a random partition of the vertices, then each edge crossing the partition exists with probability 6ε • The distributions are indistinguishable by • Non-adaptive algorithms of query complexity o(1/ε²) • Adaptive algorithms of query complexity o(1/ε^1.5) • Both bounds are tight for these distributions
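
For concreteness, the two distributions on this slide could be sampled as in the sketch below. The function names are hypothetical, and the code enumerates all pairs, so it is meant only for small n.

```python
import random


def sample_g_far(n, eps, rng):
    """Every pair {u, v} is an edge independently with probability 3*eps."""
    return {
        (u, v)
        for u in range(n)
        for v in range(u + 1, n)
        if rng.random() < 3 * eps
    }


def sample_g_bip(n, eps, rng):
    """Random bipartition, then each crossing pair is an edge w.p. 6*eps."""
    side = [rng.randrange(2) for _ in range(n)]
    edges = {
        (u, v)
        for u in range(n)
        for v in range(u + 1, n)
        if side[u] != side[v] and rng.random() < 6 * eps
    }
    return edges, side
```

A pair crosses a random partition with probability 1/2, so both distributions give each pair the same marginal edge probability 3ε. A tester therefore has to detect correlations between several queried pairs rather than a difference in density.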

  13. Generality/Lessons • Possible lesson: try a random graph as a candidate distribution of ‘hard’ instances that are far from having the property • This does not work for the “triangle-freeness” property, whose complexity is possibly the most interesting open question in the adjacency matrix model.

  14. Triangle-free Graphs • Want to distinguish triangle-free graphs from graphs where one needs to remove εn² edges to break all triangles • Solvable in time super-exponential in 1/ε • Polynomial in 1/ε is impossible [Alon] • Is 2^poly(1/ε) possible? • Simplest special case of a more general (and important) question

  15. Sublinear Time Approximation • Max CUT and other graph problems can be approximated within (1+ε) in graphs with at least αn² edges in time 2^poly(1/(εα)) [GGR] • Max 3SAT can be approximated within (1+ε) in instances with at least αn³ clauses in time 2^poly(1/(εα)), and similar results hold for other satisfiability problems [AFKK] • Lower bounds?

  16. Property Testing and Approximation in Adjacency List Representation

  17. Bipartiteness [GR] • Testing bipartiteness • Repeat polylog n times: • Start at a random vertex and run √n random walks of length polylog n; if two of them combine to form an odd cycle, reject • Otherwise accept • Analysis: • in a graph where one needs to remove a constant fraction of the edges to make it bipartite, the algorithm finds an odd cycle
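
A simplified sketch of the random-walk test follows. It is not the exact procedure of [GR]: it uses plain (non-lazy) walks, and the caller supplies the values standing in for polylog n repetitions, √n walks, and polylog n walk length. "Two walks combine to form an odd cycle" is modeled as some vertex being reached from the start at both an even and an odd step count.

```python
import random


def walks_find_odd_cycle(neighbors, start, num_walks, length, rng):
    """Return True if some vertex is reached from `start` with both parities.

    Reaching a vertex by an even-length and an odd-length walk yields a
    closed walk of odd length, which always contains an odd cycle.
    """
    seen_parity = {start: {0}}
    for _ in range(num_walks):
        v = start
        for step in range(1, length + 1):
            v = rng.choice(neighbors[v])  # one adjacency-list query
            parities = seen_parity.setdefault(v, set())
            parities.add(step % 2)
            if len(parities) == 2:
                return True
    return False


def bounded_degree_bipartiteness_tester(neighbors, repetitions, num_walks, length, rng):
    """Accept (True) unless some repetition exhibits an odd cycle."""
    n = len(neighbors)
    for _ in range(repetitions):
        start = rng.randrange(n)
        if walks_find_odd_cycle(neighbors, start, num_walks, length, rng):
            return False
    return True
```

The sketch rejects only when it holds a witness for an odd cycle, so a bipartite graph is always accepted. The hard part of the analysis, that graphs far from bipartite are rejected with these parameters, is not captured by the code.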

  18. Matching Lower Bound [GR] • Define two distributions of graphs: • G_far: a random Hamiltonian circuit plus a random matching (whp 1/100-far from bipartite) • G_bip: a random Hamiltonian circuit plus a random matching, conditioned on making the graph bipartite • G_far and G_bip are indistinguishable to algorithms of query complexity o(√n).
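
The two distributions can be sampled as sketched below, assuming n is even. The function names are illustrative. As a simplification, the matching is allowed to repeat an edge of the circuit, which a careful construction might exclude.

```python
import random


def random_hamiltonian_circuit(n, rng):
    """Return a random vertex order and the n edges of the circuit it defines."""
    order = list(range(n))
    rng.shuffle(order)
    edges = [(order[i], order[(i + 1) % n]) for i in range(n)]
    return order, edges


def sample_far_bounded_degree(n, rng):
    """Random Hamiltonian circuit plus a uniformly random perfect matching."""
    _, cycle = random_hamiltonian_circuit(n, rng)
    perm = list(range(n))
    rng.shuffle(perm)
    matching = [(perm[i], perm[i + 1]) for i in range(0, n, 2)]
    return cycle, matching


def sample_bip_bounded_degree(n, rng):
    """Same, but the matching only joins even to odd positions of the circuit."""
    order, cycle = random_hamiltonian_circuit(n, rng)
    even, odd = order[0::2], order[1::2]
    rng.shuffle(odd)  # uniform bijection between the two sides
    matching = list(zip(even, odd))
    return cycle, matching, order
```

For even n the circuit alone fixes the bipartition (even versus odd positions), so conditioning the matching on bipartiteness is the same as requiring every matching edge to cross that partition.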

  19. Approximation Algorithms • Minimum spanning tree • given a connected weighted graph of degree d with weights in the range {1,…,w}, can approximate the MST weight within (1+ε) in time about O(dw/ε²) [Chazelle, Rubinfeld, T] • Max SAT • given a CNF formula where every variable occurs at most d times, can approximate the Max SAT optimum within 0.618, presumably also 2/3, in O(d) time [hopefully will get 3/4 − δ]

  20. Testing 3-Colorability • NP-hard in adjacency list representation • Only for small enough ε • Can find a 3-coloring good for 80% of the edges in a 3-colorable graph using SDP • NP-hard to find a 3-coloring good for a 98% (?) fraction of the edges • Gives a non-tight, conditional lower bound for query complexity

  21. Other Problems • The query complexity of the following problems is ‘equivalent’ to the query complexity of testing 3-colorability • Testing satisfiability of a 3SAT instance • Every variable occurs in O(1) clauses, “adjacency list” representation • Approximating max cut, vertex cover, independent set, . . ., in bounded-degree graphs • Approximating Max SAT, Max 2SAT, . . . • Lower bound of √n for all these problems • Reduction from bipartiteness

  22. Tight Lower Bound [BOT] • For one-sided error algorithms: • Ω(n) query complexity to distinguish 3-colorable graphs from graphs that are (1/3 − δ)-far • The lower bound applies to testing problems that are solvable in polynomial time • For two-sided error algorithms: • For some ε, Ω(n) query complexity to distinguish 3-colorable graphs from graphs that are ε-far.

  23. Using Reductions. . . • Unconditionally, algorithms running in time o(n) cannot: • Approximate Max 3SAT better than 7/8 • Approximate Max Cut in bounded-degree graphs better than 16/17 • . . . • Håstad ’97 proved that the above problems are NP-hard

  24. The 3-Coloring Lower Bound • Consider first one-sided error algorithms • It is enough to find a graph G that is (1/3 − δ)-far from 3-colorable, but in which every subgraph of size < αn is 3-colorable • (for every δ there is an α such that . . .) • Then an algorithm of query complexity < αn either accepts G (which is wrong) or rejects some 3-colorable graph (which means the algorithm does not have one-sided error)

  25. The Graph • Pick a graph of degree O(1/δ²) at random (take the union of that many random matchings) • Then it is (1/3 − δ)-far whp • But, for some α, whp every subgraph induced by k < αn vertices contains < 1.5k edges • In a minimal non-3-colorable graph, every vertex has degree at least 3 • Hence every subgraph induced by < αn vertices is 3-colorable [Erdős]

  26. Derandomization • For constants d, ε, α, and for every sufficiently large n, we can explicitly construct a graph • on n vertices, • with max degree d, • ε-far from 3-colorable, • such that every subset of αn vertices induces a 3-colorable subgraph.

  27. Two-Sided Error Algorithms • Need to define two distributions of graphs G_col and G_far such that • Graphs in G_col are (almost) always 3-colorable • Graphs in G_far are (almost) always far from 3-colorable • To an algorithm of bounded query complexity, G_col and G_far look (almost) the same

  28. Main Step • Define two distributions D_sat and D_far of instances of E3LIN-2 (systems over GF(2) with 3 variables per equation) • Systems in D_sat are always satisfiable • Systems in D_far are (almost) always (1/2 − δ)-far from satisfiable • To an algorithm of bounded query complexity, D_sat and D_far look the same • We get G_col and G_far using a reduction from approximate E3LIN-2 to approximate 3-coloring

  29. E3LIN-2 • X1 + X3 + X10 = 0 mod 2 • X2 + X3 + X4 = 1 mod 2 • X1 + X2 + X9 = 0 mod 2 • . . .

  30. Main Building Block • We show that for every c there is an α such that there exists a left-hand side with • n variables, cn equations, 3 variables per equation, every variable occurring in 3c equations • every αn equations are linearly independent • Pick the left-hand side at random • repeat 3c times: pick at random a set of n/3 disjoint triples of variables • Explicit construction? • Need strong unique-neighbor expanders

  31. Distributions • The left-hand side is always as before • In D_sat, we pick a random assignment to the variables and set the right-hand side consistently • always satisfiable • In D_far, we pick the right-hand side uniformly at random • With high probability, (1/2 − O(1/√c))-far
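
The random left-hand side and the two right-hand-side distributions can be sketched as below. The function names are hypothetical, variables are 0-indexed, and n is assumed to be divisible by 3.

```python
import random


def random_left_hand_side(n, c, rng):
    """Repeat 3c times: split the n variables into n/3 disjoint triples."""
    assert n % 3 == 0
    equations = []
    for _ in range(3 * c):
        perm = list(range(n))
        rng.shuffle(perm)
        equations.extend(tuple(perm[i:i + 3]) for i in range(0, n, 3))
    return equations  # c*n triples; each variable occurs in exactly 3c of them


def sample_d_sat(lhs, n, rng):
    """Right-hand side consistent with a hidden random assignment x."""
    x = [rng.randrange(2) for _ in range(n)]
    rhs = [x[i] ^ x[j] ^ x[k] for i, j, k in lhs]
    return rhs, x


def sample_d_far(lhs, rng):
    """Right-hand side chosen uniformly at random, one bit per equation."""
    return [rng.randrange(2) for _ in lhs]
```

Each of the 3c rounds contributes n/3 equations, giving cn in total, and each variable appears once per round, which matches the counts on the building-block slide.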

  32. Indistinguishability • The two distributions differ only in the right-hand side • In D_far it is uniformly distributed • In D_sat it is αn-wise independent • Linear independence implies statistical independence • They look the same to an algorithm that sees fewer than αn equations
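
The step "linear independence implies statistical independence" can be checked by brute force on a toy system. The sketch below is illustrative only: the helper names and the small example are not from the talk. It computes the GF(2) rank of a set of equations and tabulates the right-hand sides induced by all assignments.

```python
from collections import Counter
from itertools import product


def mask(variables):
    """Encode a set of variable indices as an integer bitmask."""
    m = 0
    for v in variables:
        m |= 1 << v
    return m


def gf2_rank(rows):
    """Rank over GF(2) of rows given as bitmasks (XOR Gaussian elimination)."""
    basis = {}  # highest set bit -> reduced row with that leading bit
    for row in rows:
        while row:
            h = row.bit_length() - 1
            if h not in basis:
                basis[h] = row
                break
            row ^= basis[h]
    return len(basis)


def rhs_distribution(equations, n):
    """Count each right-hand side over all 2^n assignments to the variables."""
    counts = Counter()
    for x in product((0, 1), repeat=n):
        rhs = tuple(sum(x[v] for v in eq) % 2 for eq in equations)
        counts[rhs] += 1
    return counts
```

For example, the three equations on variables (0,1,2), (1,2,3), (0,3,4) have rank 3 over GF(2), so a uniformly random assignment to 5 variables induces each of the 8 possible right-hand sides equally often. That is exactly the distribution D_far would produce on those equations.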

  33. Conclusion of the Argument • No algorithm of “query complexity” o(n) can distinguish satisfiable instances of E3LIN-2 from instances that are (1/2 − δ)-far from satisfiable • For some ε, no algorithm of query complexity o(n) can distinguish 3-colorable graphs from graphs that are ε-far from 3-colorable • No algorithm of query complexity o(n) can approximate Max 3SAT better than 7/8 . . .

  34. Generality/Lessons • Reductions are useful and extend results to several problems • In the adjacency matrix (dense graph) setting, there are many general algorithms but only a few, ad hoc lower bounds • In the adjacency list (sparse graph) setting, vice versa.

  35. Open Questions • Show that distinguishing 3-colorable graphs from (1/3 − δ)-far graphs requires query complexity Ω(n) • we can only prove it for one-sided error • Show that approximating Max SAT better than 3/4 and Max CUT better than 1/2 requires query complexity Ω(n) • we only know Ω(√n) [implicit in GR] • would “explain” why we need SDP
