Lower Bounds for Property Testing

Luca Trevisan, UC Berkeley

Sub-linear Time Algorithms
• Want to design algorithms that run in less than linear time
  • they cannot read the entire input
  • they must be probabilistic and approximate
• For optimization problems:
  • compute a numerical approximation of the optimum cost (and an implicit representation of an approximate solution?)
• For decision problems:
  • what is approximation?

Graph Property Testing [GGR]
Testing a property P with accuracy ε:
• Given a graph G that has property P
  • accept with probability > 3/4
• Given a graph G that is ε-far from property P
  • accept with probability < 1/4
ε-far = must change an ε-fraction of the representation of G to obtain property P
Intuition: the input (not the output) is approximate

Different Representations
• G is represented as an adjacency matrix
  • ε-far = must add/remove εn² edges
• G has max degree d and is represented using adjacency lists
  • ε-far = must add/remove εdn edges
(Some extra subtleties in the bounded-degree case)
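
The two normalizations can be made concrete with a small helper. This is a minimal sketch under our own assumptions: edges are given as vertex pairs, and the names `distance_fraction` and `_as_edge_set` are hypothetical. It measures the distance between two specific graphs; the distance of G from a property P would be the minimum of this quantity over all graphs that have P, and G is ε-far when that minimum is at least ε.

```python
def _as_edge_set(edges):
    """Normalize undirected edges so (u, v) and (v, u) compare equal."""
    return {frozenset(e) for e in edges}


def distance_fraction(edges_a, edges_b, n, model, d=None):
    """Fraction of the representation that must change to turn graph A into graph B.

    model="dense":   number of added/removed edges divided by n**2 (adjacency matrix).
    model="bounded": number of added/removed edges divided by d*n (adjacency lists).
    """
    changed = len(_as_edge_set(edges_a) ^ _as_edge_set(edges_b))  # symmetric difference
    if model == "dense":
        return changed / (n * n)
    if model == "bounded":
        if d is None:
            raise ValueError("the bounded-degree model needs the degree bound d")
        return changed / (d * n)
    raise ValueError(f"unknown model: {model!r}")
```

For example, turning a triangle into a path on 3 vertices removes one edge: that is a 1/9 fraction in the dense model, but a 1/6 fraction in the bounded-degree model with d = 2.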

Purpose of This Talk
• Discuss algorithms and lower bounds for
  • sub-linear time property testing for some basic graph properties
  • sub-linear time approximation algorithms for some basic optimization problems
(We'll mostly discuss lower bounds)

Motivations
• Large data sets
  • web, Walmart, Amazon, phone calls, . . .
  • even linear time can be infeasible
Fine print: most research on property testing focuses on problems that have no connection to applications with large data sets
• Goal for theory research
  • Develop general algorithmic techniques (like dynamic programming, local search, … for P)
  • Develop general techniques for impossibility results (like NP-completeness)

Property Testing and Approximation in Adjacency Matrix Representation

Bipartiteness Algorithm [GGR, AK]
Testing bipartiteness of a given graph G:
• Pick (1/ε)·polylog(1/ε) vertices and check whether they induce a bipartite graph; if so accept, otherwise reject
• If G is bipartite, then the algorithm accepts with probability 1
• If G is ε-far from bipartite, then whp the algorithm discovers an odd cycle (non-trivial to prove)
• Running time: O((1/ε²)·polylog(1/ε))
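
The tester above can be sketched as follows, assuming the graph is given as a 0/1 adjacency matrix `adj`. The constant `c` and the squared-log factor are placeholders standing in for "polylog(1/ε)"; they are not the constants from [GGR, AK], and the function names are hypothetical. Querying all pairs among k sampled vertices is what gives the (1/ε²)·polylog(1/ε) running time.

```python
import math
import random
from collections import deque


def induced_is_bipartite(adj, vertices):
    """BFS 2-coloring of the subgraph of `adj` induced by `vertices`."""
    vs = list(vertices)
    color = {}
    for s in vs:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in vs:  # one adjacency-matrix query per pair of sampled vertices
                if adj[u][v]:
                    if v not in color:
                        color[v] = 1 - color[u]
                        queue.append(v)
                    elif color[v] == color[u]:
                        return False  # odd cycle inside the sample
    return True


def dense_bipartite_tester(adj, eps, rng=random, c=4.0):
    """Sample about (c/eps) * log(1/eps)^2 vertices; accept iff they induce a bipartite graph."""
    n = len(adj)
    k = math.ceil((c / eps) * max(1.0, math.log(1 / eps)) ** 2)
    sample = rng.sample(range(n), min(k, n))
    return induced_is_bipartite(adj, sample)
```

The tester has one-sided error by construction: an induced subgraph of a bipartite graph is bipartite, so bipartite inputs are always accepted.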

Lower Bounds [BT]
• Ω(1/ε^1.5) for adaptive algorithms
• Ω(1/ε²) for non-adaptive algorithms
• The bounds apply to the 'query complexity' of the algorithm (and hence, a fortiori, to its running time)

Proof for One-Sided Error Case
• Pick a random graph with edge probability 3ε
  • whp it is ε-far from bipartite
• Consider the view of a (possibly adaptive) algorithm that makes q 'queries' and finds an odd cycle w.h.p.
  • it sees Θ(εq) edges and O(ε²q²) pairs of connected vertices
  • a cycle can be discovered only by querying two vertices in the same connected component
  • it takes Ω(1/ε) such attempts
  • so q = Ω(1/ε^1.5)
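
The final step is only arithmetic. Each probe of a pair inside an already-discovered component is an edge with probability 3ε, so about 1/(3ε) such probes are needed, and they can only be spent on the O(ε²q²) connected pairs (hidden constants suppressed):

```latex
\underbrace{O(\varepsilon^{2} q^{2})}_{\text{connected pairs available}}
\;\ge\;
\underbrace{\Omega(1/\varepsilon)}_{\text{probes needed, each an edge w.p. } 3\varepsilon}
\quad\Longrightarrow\quad
q^{2} = \Omega(\varepsilon^{-3})
\quad\Longrightarrow\quad
q = \Omega(\varepsilon^{-3/2}).
```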

One-Sided Error, Non-Adaptive
• Pick a random graph with edge probability 3ε
• Consider the view of a non-adaptive algorithm that makes q 'queries'
• Same as:
  • start with the graph whose q edges are the queried pairs
  • independently delete each edge with probability 1 − 3ε
• If q = o(1/ε²), then the view is a forest w.p. 1 − o(1)
• Proof: there are at most O(q^(t/2)) cycles of length t, and each survives with probability (3ε)^t, so the expected number of surviving cycles is o(1)
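
The "delete each queried edge" view can be simulated directly. In this sketch, a non-adaptive algorithm's fixed queries are replaced by q uniformly random pairs, which is a hypothetical stand-in, not the worst case the lower bound covers. Union-find checks whether the surviving edges form a forest. All names are our own.

```python
import random


def is_forest(n, edges):
    """Union-find: a graph is a forest iff no edge joins two already-connected vertices."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge closes a cycle
        parent[ru] = rv
    return True


def nonadaptive_view(queries, eps, rng=random):
    """Keep each queried pair as an edge independently with probability 3*eps
    (equivalently, delete it with probability 1 - 3*eps)."""
    return [pair for pair in queries if rng.random() < 3 * eps]


def forest_rate(n, q, eps, trials, rng=random):
    """Empirical probability that the view of q fixed random queries is a forest."""
    assert q <= n * (n - 1) // 2, "cannot pick more distinct pairs than exist"
    queries = set()
    while len(queries) < q:
        u, v = rng.sample(range(n), 2)
        queries.add((min(u, v), max(u, v)))
    queries = list(queries)
    hits = sum(is_forest(n, nonadaptive_view(queries, eps, rng)) for _ in range(trials))
    return hits / trials
```

Running `forest_rate` with q well below 1/ε² versus well above it gives a feel for the threshold; the simulation is only an illustration, not part of the proof.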

Two-Sided Error
• Two distributions:
  • Gfar: random graph with edge probability 3ε
  • Gbip: first a random partition, then each pair crossing the partition is an edge with probability 6ε
• The distributions are indistinguishable by
  • non-adaptive algorithms of query complexity o(1/ε²)
  • adaptive algorithms of query complexity o(1/ε^1.5)
Both bounds are tight for these distributions
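
A minimal sketch of the two samplers (function names are hypothetical). Since each vertex's side is a fair coin, two distinct vertices cross the partition with probability exactly 1/2, so a single queried pair is an edge with probability 3ε under both distributions; only odd cycles tell them apart.

```python
import itertools
import random


def sample_gfar(n, eps, rng=random):
    """G_far: every pair is an edge independently with probability 3*eps."""
    return [(u, v) for u, v in itertools.combinations(range(n), 2)
            if rng.random() < 3 * eps]


def sample_gbip(n, eps, rng=random):
    """G_bip: random bipartition, then each crossing pair is an edge with probability 6*eps.
    Returns the edge list together with the side (0 or 1) of each vertex."""
    side = [rng.randrange(2) for _ in range(n)]
    edges = [(u, v) for u, v in itertools.combinations(range(n), 2)
             if side[u] != side[v] and rng.random() < 6 * eps]
    return edges, side
```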

Generality/Lessons
• Possible lesson: try a random graph as the distribution of 'hard' instances that are far from having the property
• Not good for the "triangle-freeness" property, whose complexity is possibly the most interesting open question in the adjacency matrix model

Triangle-Free Graphs
• Want to distinguish triangle-free graphs from graphs where one needs to remove εn² edges to break all triangles
• Solvable in time super-exponential in 1/ε
• Polynomial in 1/ε is impossible [Alon]
• Is 2^poly(1/ε) possible?
• Simplest special case of a more general (and important) question
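
The natural one-sided tester samples random vertex triples and rejects if one of them is a triangle. How many triples suffice for ε-far graphs is exactly the question above (the known analysis goes through the triangle removal lemma), so in this sketch the number of triples is left as a parameter. The function name is hypothetical.

```python
import random


def triangle_tester(adj, num_triples, rng=random):
    """One-sided tester on a 0/1 adjacency matrix: reject iff a sampled triple is a triangle.

    Triangle-free graphs are always accepted; num_triples must depend on eps for
    eps-far graphs to be rejected, and that dependence is left open here.
    """
    n = len(adj)
    for _ in range(num_triples):
        a, b, c = rng.sample(range(n), 3)
        if adj[a][b] and adj[b][c] and adj[a][c]:
            return False  # found a triangle: certainly not triangle-free
    return True
```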

Sublinear Time Approximation
• Max Cut and other graph problems can be approximated within (1+ε) in graphs with at least αn² edges in time 2^poly(1/(εα)) [GGR]
• Max 3SAT can be approximated within (1+ε) in instances with at least αn³ clauses in time 2^poly(1/(εα)), and there are similar results for other satisfiability problems [AFKK]
• Lower bounds?

Property Testing and Approximation in Adjacency List Representation

Bipartiteness [GR]
Testing bipartiteness:
• Repeat polylog n times:
  • start at a random vertex and take √n random walks of length polylog n; if two of them combine to form an odd cycle reject, otherwise accept
• Analysis:
  • in a graph where you need to remove a constant fraction of the edges to make it bipartite, the algorithm finds an odd cycle whp
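
A simplified sketch of the random-walk idea, assuming adjacency lists `adj[u]`. If some vertex is reached after an even number of steps on one walk and after an odd number on another (or the same) walk, the two walk prefixes form an odd closed walk, which contains an odd cycle. The plain (non-lazy) walks and the choice of log2(n)² for every "polylog n" are our own assumptions, not the exact algorithm of [GR].

```python
import math
import random


def walks_find_odd_cycle(adj, start, num_walks, walk_len, rng=random):
    """Random walks from `start`; report an odd cycle if some vertex is reached
    at both an even and an odd number of steps."""
    seen = {start: {0}}  # vertex -> parities at which it has been reached
    for _ in range(num_walks):
        u = start
        for step in range(1, walk_len + 1):
            if not adj[u]:
                break  # isolated vertex: the walk cannot move
            u = rng.choice(adj[u])
            parities = seen.setdefault(u, set())
            parities.add(step % 2)
            if len(parities) == 2:
                return True
    return False


def gr_style_tester(adj, rng=random):
    """Accept (True) unless some start vertex's walks expose an odd cycle."""
    n = len(adj)
    polylog = max(1, math.ceil(math.log2(max(n, 2)) ** 2))  # stand-in for polylog n
    walks = math.ceil(math.sqrt(n))
    for _ in range(polylog):
        s = rng.randrange(n)
        if walks_find_odd_cycle(adj, s, walks, polylog, rng):
            return False  # reject: odd cycle found
    return True
```

In a bipartite graph every walk from s to v has the same parity, so this sketch, like the slide's algorithm, never rejects a bipartite graph.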

Matching Lower Bound [GR]
• Define two distributions of graphs:
  • Gfar: a random Hamiltonian circuit plus a random matching (whp 1/100-far from bipartite)
  • Gbip: a random Hamiltonian circuit plus a random matching, conditioned on making the graph bipartite
• Gfar and Gbip are indistinguishable to algorithms of query complexity o(√n)
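
A sketch of both samplers, under the assumptions that n is even and "random matching" means a uniformly random perfect matching. An even Hamiltonian circuit has a unique 2-coloring (alternating positions), so conditioning on bipartiteness forces every matching edge to join an even-position vertex to an odd-position one; pairing the even positions with a shuffled list of odd positions samples that conditional distribution. The function name is hypothetical.

```python
import random


def sample_gr_graph(n, bipartite, rng=random):
    """Random Hamiltonian circuit plus a random perfect matching, as adjacency lists.

    bipartite=False: uniform perfect matching (G_far).
    bipartite=True:  uniform matching between even- and odd-position vertices (G_bip).
    A matching edge may duplicate a circuit edge; every vertex has degree 3.
    """
    assert n % 2 == 0 and n >= 4
    order = list(range(n))
    rng.shuffle(order)  # order[i] is the i-th vertex along the circuit
    adj = [[] for _ in range(n)]

    def add(u, v):
        adj[u].append(v)
        adj[v].append(u)

    for i in range(n):
        add(order[i], order[(i + 1) % n])
    if bipartite:
        evens, odds = order[0::2], order[1::2]
        rng.shuffle(odds)  # uniform bijection between the two color classes
        pairs = zip(evens, odds)
    else:
        rest = order[:]
        rng.shuffle(rest)  # consecutive pairs of a uniform shuffle: uniform perfect matching
        pairs = zip(rest[0::2], rest[1::2])
    for u, v in pairs:
        add(u, v)
    return adj
```

Both distributions are 3-regular with a Hamiltonian circuit, so locally they look the same; the difference lies in cycles of length roughly √n, which is why o(√n) queries are not enough.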

Approximation Algorithms
• Minimum spanning tree
  • given a connected weighted graph of degree d with weights in the range {1, …, w}, we can approximate the MST weight within (1+ε) in time about O(dw/ε²) [Chazelle, Rubinfeld, T]
• Max SAT
  • given a CNF where every variable occurs at most d times, we can approximate the Max SAT optimum within .618, presumably also 2/3, in O(d) time [hopefully we will get 3/4 − δ]
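
One counting identity used in this line of work expresses the MST weight through numbers of connected components: for a connected graph with integer weights in {1, …, w}, MST weight = n − w + Σ_{i=1}^{w−1} c_i, where c_i is the number of components of the subgraph keeping only edges of weight ≤ i. (The MST has c_i − 1 edges of weight > i, and summing over i counts each edge once per unit of weight.) A sublinear algorithm would estimate each c_i by sampling; the sketch below only computes the identity exactly and compares it with Kruskal, so it is an illustration, not a sublinear-time algorithm. All names are ours.

```python
class DisjointSets:
    """Union-find with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def num_components(n, edges):
    dsu = DisjointSets(n)
    return n - sum(dsu.union(u, v) for u, v in edges)


def mst_weight_kruskal(n, weighted_edges):
    """Exact MST weight via Kruskal, used as a reference value."""
    dsu = DisjointSets(n)
    return sum(wt for u, v, wt in sorted(weighted_edges, key=lambda e: e[2])
               if dsu.union(u, v))


def mst_weight_from_components(n, weighted_edges, w):
    """n - w + sum_{i=1}^{w-1} c_i (connected graph, integer weights in 1..w assumed)."""
    total = n - w
    for i in range(1, w):
        total += num_components(n, [(u, v) for u, v, wt in weighted_edges if wt <= i])
    return total
```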

Testing 3-Colorability
• NP-hard in the adjacency list representation
  • only for small enough ε
• Can find a 3-coloring good for 80% of the edges in a 3-colorable graph using SDP
• NP-hard to find a 3-coloring good for a 98% (?) fraction of the edges
• Gives a non-tight and conditional lower bound for query complexity

Other Problems
• The query complexity of the following problems is 'equivalent' to the query complexity of testing 3-colorability:
  • testing satisfiability of 3SAT instances (every variable occurs in O(1) clauses, "adjacency list" representation)
  • approximating Max Cut, Vertex Cover, Independent Set, . . . in bounded-degree graphs
  • approximating Max SAT, Max 2SAT, . . .
• Lower bound of √n for all these problems
  • by reduction from bipartiteness

Tight Lower Bound [BOT]
• For one-sided error algorithms:
  • Ω(n) query complexity to distinguish 3-colorable graphs from graphs that are (1/3 − δ)-far
  • the lower bound holds even though this distinguishing problem is solvable in polynomial time
• For two-sided error algorithms:
  • for some ε, Ω(n) query complexity to distinguish 3-colorable graphs from graphs that are ε-far

Using Reductions . . .
• Unconditionally, algorithms running in time o(n) cannot:
  • approximate Max 3SAT better than 7/8
  • approximate Max Cut in bounded-degree graphs better than 16/17
  • . . .
• Håstad '97 proved that approximating the above problems better than these ratios is NP-hard

The 3-Coloring Lower Bound
• Consider first one-sided error algorithms
• It's enough to find a graph G that is (1/3 − δ)-far from 3-colorable, but every subgraph of size < αn is 3-colorable
  • (for every δ there is an α such that . . .)
• Then an algorithm of query complexity < αn either accepts G (which is wrong) or rejects some 3-colorable graph (which means the algorithm does not have one-sided error)

The Graph
• Pick a graph of degree O(1/δ²) at random (pick that many random matchings)
• Then it is (1/3 − δ)-far whp
• But, for some α, whp, every subgraph induced by k < αn vertices contains < 1.5k edges
• In a minimal non-3-colorable graph, every vertex has degree at least 3, so such a graph on k vertices has ≥ 1.5k edges
• Hence every subgraph induced by < αn vertices is 3-colorable [Erdős]
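
Why the minimum-degree fact holds: if a minimal non-3-colorable graph had a vertex v of degree ≤ 2, we could 3-color G − v by minimality and give v a color unused by its neighbors. The brute-force sketch below (hypothetical helper names, tiny graphs only) checks the fact and the resulting ≥ 1.5k edge count on K4 and on the odd wheel (a 5-cycle plus a hub); it illustrates the argument and is not part of the proof.

```python
import itertools


def is_3_colorable(vertices, edges):
    """Brute force over all 3^k colorings of `vertices`."""
    vs = list(vertices)
    inside = [(u, v) for u, v in edges if u in vs and v in vs]
    for colors in itertools.product(range(3), repeat=len(vs)):
        c = dict(zip(vs, colors))
        if all(c[u] != c[v] for u, v in inside):
            return True
    return False


def is_vertex_critical(vertices, edges):
    """Not 3-colorable, but deleting any single vertex makes it 3-colorable."""
    vs = list(vertices)
    if is_3_colorable(vs, edges):
        return False
    return all(is_3_colorable([x for x in vs if x != v], edges) for v in vs)


def degrees(vertices, edges):
    vs = set(vertices)
    deg = {v: 0 for v in vs}
    for u, v in edges:
        if u in vs and v in vs:
            deg[u] += 1
            deg[v] += 1
    return deg
```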

Derandomization
• For constants d, ε, α, and for every sufficiently large n, we can explicitly construct a graph
  • on n vertices,
  • of max degree d,
  • ε-far from 3-colorable,
  • such that every subset of αn vertices induces a 3-colorable subgraph.

Two-Sided Error Algorithms
• Need to define two distributions of graphs, Gcol and Gfar, such that
  • graphs in Gcol are (almost) always 3-colorable
  • graphs in Gfar are (almost) always far from 3-colorable
  • to an algorithm of bounded query complexity, Gcol and Gfar look (almost) the same

Main Step
• Define two distributions, Dsat and Dfar, of instances of E3LIN-2 (systems of linear equations over GF(2) with 3 variables per equation)
  • systems in Dsat are always satisfiable
  • systems in Dfar are (almost) always (1/2 − δ)-far from satisfiable
  • to an algorithm of bounded query complexity, Dsat and Dfar look the same
• We get Gcol and Gfar using a reduction from approximate E3LIN-2 to approximate 3-coloring

E3LIN-2
X1 + X3 + X10 = 0 mod 2
X2 + X3 + X4 = 1 mod 2
X1 + X2 + X9 = 0 mod 2
. . .
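
A minimal representation of such a system, using our own (hypothetical) encoding: each equation is a triple of variable indices plus a right-hand-side bit. Since a uniformly random assignment satisfies each equation with probability 1/2, a system can be at most about 1/2-far from satisfiable, which is why (1/2 − δ)-far is the extreme case.

```python
def fraction_satisfied(equations, assignment):
    """equations: list of ((i, j, k), b) meaning X_i + X_j + X_k = b (mod 2).
    assignment: dict mapping each variable index to 0 or 1."""
    ok = sum((assignment[i] + assignment[j] + assignment[k]) % 2 == b
             for (i, j, k), b in equations)
    return ok / len(equations)


# The three equations shown above
example = [((1, 3, 10), 0), ((2, 3, 4), 1), ((1, 2, 9), 0)]
```

The all-zero assignment satisfies the first and third equations (2/3 of the system), while setting only X4 = 1 satisfies all three.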

Main Building Block
• We show that for every c there is an α such that there exists a left-hand side with
  • n variables, cn equations, 3 variables per equation, every variable occurring in 3c equations
  • every αn equations linearly independent
• Pick the left-hand side at random:
  • repeat 3c times: pick at random a set of n/3 disjoint triples of variables
• Explicit construction?
  • needs strong unique-neighbor expanders
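
A sketch of the random construction, assuming n is divisible by 3: 3c rounds of n/3 disjoint triples give exactly cn equations, with every variable in exactly 3c of them. Independence of *every* αn equations cannot be checked efficiently; the helper below only spot-checks one random subset with Gaussian elimination over GF(2) on bitmasks. All names are hypothetical.

```python
import random


def random_lhs(n, c, rng=random):
    """3c rounds; each round partitions the n variables into n/3 random disjoint triples."""
    assert n % 3 == 0
    rows = []
    for _ in range(3 * c):
        perm = list(range(n))
        rng.shuffle(perm)
        rows.extend(tuple(perm[t:t + 3]) for t in range(0, n, 3))
    return rows


def gf2_rank(rows, n):
    """Rank over GF(2) of rows given as variable-index triples (bitmask elimination)."""
    basis = {}  # leading bit -> reduced row with that leading bit
    for triple in rows:
        mask = 0
        for i in triple:
            mask ^= 1 << i
        while mask:
            lead = mask.bit_length() - 1
            if lead not in basis:
                basis[lead] = mask
                break
            mask ^= basis[lead]  # clear the leading bit and keep reducing
    return len(basis)


def sample_is_independent(rows, n, k, rng=random):
    """Spot check: is one random k-subset of the equations linearly independent?"""
    subset = rng.sample(rows, k)
    return gf2_rank(subset, n) == k
```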

Distributions
• The left-hand side is always as before
• In Dsat, we pick a random assignment to the variables and set the right-hand side consistently
  • always satisfiable
• In Dfar, we pick the right-hand side uniformly at random
  • with high probability, (1/2 − O(1/√c))-far
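
The two right-hand-side samplers, sketched for a left-hand side given as any list of variable triples (function names are hypothetical). In Dsat the planted assignment certifies satisfiability; in Dfar each bit is an independent fair coin.

```python
import random


def rhs_sat(lhs, n, rng=random):
    """D_sat: plant a uniformly random x and set b_e = x_i + x_j + x_k mod 2."""
    x = [rng.randrange(2) for _ in range(n)]
    return [(x[i] + x[j] + x[k]) % 2 for i, j, k in lhs], x


def rhs_far(lhs, rng=random):
    """D_far: every right-hand-side bit is an independent fair coin."""
    return [rng.randrange(2) for _ in lhs]


def satisfied_by(lhs, rhs, x):
    """Number of equations X_i + X_j + X_k = b satisfied by assignment x."""
    return sum((x[i] + x[j] + x[k]) % 2 == b for (i, j, k), b in zip(lhs, rhs))
```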

Indistinguishability
• The two distributions differ only in the right-hand side
  • in Dfar it is uniformly distributed
  • in Dsat it is αn-wise independent, since every αn rows of the left-hand side are linearly independent
    • linear independence implies statistical independence
• They look the same to an algorithm that sees fewer than αn equations
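
The step "linear independence implies statistical independence" can be checked exhaustively on small systems. If k rows over GF(2) are linearly independent, the map x ↦ (row_1·x, …, row_k·x) is onto {0,1}^k and every fiber has size 2^(n−k), so for uniform x the right-hand side is uniform on those k equations. The sketch below (hypothetical name) enumerates all 2^n assignments.

```python
import itertools


def rhs_distribution(rows, n):
    """Exact counts of (b_1, ..., b_k), b_r = sum of x over the variables in row r (mod 2),
    over all 2^n assignments x."""
    counts = {}
    for x in itertools.product((0, 1), repeat=n):
        b = tuple(sum(x[i] for i in row) % 2 for row in rows)
        counts[b] = counts.get(b, 0) + 1
    return counts
```

For three independent rows on five variables, all 8 right-hand sides occur exactly 4 times each; for a linearly dependent set of rows, some right-hand sides never occur, which is exactly what an algorithm seeing those equations could detect.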

Conclusion of the Argument
• No algorithm of "query complexity" o(n) can distinguish satisfiable instances of E3LIN-2 from instances that are (1/2 − δ)-far from satisfiable
• For some ε, no algorithm of query complexity o(n) can distinguish 3-colorable graphs from graphs that are ε-far from 3-colorable
• No algorithm of query complexity o(n) can approximate Max 3SAT better than 7/8 . . .

Generality/Lessons
• Reductions are useful and extend results to several problems
• In the adjacency matrix (dense graph) setting: many general algorithms, few and ad hoc lower bounds
• In the adjacency list (sparse graph) setting: vice versa

Open Questions
• Show that distinguishing 3-colorable graphs from (1/3 − δ)-far graphs requires query complexity Ω(n)
  • we can only prove it for one-sided error
• Show that approximating Max SAT better than 3/4 and Max Cut better than 1/2 requires query complexity Ω(n)
  • we only know Ω(√n) [implicit in GR]
  • this would "explain" why we need SDP
