Some Techniques in Property Testing
Dana Ron, Tel Aviv University
Property Testing (Informal Definition)
For a fixed property P and any object O, determine whether O has property P, or whether O is far from having property P (i.e., far from any other object having P). The task should be performed by inspecting the object in as few places as possible.
Examples
• The object can be a function and the property can be linearity.
• The object can be a string and the property can be membership in a fixed regular language L.
• The object can be a graph and the property can be 3-colorability.
Context
Property testing can be viewed as:
• A relaxation of exactly deciding whether the object has the property or does not have the property.
• A relaxation of learning the object (with membership queries and under the uniform distribution).
In either case, we want the testing algorithm to be significantly more efficient than the decision/learning algorithm.
When Can Property Testing Be Useful?
• The object is too huge and even scanning it is infeasible, so we must make an approximate decision.
• The object is just large, but exact decision is NP-hard.
• We have a poly-time exact algorithm, but an approximate answer suffices, so we prefer a sub-linear approximate algorithm.
• Use testing as a preliminary step to exact decision. Namely, use testing to very quickly rule out objects that are far from having the property.
Property Testing - Background
• Initially defined by Rubinfeld and Sudan in the context of Program Testing (of algebraic functions).
• Goldreich, Goldwasser, and Ron initiated the study of testing properties of combinatorial objects, and in particular graphs.
• A growing body of work deals with properties of functions, graphs, strings, sets of points, ... Many algorithms have complexity that is sub-linear in (or even independent of) the size of the object.
Issues and Categories
• Types of objects and properties: functions (algebraic and non-algebraic properties); graphs; strings; matrices; geometric objects; sets of points.
• Algorithmic techniques. Mostly: global sampling + local exploration.
• Analysis techniques: self-correcting; enforce & test; regularity lemma; testing by implicit learning; testing based on invariance.
Linearity Testing [Blum, Luby, Rubinfeld]
Def1: A function $f : F^n \to F$ is called linear (multi-linear) if there exist coefficients $a_1,\dots,a_n \in F$ s.t. $f(x_1,\dots,x_n) = \sum_{i=1}^{n} a_i x_i$.
Def2: A function $f$ is said to be $\epsilon$-far from linear if for every linear function $g$, $\mathrm{dist}(f,g) > \epsilon$, where $\mathrm{dist}(f,g) = \Pr[f(x) \neq g(x)]$ ($x$ selected uniformly in $F^n$).
Def3: Linearity Testing Problem: the algorithm can query the function on any $x$ in $F^n$ to obtain $f(x)$.
- If $f$ is linear then the algorithm should accept;
- if $f$ is $\epsilon$-far from linear then the algorithm should reject w.h.p.
Fact: A function $f : F^n \to F$ is linear iff for every $x,y \in F^n$ it holds that $f(x)+f(y)=f(x+y)$.
Linearity Testing Cont’
Linearity Testing algorithm:
1) Uniformly and independently select $\Theta(1/\epsilon)$ pairs of elements $x,y \in F^n$.
2) For every pair $x,y$ selected, verify that $f(x)+f(y) = f(x+y)$.
3) If linearity is violated for any of the selected pairs (i.e., $f(x)+f(y) \neq f(x+y)$), then REJECT; otherwise ACCEPT.
Observe: If $f$ is linear then the test accepts w.p. 1.
Lemma: If $f$ is $\epsilon$-far from linear then with probability at least 2/3 the test rejects it.
Lemma (equivalent form): If $f$ is accepted with probability greater than 1/3, then $f$ is $\epsilon$-close to linear.
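The three steps of the test can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation, under several assumptions that are not on the slide: the field is GF(2), points of $F^n$ are encoded as n-bit integers (so addition is XOR), the constant `c` hidden in $\Theta(1/\epsilon)$ is set to 3, and the names `blr_linearity_test` and `parity_of` are hypothetical.

```python
import math
import random
from typing import Callable, Optional


def blr_linearity_test(f: Callable[[int], int], n: int, eps: float,
                       c: float = 3.0,
                       rng: Optional[random.Random] = None) -> bool:
    """Return True (ACCEPT) or False (REJECT) for f : GF(2)^n -> GF(2).

    Points are n-bit integers, so x + y in GF(2)^n is x ^ y.
    The constant c is an assumed choice for the Theta(1/eps) sample size.
    """
    rng = rng or random.Random()
    trials = math.ceil(c / eps)
    for _ in range(trials):
        x = rng.getrandbits(n)
        y = rng.getrandbits(n)
        # Step 2: check f(x) + f(y) = f(x + y); over GF(2) "+" is XOR.
        if f(x) ^ f(y) != f(x ^ y):
            return False  # Step 3: a violating pair was found.
    return True


def parity_of(mask: int) -> Callable[[int], int]:
    """Linear function sum_i a_i x_i over GF(2), with coefficients given by mask."""
    return lambda x: bin(x & mask).count("1") % 2
```

A linear function such as `parity_of(0b1011)` is accepted with probability 1, while a function such as the AND of two input bits violates the identity on a constant fraction of pairs and is rejected with high probability.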
Linearity Testing Cont’
Proof idea for the lemma (if $f$ is accepted with probability greater than 1/3, then $f$ is $\epsilon$-close to linear):
• Suppose $f$ is accepted w.p. > 1/3. Then there is only a small ($< \epsilon/2$) fraction of violating pairs ($f(x)+f(y) \neq f(x+y)$).
• Define a self-corrected version of $f$, denoted $g$: for each $x,y$ let $V_y(x) = f(x+y)-f(y)$ (the vote of $y$ on $x$), and let $g(x) = \mathrm{Plurality}_y(V_y(x))$.
• Can show that (conditioned on a $< \epsilon/2$ fraction of violating pairs): (1) $g$ is linear; (2) $\mathrm{dist}(f,g) \le \epsilon$.
Main Technical Lemma (informal): if there are few violating pairs then for every $x$ we have that for almost all $y$, $V_y(x)=g(x)$.
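The self-corrected function $g$ is defined through a plurality over all $y$, but the Main Technical Lemma suggests that a few random votes already recover $g(x)$ with high probability. The sketch below illustrates this over GF(2), with points encoded as n-bit integers; the function name `self_correct`, the number of votes, and the noisy example function are assumptions made for illustration.

```python
import random
from collections import Counter
from typing import Callable, Optional


def self_correct(f: Callable[[int], int], x: int, n: int,
                 votes: int = 25,
                 rng: Optional[random.Random] = None) -> int:
    """Estimate g(x) = Plurality_y(V_y(x)) from a few random votes.

    V_y(x) = f(x + y) - f(y); over GF(2) both + and - are XOR.
    The number of votes is an assumed parameter, not taken from the slides.
    """
    rng = rng or random.Random()
    tally = Counter()
    for _ in range(votes):
        y = rng.getrandbits(n)
        tally[f(x ^ y) ^ f(y)] += 1  # the vote of y on x
    return tally.most_common(1)[0][0]


def clean(x: int) -> int:
    """A linear function over GF(2)^10: parity of the bits selected by a mask."""
    return bin(x & 0b1100100101).count("1") % 2


def noisy(x: int) -> int:
    """The same function corrupted on about 2% of the points."""
    return clean(x) ^ 1 if x % 50 == 0 else clean(x)
```

Here `self_correct(noisy, x, 10)` returns `clean(x)` with overwhelming probability for every `x`, including the corrupted points, because each vote is wrong only when exactly one of the two queried points is corrupted.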
Testing Polynomials (over finite fields)
Def: A function $f : F^n \to F$ is a (total) degree-$d$ polynomial if there exist coefficients $\{a_v\}$, where $v = v_1 \dots v_n$, $v_i \ge 0$, $\sum_i v_i \le d$, s.t. $f(x) = \sum_{v} a_v \prod_{i=1}^{n} x_i^{v_i}$.
• Different algorithms were designed to deal with different cases (e.g., $d=1$ [BLR]; $|F|>d$ [Rubinfeld, Sudan]; $F=GF(2)$, $d>1$ [Alon, Kaufman, Krivelevich, Litsyn, R]), and are analyzed using the self-correction approach.
• A unifying algorithm [Kaufman, R] works by restricting the function to low-dimensional affine subspaces and checking that the restriction is a low-degree polynomial (for prime fields, the dimension is $(d+1)/(|F|-1)$).
• Self-correction (the definition of the "good" function $g$) works by correcting the value on a point based on the "vote" of all subspaces it belongs to.
Notes on Self-Correcting Approach
Note1: The definition of the self-corrected function $g$ makes it possible to actually correct $f$: for every $x$ we can determine $g(x)$ w.h.p. by few queries to $f$.
Note2: Found useful when testing properties that correspond to subclasses of the above. For example, singleton functions ($f(x) = x_i$) are a subclass of linear functions. The test for singletons [Parnas, R, Samorodnitsky] first runs the linearity test. If it passes, it then runs an additional check on the self-corrected version of the function.
Note3: Found useful for distribution-free testing [Halevi, Kushilevitz]: a general transformation from testers under the uniform distribution to distribution-free testers when self-correction is possible.
Testing Bipartiteness
Def1: A graph $G=(V,E)$ is bipartite iff its vertices can be partitioned into two subsets $V_1$ and $V_2$ s.t. there are no edges between vertices that are both in $V_1$ or both in $V_2$.
Def2: A graph $G=(V,E)$ is $\epsilon$-far from bipartite if every partition $(V_1,V_2)$ has more than $\epsilon|E|$ violating edges.
Recall that we can decide whether a graph is bipartite in time $O(|V|+|E|)$ by Breadth First Search (BFS). However, we want a very fast approximate decision.
Here we consider the dense case: $|E| = \Omega(|V|^2)$. The graph is represented by its adjacency matrix, and the algorithm can probe the matrix.
Testing Bipartiteness in Dense Graphs [Goldreich, Goldwasser, R]
• Uniformly and independently select $\Theta(\log(1/\epsilon)/\epsilon^2)$ vertices in the graph.
• If the subgraph induced by the selected vertices is bipartite, then accept; otherwise, reject.
Query complexity and running time of the algorithm: $O(\log^2(1/\epsilon)/\epsilon^4)$. A slight variant yields $O(\log^2(1/\epsilon)/\epsilon^3)$, and [Alon, Krivelevich] reduced this to $O(\log^2(1/\epsilon)/\epsilon^2)$.
Correctness: If the graph is bipartite then it is always accepted. Need to prove that if it is $\epsilon$-far from bipartite then it is rejected w.h.p.
High-Level Idea of Analysis (when the graph is $\epsilon$-far from bipartite)
• View the sample as two parts: $U$ and $S$.
• Suppose every graph vertex has some neighbor in $U$. (In fact, w.h.p. over $U$, this holds for almost all vertices of sufficiently high degree.)
• Idea: each partition $(U_1,U_2)$ of $U$ "enforces" a partition of all vertices. Since $G$ is $\epsilon$-far from bipartite, the enforced partition must have many violations, and w.h.p. some of them will show up in the sample $S$ (the "test" sample).
• Since this holds for every partition $(U_1,U_2)$ of $U$, w.h.p. there is no bipartite partition of $U$ and $S$ together (the induced subgraph is not bipartite).
Notes on Enforce&Test
Note1: The bipartiteness testing algorithm and its enforce&test analysis can be generalized to testing k-colorability [GGR].
Note2: Other properties whose analysis falls under the enforce&test approach: r-Clique, r-Cut, and other graph partition properties [GGR]; hypergraph coloring [Czumaj, Sohler]; tree metric properties [Parnas, R]; clustering [Alon, Dar, Parnas, R]; and more.
Note3: For k-colorability, clustering, and other properties, the output of the tester can be used to actually construct approximately good colorings/clusterings. E.g., for bipartiteness, if the graph is bipartite we can determine a partition that is approximately good, in constant time per vertex (this has a certain similarity to self-correction).
Testing for Concise Representations [Diakonikolas, Lee, Matulef, Onak, Rubinfeld, Servedio, Wan]
Results (partial) for n-variable Boolean functions:
[Table: class of functions vs. number of queries; rows not recoverable.]
For all classes, the number of queries is poly($1/\epsilon$), with no dependence on $n$.
Testing for Concise Representations (cont)
• Observation: many classes of functions that have concise representations (e.g., s-term DNF) can be approximated by small juntas in the class.
• Example: every s-term DNF function $f$ is $\epsilon$-close to an s-term DNF that depends on $s\log(s/\epsilon)$ variables.
• Rough idea of algorithm(s):
  • Find a collection of subsets of variables s.t. each contains a single variable on which the function depends (non-negligibly) (a variant of junta testing [Fischer, Kindler, R, Safra, Samorodnitsky]). If the number of subsets is greater than some k, reject.
  • Based on the subsets, create a sample of labeled examples over $\{0,1\}^k$ (this does not identify the relevant variables).
  • Check whether there exists a function of the appropriate form over k variables that is consistent with the sample.
Testing for Concise Representations (cont)
Example of the last two steps of the algorithm. D marks the relevant positions, "-" marks positions that are ignored, and the last column is the label:
D - - D - D - - - D | label
1 - - 0 - 0 - - - 1 | 1
0 - - 1 - 0 - - - 1 | 0
1 - - 1 - 1 - - - 0 | 0
Over the k = 4 marked positions (renamed $x_1,\dots,x_4$), the function $x_1 \wedge x_4$ is consistent with the labeled sample, so the algorithm accepts.
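The final consistency check can be illustrated on the labeled sample from this slide, restricted to the four relevant positions. The sketch below is only an illustration: it uses conjunctions of un-negated variables as a stand-in for the "function of appropriate form" (an assumption; the actual class depends on the property being tested), and the names `consistent_conjunctions` and `SAMPLE` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# The three labeled examples from the slide, over the k = 4 relevant positions.
SAMPLE: List[Tuple[Tuple[int, ...], int]] = [
    ((1, 0, 0, 1), 1),
    ((0, 1, 0, 1), 0),
    ((1, 1, 1, 0), 0),
]


def consistent_conjunctions(sample: Sequence[Tuple[Tuple[int, ...], int]],
                            k: int) -> List[Tuple[int, ...]]:
    """Return every index set S (0-based) such that AND_{i in S} x_i
    agrees with all labeled examples. Brute force over all 2^k subsets."""
    found = []
    for size in range(k + 1):
        for subset in combinations(range(k), size):
            if all(int(all(x[i] for i in subset)) == label
                   for x, label in sample):
                found.append(subset)
    return found
```

On this sample, `consistent_conjunctions(SAMPLE, 4)` returns `[(0, 3)]`, i.e., the conjunction of the first and fourth relevant variables. The brute-force search is exponential in k, in line with the remark elsewhere in the talk that the running time of this technique is in general exponential in the query complexity.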
Notes on Testing by Implicit Learning
Note1: The technique gives rise to many positive results (and also extends to non-Boolean functions).
Note2: It is well known that (proper) learning implies testing, but with roughly the same complexity. Using implicit learning saves in complexity.
Note3: The running time is in general exponential in the query complexity. A new result for sparse polynomials over GF(2) [Diakonikolas, Lee, Matulef, Servedio, Wan] gives a time-efficient algorithm.
Extensions of PT: Tolerant Testing and Distance Approximation [Parnas, Rubinfeld, R]
Tolerant Testing: Given parameters $0 \le \epsilon_1 < \epsilon_2$, distinguish between being $\epsilon_1$-close to property P and $\epsilon_2$-far from P ("standard" testing: $\epsilon_1 = 0$).
Example: Clustering. Standard testing is required to accept only perfect clusterings (k clusters, quality (e.g., diameter) q). Tolerant testing is required to also accept good clusterings (with few outliers).
Distance approximation: estimate the distance of the object from having property P.
Results: clustering, monotonicity, local testing of codes, graph properties (dense and sparse models), and more.
What Hasn’t Been Covered? Lots of things!
• Important analysis tool for graph properties: (variants of) Szemerédi’s regularity lemma. Used for analyzing graph properties (including partition and forbidden-subgraph properties) [Alon, Fischer, Krivelevich, Szegedy]. Many other results have used it since. Recently used to characterize all properties testable with no dependence on the size of the graph [Alon, Fischer, Newman, Shapira].
• Important component for lower bounds on graph properties (forbidden subgraphs): arithmetic progressions [Alon], [Alon, Shapira] (x3).
• Tantalizing open problem: what is the complexity of testing triangle-freeness (in the dense-graphs model)? UB: tower of height poly($1/\epsilon$). LB: (roughly) $\exp(1/\epsilon)$.
Testing and the Regularity Lemma [Alon, Fischer, Krivelevich, Szegedy], [Alon, Shapira]*, …, [Alon, Fischer, Newman, Shapira]
The Basis: For every $\epsilon$, the vertices of every (sufficiently large) graph can be partitioned into $t=t(\epsilon)$ subsets $V_1,\dots,V_t$ of equal size s.t. the edge distribution between subsets $V_i, V_j$ is roughly like in a random graph with edge probability $p_{i,j} = |E(V_i,V_j)|/(|V_i||V_j|)$.
Results: [slide content not recoverable]
Last Example: Monotonicity Testing
Def: A function $f : [n] \to R$ is monotone if for every $i,j$ in $[n]$ with $i < j$ we have $f(i) \le f(j)$. It is $\epsilon$-far from monotone if more than an $\epsilon$-fraction of its values must be modified for it to become monotone.
Observation: The "natural algorithm" (take a uniform sample and check whether $f$ is monotone on the sample) does not work unless the sample size is $\Omega(n^{1/2})$.
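One standard way to see why the natural algorithm needs many samples is a sequence in which adjacent entries are swapped in pairs. This construction is an assumed illustration and is not taken from the slide. Every pair must have at least one value changed, so the sequence is 1/2-far from monotone, yet a sample exposes a violation only if it contains both members of some pair, which by a birthday-type argument is unlikely for far fewer than $n^{1/2}$ uniform samples. The names `swapped_pairs` and `sample_is_monotone` are hypothetical.

```python
from typing import Iterable, List


def swapped_pairs(n: int) -> List[int]:
    """The sequence 1, 0, 3, 2, 5, 4, ... of even length n."""
    return [i + 1 if i % 2 == 0 else i - 1 for i in range(n)]


def sample_is_monotone(f: List[int], indices: Iterable[int]) -> bool:
    """The 'natural algorithm': check monotonicity of f restricted to the sample."""
    idx = sorted(set(indices))
    return all(f[i] <= f[j] for i, j in zip(idx, idx[1:]))
```

For `f = swapped_pairs(10)`, the sample `[0, 2, 4]` looks monotone (values 1, 3, 5) even though `f` is far from monotone, while the sample `[0, 1]` reveals a violation because it hits both members of one pair.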
Monotonicity Testing Cont’
An alternative testing algorithm:
• Repeat the following $O(1/\epsilon)$ times:
  • Pick an entry uniformly at random. Let $x$ be the value in that entry.
  • Perform a binary search for $x$.
  • If $x$ is not found, output reject.
• If all searches succeed, output accept.
[Figure: an example array and a binary search for x = 28.]
Main Claim: The entries for which the search succeeds define a monotonically non-decreasing sequence. Hence, if $f$ is $\epsilon$-far from monotone then there must be more than an $\epsilon$-fraction of entries on which the search fails, causing the test to reject w.h.p.
Tolerant Testing of Clustering [Parnas, R, Rubinfeld]
Tolerant Testing: Reject when $\epsilon$-far but accept when $\epsilon'$-close.
Tolerant Testing Algorithm (input: $k$, $\epsilon$, $\epsilon'$, …):
(1) Take a sample of $m=m(k,\epsilon,\epsilon',\ldots)$ points from $X$.
(2) If the sample is $(\epsilon' + (\epsilon - \epsilon')/2)$-close to (k,b)-clusterable then accept, o.w. reject.
Can be analyzed using a generalization of a framework by Czumaj & Sohler for (standard) testing that captures aspects of the "enforce&test" approach. The sample size has quadratic dependence on $1/(\epsilon - \epsilon')$, and the same dependence on the other parameters as the (standard) testing algorithm.
Directions for Further Research
• "Biggest" open problem: can we characterize which properties are efficiently testable? (E.g., find a measure analogous to VC-dimension.)
• Find families of properties that are efficiently testable. (Similarly to the results for partition properties of graphs, graph properties, and regular languages.)
• Extend the scope of property testing.