A Tutorial on Property Testing Dana Ron Tel Aviv University
Property Testing (Informal Definition)
For a fixed property P and any object O, determine whether O has property P, or whether O is far from having property P (i.e., far from any other object having P).
The task should be performed by querying the object (in as few places as possible).
Examples
• The object can be a graph (represented by its adjacency matrix), and the property can be 3-colorability.
• The object can be a string and the property can be membership in a given regular language L.
• The object can be a function and the property can be linearity.
Context
Property testing can be viewed as:
• A relaxation of exactly deciding whether the object has the property.
• A relaxation of learning the object.
In either case we want the testing algorithm to be significantly more efficient than the decision/learning algorithm.
When can Property Testing be Useful?
• Object is too large to even fully scan, so must make an approximate decision.
• Object is not too large, but (1) exact decision is NP-hard (e.g. coloring), or (2) we prefer a sub-linear approximate algorithm to a polynomial exact algorithm.
• Use testing as a preliminary step to exact decision or learning. In the first case, can quickly rule out objects that are far from the property. In the second case, can aid in efficiently selecting a good hypothesis class.
Property Testing - Background
• Initially defined by Rubinfeld and Sudan in the context of Program Testing (of algebraic functions).
• Goldreich, Goldwasser and Ron initiated the study of testing properties of graphs.
• Growing body of work deals with properties of functions, graphs, strings, sets of points ...
Many algorithms with complexity that is sub-linear in (or even independent of) size of object.
Talk Organization
Will discuss four topics:
• Testing Algebraic Properties of Functions: Linearity Testing [BLR]
• Testing “Basic” (non-algebraic) Properties of Functions: Singletons, Monomials, small DNF [PRS]
• Testing Graph Properties: Testing Bipartiteness [GGR]
• Testing Properties of Strings: Testing Membership in Regular Languages [AKNS]
Testing Algebraic Properties of Functions: Linearity Testing [BLR]
Linearity Testing
Def1: Let $F$ be a finite field. A function $f : F^m \to F$ is called linear (multi-linear) if there exist constants $a_1,\dots,a_m \in F$ s.t. for every $x = x_1,\dots,x_m \in F^m$ it holds that $f(x) = \sum_{i=1}^{m} a_i x_i$.
Def2: A function $f$ is said to be $\epsilon$-far from linear if for every linear function $g$, $\mathrm{dist}(f,g) > \epsilon$, where $\mathrm{dist}(f,g) = \Pr[f(x) \neq g(x)]$ ($x$ selected uniformly in $F^m$).
Fact: A function $f : F^m \to F$ is linear iff for every $x,y \in F^m$ it holds that $f(x)+f(y) = f(x+y)$.
Linearity Testing Cont’
Linearity Test (Input: $F$, $m$, $\epsilon$)
1) Uniformly and independently select $\Theta(1/\epsilon)$ pairs of elements $x,y \in F^m$.
2) For every pair $x,y$ selected, verify that $f(x)+f(y) = f(x+y)$.
3) If for any of the pairs selected linearity is violated (i.e., $f(x)+f(y) \neq f(x+y)$), then REJECT, otherwise ACCEPT.
Observe: If $f$ is linear then the test accepts w.p. 1.
Theorem: If $f$ is $\epsilon$-far from linear then with probability at least 2/3 the test rejects it.
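The three steps of the test can be sketched in a few lines. This is a minimal illustration for the special case $F = GF(2)$, where field addition is XOR and a point of $F^m$ is encoded as an m-bit integer. The function names, the integer encoding and the constant `c` hidden in $\Theta(1/\epsilon)$ are assumptions made for this sketch, not part of the slides.

```python
import random

def linearity_test(f, m, eps, rng=None, c=3):
    """Sketch of the linearity test for f: {0,1}^m -> {0,1} over GF(2).

    Inputs are m-bit integers; c is an assumed constant for Theta(1/eps).
    Returns True for ACCEPT and False for REJECT.
    """
    rng = rng or random.Random()
    num_pairs = max(1, int(c / eps))
    for _ in range(num_pairs):
        x = rng.getrandbits(m)
        y = rng.getrandbits(m)
        # Over GF(2), "+" is XOR on the inputs and on the outputs.
        if (f(x) ^ f(y)) != f(x ^ y):
            return False  # a violating pair was found
    return True

def parity(mask):
    """Linear function over GF(2): XOR of the input bits selected by mask."""
    return lambda x: bin(x & mask).count("1") % 2
```

A linear function such as `parity(0b1011)` has no violating pairs, so it is accepted on every run. The constant function 1 violates the identity on every pair, so it is rejected after the first query triple.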
Linearity Testing Cont’
Proof (of special case): Let $\epsilon(f)$ denote the distance of $f$ to the closest linear function $g$. Assume $1/2 - \epsilon(f)$ is constant. Let $G = \{x : f(x) = g(x)\}$ (so that $\Pr[x \notin G] = \epsilon(f) > \epsilon$).
Say that $x$ and $y$ are a violating pair if $f(x)+f(y) \neq f(x+y)$.
Observation: for any $x, y$, if among the 3 elements $x$, $y$, $x+y$ we have 2 in $G$ and 1 not in $G$, then $x,y$ are a violating pair.
Consider one of the 3 (disjoint) events. Can show: $\Pr[x \notin G,\ y \in G,\ (x+y) \in G] \geq \epsilon(f)\,(1 - 2\epsilon(f))$.
Since the events are disjoint, the probability of a violating pair is at least $3\epsilon(f)\,(1 - 2\epsilon(f)) = 6\,\epsilon(f)\,(1/2 - \epsilon(f)) = \Omega(\epsilon)$.
Since the test takes $\Theta(1/\epsilon)$ pairs $x,y$, it will reject w.h.p.
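The final step of the special-case proof can be made quantitative. The following is a short worked bound, assuming $1/2 - \epsilon(f) \geq \gamma$ for some constant $\gamma > 0$; the symbols $\gamma$, $p$ and $k$ are introduced here only for illustration.

```latex
p \;=\; \Pr[(x,y)\ \text{is a violating pair}]
  \;\geq\; 6\,\epsilon(f)\,\bigl(\tfrac12 - \epsilon(f)\bigr)
  \;\geq\; 6\gamma\,\epsilon(f)
  \;>\; 6\gamma\,\epsilon ,
\qquad
\Pr[\text{no violating pair among } k \text{ independent pairs}]
  \;=\; (1-p)^k \;\leq\; e^{-pk} \;\leq\; \tfrac13
  \quad\text{whenever}\quad k \;\geq\; \frac{\ln 3}{6\gamma\,\epsilon}.
```

So a number of pairs proportional to $1/\epsilon$ suffices for rejection probability at least 2/3 in this special case.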
Linearity Testing Cont’
How do we deal with the general case (where $\epsilon(f)$ is not necessarily bounded away from 1/2)?
In order to prove that if $\epsilon(f) > \epsilon$ then the test rejects w.p. 2/3, prove the contrapositive: if the test accepts w.p. > 1/3 (i.e., there is a small fraction of violating pairs) then $f$ is $\epsilon$-close to linear. That is, there exists a linear $g$ s.t. $\mathrm{dist}(f,g) \leq \epsilon$.
Specifically, define $g$ as follows:
$g(x) = 1$ if $\Pr_y[f(x+y)-f(y)=1] \geq 1/2$
$g(x) = 0$ if $\Pr_y[f(x+y)-f(y)=0] > 1/2$
Can prove that if the fraction of violating pairs (w.r.t. $f$) is sufficiently small then $f$ is close to $g$ and $g$ is linear.
Note: the definition of $g$ allows for Self-Correcting of $f$ (for every $x$ can determine $g(x)$ w.h.p. by few queries to $f$).
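The self-correcting remark can be illustrated by a majority vote over random shifts $y$. This is a minimal sketch over $GF(2)$, where $f(x+y)-f(y)$ becomes an XOR. The function name `self_correct`, the m-bit integer encoding and the default of 15 trials are assumptions for illustration.

```python
import random
from collections import Counter

def self_correct(f, x, m, trials=15, rng=None):
    """Estimate g(x) as the majority value of f(x+y) - f(y) over random y.

    Over GF(2) the difference is an XOR; inputs are m-bit integers.
    An odd number of trials avoids ties between the two possible values.
    """
    rng = rng or random.Random()
    votes = Counter()
    for _ in range(trials):
        y = rng.getrandbits(m)
        votes[f(x ^ y) ^ f(y)] += 1
    return votes.most_common(1)[0][0]
```

If $f$ is exactly linear, every vote equals $f(x)$, so the corrector returns $f(x)$ itself. If $f$ agrees with a linear $g$ on all but a small fraction of points, most shifts $y$ avoid the corrupted points and the majority recovers $g(x)$, even when $f(x)$ itself is corrupted.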
Testing “Basic” Properties of Functions: Singletons, Monomials, small DNF [PRS]
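Testing “Basic” Properties of Functions
This work considers “the most basic” classes of Boolean functions $f : \{0,1\}^n \to \{0,1\}$:
• Singletons: $f(x) = x_i$ or $f(x) = \bar{x}_i$ for a single variable $x_i$.
• Monomials: a conjunction of literals.
• DNF: a disjunction of monomials.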
Testing “Basic” Properties of Functions Cont’
• Can test whether $f$ is a singleton.
• Can test whether $f$ is a monomial.
• Can test whether $f$ is a monotone DNF with at most $t$ terms.
Common theme: the query complexity has no dependence on the size of the input, $n$, and polynomial dependence on the distance parameter, $\epsilon$.
Learning Boolean Formulae
Basic observation: (proper) learning implies testing.
• Singletons and monomials can be learned under the uniform distribution [BEHW].
• Monotone DNF with $t$ terms and $r$ literals can be properly learned [A+BJT].
Main difference w.r.t. testing results: the testing results have no dependence on $n$ and use a different algorithmic approach.
Testing (Monotone) Singletons
Singletons satisfy: (1) $\Pr[f(x)=1] = 1/2$ (2) $f(x \wedge y) = f(x) \wedge f(y)$ for all $x,y$.
Natural test: check, by sampling, that the conditions hold (approximately).
Can analyze the natural test for the case that the distance between the function and the class of singletons is not too big (bounded away from 1/2).
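Testing Singletons II - Parity Testing
Observation: Singletons are a special case of parity functions (i.e., functions of the form $f(x) = \bigoplus_{i \in S} x_i$ for some $S \subseteq \{1,\dots,n\}$).
Claim: Let $g$ be a parity function over a set $S$ of variables. If $|S| \geq 2$ then a constant fraction of the pairs $x,y$ are violating pairs w.r.t. $g$.
Modified algorithm:
(1) Test whether $f$ is a parity function (with dist. par. $\epsilon$) using the algorithm of [BLR].
(2) Uniformly select a constant number of pairs $x,y$ and check whether any is a violating pair (i.e., $f(x) \wedge f(y) \neq f(x \wedge y)$).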
Testing Singletons III - Self Correcting
This “almost works”:
If $f$ is a singleton - always accepted.
If $f$ is $\epsilon$-far from parity - rejected w.h.p.
But if $f$ is $\epsilon$-close to a parity function $g$, then we cannot simply apply the claim to argue that there are many violating pairs w.r.t. $f$.
If we could only test violations w.r.t. $g$ instead of $f$ ...
Use the Self-Corrector of [BLR] to “fix” $f$ into a parity function ($g$), and then test violations on the self-corrected version.
Testing Singletons IV - The Algorithm
Final Algorithm for Testing Singletons:
(1) Test whether $f$ is a parity function with dist. par. $\epsilon$ using the algorithm of [BLR].
(2) Uniformly select a constant number of pairs $x,y$. Verify that Self-Cor($f$,$x$) $\wedge$ Self-Cor($f$,$y$) = Self-Cor($f$, $x \wedge y$).
(3) Verify that Self-Cor($f$, $1^n$) = 1.
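The three stages of the singleton tester fit together as in the sketch below. It is only an illustration for monotone singletons over m-bit-integer inputs: the names `singleton_test` and `_self_cor`, the constants (3/ε parity pairs, 20 AND-pairs, 15 votes per self-correction) and the use of the all-ones point in stage (3) are assumptions made for this example.

```python
import random

def _self_cor(f, x, n, rng, trials=15):
    """Majority vote of f(x XOR y) XOR f(y) over random y (BLR-style corrector)."""
    ones = sum(f(x ^ y) ^ f(y) for y in (rng.getrandbits(n) for _ in range(trials)))
    return int(2 * ones > trials)

def singleton_test(f, n, eps, rng=None, pairs=20):
    """Sketch of the three-stage test for monotone singletons f(x) = x_i."""
    rng = rng or random.Random()
    # (1) Parity test: f(x) XOR f(y) must equal f(x XOR y) on sampled pairs.
    for _ in range(max(1, int(3 / eps))):
        x, y = rng.getrandbits(n), rng.getrandbits(n)
        if (f(x) ^ f(y)) != f(x ^ y):
            return False
    # (2) AND-consistency, checked on the self-corrected function.
    for _ in range(pairs):
        x, y = rng.getrandbits(n), rng.getrandbits(n)
        lhs = _self_cor(f, x, n, rng) & _self_cor(f, y, n, rng)
        if lhs != _self_cor(f, x & y, n, rng):
            return False
    # (3) The self-corrected value at the all-ones point must be 1;
    #     this rules out the all-zero function.
    return _self_cor(f, (1 << n) - 1, n, rng) == 1
```

For an exact singleton such as `lambda x: (x >> 3) & 1` every check holds with certainty, so it is always accepted. The all-zero function passes stages (1) and (2) but fails stage (3), and the parity of two variables evaluates to 0 at the all-ones point, so both are rejected.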
Testing Monomials and Monotone DNF
Monomial testing algorithm has a similar structure to the Singleton testing algorithm. (Here too it suffices to find a test for monotone monomials.)
The first stage, linearity testing, is replaced by Affinity Testing: if $f$ is a monomial then $F_1 = \{x : f(x)=1\}$ is an affine subspace. [Fact: $H$ is an affine subspace iff $\forall x,y,z \in H$, $x \oplus y \oplus z \in H$.]
Affinity test is similar to parity test: select $x,y \in F_1$, $z \in \{0,1\}^n$, verify that $f(x \oplus y \oplus z) = f(x) \oplus f(y) \oplus f(z)$.
The second stage is as in the singleton test (check for violating pairs). Here affinity adds structure that helps analyze the second stage.
Testing monotone DNF: use the monomial test as a sub-routine (a monotone DNF function is a disjunction of monotone monomials).
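The affinity check can be sketched as follows. The sketch assumes the caller already holds a non-empty list `ones` of points on which $f$ evaluates to 1 (how to sample from $F_1$ efficiently is not covered here), and the name `affinity_test`, the integer encoding and the default of 50 trials are hypothetical choices for illustration.

```python
import random

def affinity_test(f, n, ones, trials=50, rng=None):
    """Sketch of the affinity check on F1 = {x : f(x) = 1}.

    f maps n-bit integers to 0/1; `ones` is a non-empty list of points with f = 1.
    Since f(x) = f(y) = 1, the condition reduces to f(x XOR y XOR z) == f(z).
    """
    rng = rng or random.Random()
    for _ in range(trials):
        x, y = rng.choice(ones), rng.choice(ones)
        z = rng.getrandbits(n)
        if f(x ^ y ^ z) != (f(x) ^ f(y) ^ f(z)):
            return False
    return True
```

For a monotone monomial such as $x_0 \wedge x_2$ the set $F_1$ is an affine subspace and the check never fails. For the disjunction $x_0 \vee x_1$, $F_1$ is not affine, and a constant fraction of the sampled triples expose a violation.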
Testing Graph Properties
Assume graphs are represented by their adjacency matrix. In this model, the testing algorithm can perform queries: “is there an edge between u and v?”.
Distance between graphs: fraction of entries in the adjacency matrix on which they differ.
This model is most appropriate for testing dense graphs.
Results for Testing Graph Properties
In Adjacency-Matrix model
• Can test: Bipartiteness, k-colorability, r-Clique, r-Cut and a more general family of partition problems, with sample complexity $\mathrm{poly}(1/\epsilon)$ and running time $\exp(\mathrm{poly}(1/\epsilon))$, both independent of size of graph [GGR].
• Can test all properties that can be formulated by first order expression about graphs with sample and time complexity independent of graph size (but at “steep” cost as function of $1/\epsilon$) [AFKS].
• In directed graphs can test acyclicity with sample and time complexity $\mathrm{poly}(1/\epsilon)$ [BR] (special case treated in [EKKRV]).
In Incidence-Lists model
• Connectivity, k-edge-connectivity: complexity $\mathrm{poly}(1/\epsilon)$ [GR1].
• Bipartiteness: $\mathrm{poly}(1/\epsilon)\cdot|V|^{1/2}$ [GR2].
• Diameter: $\mathrm{poly}(1/\epsilon)$ [PR].
Testing Bipartiteness
Def: Graph $G=(V,E)$ is bipartite iff its vertices can be partitioned into two subsets $V_1$ and $V_2$ s.t. there are no edges between vertices that are both in $V_1$ or both in $V_2$.
Recall that we can decide whether a graph is bipartite in time $O(|V|+|E|)$ by Breadth First Search (BFS). However, we want a very fast approximate decision. Furthermore, can extend algorithm and analysis to testing k-colorability (which is NP-Hard).
Testing Bipartiteness Cont’
Bipartite Testing Algorithm
• Uniformly and independently select $m = \Theta(\log(1/\epsilon)/\epsilon^2)$ vertices in the graph.
• For every pair of vertices selected, query whether there is an edge between the two, obtaining the induced sub-graph.
• Perform a BFS to determine whether the induced subgraph is bipartite. If it is, output accept, o.w. output reject.
Query complexity and running time of algorithm: $O(\log^2(1/\epsilon)/\epsilon^4)$. A slight variant of the algorithm yields $O(\log^2(1/\epsilon)/\epsilon^3)$, and [AK] have reduced this to $O(\log^2(1/\epsilon)/\epsilon^2)$.
Correctness: If the graph is bipartite then clearly it is always accepted. From this point on assume the graph is $\epsilon$-far from bipartite. Will show that it is rejected w.p. at least 2/3.
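The sample-and-check algorithm above can be sketched directly. In this illustration the graph is accessed only through a hypothetical oracle `has_edge(u, v)` on vertices `0..n-1`, and the constant `c` in the sample size is an assumption. Vertices are drawn independently and repeated draws are merged.

```python
import math
import random
from collections import deque

def bipartite_test(n, has_edge, eps, rng=None, c=1):
    """Sketch: sample vertices, query all sampled pairs, 2-color the induced subgraph."""
    rng = rng or random.Random()
    m = max(2, math.ceil(c * math.log(1 / eps) / eps ** 2))
    sample = list({rng.randrange(n) for _ in range(m)})  # independent draws, deduplicated
    # Query every pair of sampled vertices to build the induced subgraph.
    adj = {v: [] for v in sample}
    for i, u in enumerate(sample):
        for v in sample[i + 1:]:
            if has_edge(u, v):
                adj[u].append(v)
                adj[v].append(u)
    # BFS 2-coloring; an edge inside one color class is an odd-cycle witness.
    color = {}
    for s in sample:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # reject
    return True  # accept
```

Every induced subgraph of a bipartite graph is bipartite, so a bipartite input is always accepted. On a complete graph, any three distinct sampled vertices form a triangle and cause rejection.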
Analysis of Bipartiteness Testing Alg
Def: Let $X$ be a subset of vertices, and $(X_1,X_2)$ a partition of $X$. Say that an edge $(u,v)$ is violating w.r.t. $(X_1,X_2)$ if either both $u,v$ are in $X_1$ or both are in $X_2$. If there are no violating edges w.r.t. $(X_1,X_2)$ then say it is a bipartite partition.
View the sample as consisting of two parts: $U$ and $S$.
Show that w.h.p., for every partition $(U_1,U_2)$ of $U$ there is no partition $(S_1,S_2)$ of $S$ s.t. $(U_1 \cup S_1, U_2 \cup S_2)$ is bipartite.
In other words, the sub-graph induced by the sample $U \cup S$ is not bipartite.
Analysis of Bipartiteness Testing Alg Cont’
Def1: A vertex $v$ is influential if it has degree at least $(\epsilon/4)|V|$.
Def2: A vertex $v$ is covered by a subset $U$ if it has a neighbor in $U$.
Lem: W.h.p. $U$ covers all but $(\epsilon/4)|V|$ of the influential vertices.
Analysis of Bipartiteness Testing Alg Cont’
Let $C$ be the vertices covered by $U$ and let $R$ be the remaining vertices.
Observe: Since $R$ contains at most all non-influential vertices, and at most $(\epsilon/4)|V|$ influential ones, the total number of edges incident to $R$ is at most $(\epsilon/2)|V|^2$.
Recall, graph $G$ is $\epsilon$-far from bipartite: every partition $(V_1,V_2)$ of $V$ has $> \epsilon|V|^2$ violating edges.
Together, the above two imply that every partition of $U \cup C$ has $> (\epsilon/2)|V|^2$ violating edges.
Analysis of Bipartiteness Testing Alg Cont’
Consider a fixed partition $(U_1,U_2)$ of $U$, and let $(C_1,C_2)$ be the partition of $C$ where neighbors of vertices in $U_1$ are put in $C_2$ and neighbors of vertices in $U_2$ are put in $C_1$.
Since $(U_1 \cup C_1, U_2 \cup C_2)$ contains $> (\epsilon/2)|V|^2$ violating edges, this many pairs of vertices $(v,w)$ that are both in $C_1$ (or both in $C_2$) have a violating edge between them.
If we get such a pair $(v,w)$ in the sample $S$, then for every partition $(S_1,S_2)$, the partition $(U_1 \cup S_1, U_2 \cup S_2)$ contains some violating edge.
Since there are many such pairs, the sample $S$ contains such a pair w.h.p.
By a union bound on the number of partitions $(U_1,U_2)$ (at most $2^{|U|} = \exp(\log(1/\epsilon)/\epsilon)$), $S$ contains such a pair for every $(U_1,U_2)$.
Testing Other Graph (Partition) Properties
Each property (k-colorability, r-Clique, r-Cut) has its own “particularities”, but in all cases:
• “Natural algorithm” (take a small uniform sub-sample and check the induced subgraph for the property) works.
• Analysis works by breaking the sample into two parts: the first part, $U$, “forces” constraints on possible partitions of all vertices. The second part, $S$, “tests” whether the constraints are satisfied.
More general results of [AFKS] (combination of partition and forbidden subgraph properties) also analyze the natural algorithm. Analysis builds on Szemerédi’s regularity lemma.
Testing Properties of Strings: Membership in Regular Languages [AKNS]
Testing Membership in Regular Languages
For a fixed regular language $L \subseteq \{0,1\}^*$, the testing algorithm should accept w.h.p. every word $w \in L$, and should reject w.h.p. every word $w$ that differs on more than $\epsilon n$ bits ($n=|w|$) from every $w' \in L$ ($|w'|=n$).
The algorithm can query any bit $w_i$ of $w$.
Let $M=(Q,F,q_0,\delta)$ be the (minimum) DFA that accepts $L$. Let $G(M)$ denote the directed graph induced by $M$ (that is, there is a directed edge for every transition).
Def: Let $u = w_i \dots w_j$ be a sub-word of $w$ that starts at position $i$. Say that $u$ is feasible w.r.t. $M$ starting from $i$ if there exists a state $q$ s.t. $q$ can be reached in $G(M)$ from $q_0$ in exactly $i-1$ steps, and there is a path of length $n-(|u|+i-1)$ in $G(M)$ from $q' = \delta(q,u)$ to an accepting state $q_f$.
[Figure: path in $G(M)$ from $q_0$ to $q$ in $i-1$ steps, from $q$ to $q'$ along $u$, and from $q'$ to $q_f$ in $n-(|u|+i-1)$ steps.]
Testing Regular Language Cont’
Consider special case:
• Unique accepting state $q_f$;
• $Q$ can be partitioned into two parts, $C$ and $D$:
  - $q_0, q_f \in C$;
  - subgraph $G(C)$ is strongly connected;
  - no edges from $D$ to $C$;
  - the GCD of cycle-lengths in $G(C)$ is 1.
There exists a constant $r$ ($= O(|Q|^2)$) s.t. $\forall q,q' \in C$, $\forall m \geq r$, there exists a path of length $m$ from $q$ to $q'$.
Testing Regular Language Cont’
The Algorithm (simplified version):
• Uniformly and independently select $\Theta(r/\epsilon)$ indices $1 \leq i \leq n$.
• For each $i$ selected, check that the substring $w_i \dots w_{i+r/\epsilon}$ is feasible.
• If any substring is infeasible then reject, otherwise accept.
Number of queries: $O(r^2/\epsilon^2) = \mathrm{poly}(|Q|)/\epsilon^2$, and running time $\mathrm{poly}(|Q|)/\epsilon^2$ (can improve to almost linear dependence on $1/\epsilon$).
Correctness: If $w \in L$, then always accept. If $w$ is $\epsilon$-far from $L$, would like to show that $w$ contains many (short) infeasible substrings (causing rejection w.h.p.).
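The simplified algorithm can be sketched with the feasibility definition applied literally. The sketch assumes a DFA given as a dictionary `delta[(state, bit)]`, a 1-indexed oracle `query(k)` for bit $w_k$, and substrings truncated at position $n$; all names are hypothetical. For simplicity it precomputes, for every step count up to $n$, which states are reachable from $q_0$ and which can reach an accepting state, so only the number of queries to $w$ is sub-linear here, not the running time.

```python
import random

def exact_step_sets(start, step, n):
    """sets[k] = states obtained from `start` by exactly k applications of `step`."""
    sets = [frozenset(start)]
    for _ in range(n):
        sets.append(frozenset(t for s in sets[-1] for t in step(s)))
    return sets

def run(delta, q, bits):
    """State reached from q after reading bits."""
    for b in bits:
        q = delta[(q, b)]
    return q

def regular_language_test(query, n, states, delta, q0, accepting, eps, r, rng=None):
    """Sketch: sample start positions and reject if some short substring is infeasible."""
    rng = rng or random.Random()
    preds = {q: set() for q in states}
    for (q, _bit), t in delta.items():
        preds[t].add(q)
    # fwd[k]: states reachable from q0 in exactly k steps.
    fwd = exact_step_sets({q0}, lambda q: {delta[(q, 0)], delta[(q, 1)]}, n)
    # bwd[k]: states from which an accepting state is reachable in exactly k steps.
    bwd = exact_step_sets(set(accepting), lambda q: preds[q], n)
    length = max(1, int(r / eps))
    for _ in range(length):                      # Theta(r/eps) sampled positions
        i = rng.randrange(1, n + 1)              # 1-indexed start position
        j = min(n, i + length - 1)               # substring w_i ... w_j
        u = [query(k) for k in range(i, j + 1)]
        # Feasible: some q in fwd[i-1] with delta(q, u) in bwd[n - j].
        if not any(run(delta, q, u) in bwd[n - j] for q in fwd[i - 1]):
            return False
    return True
```

As a usage example, take the two-state DFA for the language of all-zero strings (state 0 accepting, state 1 a dead state). Every substring of an all-zero word is feasible, so that word is accepted, while any substring containing a 1 is infeasible, so the all-ones word is rejected at the first sampled position.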
Testing Regular Language Cont’
Prove the contrapositive statement: If the number of (short) infeasible substrings in $w$ is small then $w$ is close to some $w^* \in L$.
Proof idea: partition $w$ (except the first and last $r$ symbols) into disjoint maximal feasible substrings $u_1, \dots, u_h$: each $u_j$ is feasible, but the addition of the next symbol $w_k$ makes it infeasible.
By slightly modifying each $u_j$, can “glue” the modified substrings together into one string $w^*$ that “does not leave $C$” and reaches $q_f$.
If $h$ is small (as assumed), then $w^*$ is close to $w$.
Testing Regular Language Cont’
General case works by reducing to the special case we discussed. In particular, need to decompose $G(M)$ into its strongly connected components, and consider how a word “moves between them”.
This work has been extended by Newman to testing Branching Programs of bounded width, and by Kupferman and XX to testing Tree Automata.
Directions for Further Research
• “Biggest” open problem: Can we characterize what properties are efficiently testable? (e.g., find a measure analogous to VC-dimension.)
• Find families of properties that are efficiently testable. Some such results exist for testing graph properties (e.g. partition problems), and we have the regular languages result.
• Extend scope of property testing.
Testing Properties of Collections of Points: Testing of Clustering
Property Testing - Background
Properties of functions:
• Initially defined by Rubinfeld and Sudan in the context of Program Testing. Tested algebraic properties of functions: low-degree polynomials.
• Other work on testing algebraic properties: [BLR,R,EKKRV...].
• Non-algebraic properties: Monotonicity [GGLRS,DGLRSS,B,FN].
Properties of other objects:
• Main focus: Graph properties: [GGR,GR,AK,AFKS,BR,PR,CS...]
• Growing body of work deals with properties of strings [AKNS,N,PRR], sets of points [PR], geometric objects [CSZ], distributions [BFRW], and more.
All algorithms have complexity that is sub-linear in (or even independent of) size of object.