230 likes | 400 Views
A threshold of ln(n) for approximating set cover. By Uriel Feige Lecturer: Ariel Procaccia. Set Cover. We are given an (unweighted!) collection F of subsets of U={1,…,n}. We must find as few as possible subsets from F such that their union covers U.
E N D
A threshold of ln(n) for approximating set cover By Uriel Feige Lecturer: Ariel Procaccia
Set Cover • We are given an (unweighted!) collection F of subsets of U={1,…,n}. • We must find as few as possible subsets from F such that their union covers U. • Hypergraph vertex cover (lecture of 10/3) is a special case of set cover.
Approximation algorithms • A greedy algorithm: • Recall algorithms 1… • At each stage, add to the cover the subset which maximizes the number of new elements. • This algorithm gives an approximation of ln(n)-lnln(n)+O(1). • An approximation of ln(n) can also be achieved using a linear programming relaxation.
A bad example for the greedy algorithm U={(x,y): 1 ≤ x,y ≤ n} S1 = {(x,y): x ≤ n/2} S2 = {(x,y): x ≥ n/2} T1= {(x,y): 1 ≤ y ≤ n/2} T2= {(x,y): n/2 ≤ y ≤ 3n/4} … 32 16 8 8 Ratio: logn/2
Overview (1) • We wish to prove Theorem 8: If a poly time algorithm can approximate set cover within (1-ε)ln(n), then NP is a subset of DTIME(nO(loglog(n))). • The strategy: a reduction from a k-prover proof system for an NP-complete problem. Familiar with IP?
Overview (2) • Parts of the lecture: • MAX 3SAT-5. • The k-prover proof system. • Partition systems. • The reduction to set cover. • Thundering applause. • May the force be with us. Similar to the lecture of 17/3.
MAX 3SAT-5 • Input: A CNF formula with n variables and 5n/3 clauses, in which every clause contains exactly three literals, every variable appears in exactly 5 clauses, and a variable does not appear in a clause more than once. • Theorem 1: For some >0, it is NP-hard to distinguish between 3CNF-5 formulas where OPT=1 or OPT≤(1-). Reduction from MAX-3SAT-B
Two prover proof system for 3SAT-5 • Presented as to provide intuition and help in the analysis of the k-prover system. • One round, two prover system for 3SAT-5. • Protocol: • Vselects an index of a clause, sends it to the first prover, selects a random var in the clause, and sends it to the second prover. • First prover returns 3 bits, second returns 1 bit. • V accepts if the following conditions hold: • Clause check: the assignment sent by the first prover satisfies the clause. • Consistency check: the assignment sent by the second prover is identical to the assignment for the same variable sent by the first prover. Assgn. To clause and var
Proposition 2: If OPT=1-ε, then under the optimal strategy of the provers, V accepts with probability (1-ε/3). • Proof: • The strategy of the second prover defines an assignment A to the variables. • If V selects a clause that is not satisfied by A (prob. ε), the first prover must set one of the variables differently from A. • The consistency check fails with prob. ≥ 1/3.■ There is a strategy with acceptance prob ε/3
Parallel repetition • We would like to lower the error. • Modified proof system: V sends to the first prover l clauses, from each chooses a var, and sends these l vars to the second prover. • Theorem 3 (Ran Raz): If a one round two prover system is repeated l times independently in parallel, then the error is 2-cl, where c>0 is a constant that depends only on the original proof system. • The error of the modified two prover system is ≤ 2-cl, for some universal constant c. C for subtle
The k-prover proof system P1: 0011 P2: 0101 P3: 1100 • Binary code with k codewords, each of length l and weight l/2, with Hamming distance at least l/3. • Each prover is associated with a codeword. • The Protocol: • The verifier selects l clauses C1,…,Cl, then selects a var from each clause: the distinguished variables x1,…,xl. • Prover Pi receives Cj for those coordinates in its codeword that are 1, and xj for the coordinates that are 0, and replies with 2l bits. • The answer of the prover induces an assignment to the distinguished variables. • Acceptance predicate: • Weak: at least one pair of provers is consistent. • Strong: every pair of provers is consistent. c1 c2 v3 v4 100,010,1,0 V
Lemma 4: If OPT=1 then the provers have a strategy that causes V to always strongly accept. If OPT≤(1-ε), then the verifier weakly accepts with probability at most k22-cl, where c>0 is a constant that depends only on ε. • Proof: • If OPT=1, the provers can base their answers on a satisfying assignment. • Assume OPT=(1- ε), and that V weakly accepts with prob. ≥ δ. Then with respect to Pi and Pj, V accepts with probability δ/k2. • There are ≥ l/6 coordinates on which Pi receives a clause, and Pj a var in this clause. Other coordinates: for free. • The provers have a strategy that succeeds with prob. ≥ δ/k2 on l/6 parallel repetitions of the original proof system. • δ/k2<2-cl. ■
Partition systems • A partition system B(m,L,k,d) has the following properties. • There exists a ground set B of m distinct points. • There is a collection of L distinct partitions p1,…,pL. • For 1≤i≤L, partition pi is a collection of k disjoint subsets of B whose union is B. • Any cover of the m points by subsets that appear in pairwise different partitions requires at least d subsets. • Lemma 6: For every c≥0 and m sufficiently large, there is a partition system B(m,L,k,d) whose parameters satisfy the following inequalities: • L ≈ (log(m))c. • K can be chosen arbitrarily as long as k < ln(m)/3ln(ln(m)). • d = (1-f(k))k∙ln(m), where f(k)→∞ as k→∞. • Proof: a randomized construction usually works. k = 4 Looks familiar? L=4
|U|? The reduction m points, k subsets r=1 L=2l • R=(5n)l : num of r. • r↔Br(m,L,k,d): m=nΘ(l), L=2l, d=(1-f(k))k∙ln(m). • Partition↔dist. vars, subset↔prover. B(r,j,i) = i’th subset, partition j, partition system r. • Subsets are S(q,a,i): all r s.t. (q,i)@r, extract from a an assignment ar to dist. vars. S(q,a,i) is the union of all subsets B(r,ar,i), for all r s.t. (q,i)@r. • Q=nl/2(5n/3)l/2 (questions to Pi). r=2 00 01 10 11 S(q,a,1) B(2,4,3) r=R
ar=01 • Lemma 5: • Completeness: OPT=1 there is a set cover of size kQ. • Soundness: OPT=(1-ε) more than (1-2f(k))kQln(m) subsets are required. • Proof of Completeness: • IfOPT=1, the provers answer consistently with the satisfying assignment. • For any r, consider S(q1,a1,r),…,S(qk,ak,r) s.t. (qi,i)@r, and ai is the appropriate answer. • Br(m,L,k,d) is covered by these k sets. • Similar for every r. The number of subsets is kQ. ■
Proof of soundness: the saga begins • Assume OPT≤1-ε, and that there exists C that covers U, such that |C|=(1-δ)kQln(m), δ=2f(k). • q to prover Pi↔ weight w(q,i)=number of answers a s.t. S(q,a,i) is in C. • Σq,iw(q,i) = |C|. • r↔w(r)= Σ(q,i)@rw(q,i). • This weight = number of subsets that participate in covering Br(m,L,k,d). • Call r good if w(r)<(1-δ/2)k∙ln(m).
The good, the bad, and the random • Proposition 6: The fraction of good r is at least δ/2. • Proof: • Assume otherwise, then: • On the other hand, • Hence |C|>(1- δ)kQln(m) – contradiction. ■ r is good if: w(r)= Σ(q,i)@rw(q,i)<(1-δ/2)k∙ln(m)
Fix coin tosses for provers • Proposition 7: C covers U, |C|=(1-δ)kQln(m) for some strategy for the k provers, V weakly accepts with prob. ≥ 2δ/(k∙ln(m))2. • This proves soundness, since 2δ/(k∙ln(m))2>k22-cl, for l=Θ(loglogn). • Proof of proposition 7: • Randomized strategy: on q to Pi, select a from the set of a s.t. S(q,a,i) is in C. • For a fixed r: sets B(r,p,i) in the cover of Br(m,L,k,d) ↔ sets S(q,a,i) in C. • For a good r: C used two subsets from p in the cover of Br(m,L,k,d). • Denote B(r,p,i) and B(r,p,j) ↔ S(qi,ai,i) and S(qj,aj,j). • Denote: a is in Ar,i iff S(qi,a,i) is in C. • r is good |Ar,i|+|Ar,j|<k∙ln(m). • The prob. that the provers answer ai and aj≥ 4/(k∙ln(m))2. Answers are consistent with p V weakly accepts. • The prob. that V chooses a good r ≥ δ/2. ■
Theorem 8: If there is some ε>0 such that a polynomial time algorithm can approximate set cover within (1-ε)ln(n), then NP is a subset of DTIME(nO(loglog(n))). • Proof: • Assume there is a poly time algorithm A that approximates set cover within (1-ε)ln(m). • Reduce GAP-3SAT-5 to set cover as described, with k s.t. f(k)<ε/4, and m=(5n)2l/ε. • m,R,Q=nO(loglogn) time to perform the reduction is nO(loglogn). • ln(m)>(1-ε/2)ln(N), where N=mR. • By lemma 5, for a YES instance all points can be covered by kQ subsets, and for a NO instance all points cannot be covered by (1-2f(k))kQln(m). • The ratio is (1-2f(k))ln(m)>(1-ε)lnN. ■
Closing remarks (1) • Max k-cover: • Given U and F as before, select k subsets that such that their union has maximum cardinality. • The obvious greedy algorithm approximates max k-cover within a ratio of at least 1-1/e ≈ 0.632. • Theorem 8 can be directly used to prove: if max k-cover can be approximated in a polynomial time within a ration of (1 - 1/e + ε) for some ε>0, then NP is a subset of DTIME(nO(loglogn)).
Closing remarks (2) • Refinements: • In our hardness of approximation result for set cover, ε is a constant. We may strengthen our assumption so as to improve our result. • ZTIME=class of languages that have a probabilistic algorithm that runs in expected time t (with zero error). • If for some η>0, NP is not a subset of ZTIME(2n^η), then for some constant c’>0 there is no polynomial time algorithm that approximates set cover within ln(n) –c’(lnln(n))2.