Introduction
In this lecture we'll cover:
• the definition of PCP
• proofs of some classical hardness-of-approximation results
• a review of some recent ones
Review: Decision, Optimization Problems
• A decision problem is a Boolean function ƒ(X), or alternatively a language L ⊆ {0, 1}* comprising all strings for which ƒ is TRUE: L = { X ∈ {0, 1}* | ƒ(X) }
• An optimization problem is a function ƒ(X, Y) which, given X, is to be maximized (or minimized) over all possible Y's: maxY[ ƒ(X, Y) ]
• A threshold version of max-ƒ(X, Y) is the language Lt of all strings X for which there exists Y such that ƒ(X, Y) ≥ t [transforming an optimization problem into a decision problem]
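To make the threshold construction concrete, here is a small sketch (an illustrative example, not from the lecture) using MAX-CUT as the optimization problem ƒ(X, Y): X is a graph, Y a two-coloring of its vertices, and Lt accepts exactly when some cut reaches the threshold t. All function names here are assumptions for illustration.

```python
# Sketch: turning an optimization problem into its threshold decision
# version, with MAX-CUT as ƒ(X, Y). Brute force, so only for tiny graphs.
from itertools import product

def cut_value(edges, assignment):
    """f(X, Y): number of edges of X crossing the cut described by Y."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

def max_cut(edges, n):
    """max_Y f(X, Y), by enumerating all 2^n two-colorings."""
    return max(cut_value(edges, a) for a in product([0, 1], repeat=n))

def threshold_cut(edges, n, t):
    """The language L_t: accept X iff some Y achieves f(X, Y) >= t."""
    return max_cut(edges, n) >= t

# Triangle graph: every 2-vs-1 split cuts exactly 2 of the 3 edges.
edges = [(0, 1), (1, 2), (0, 2)]
assert max_cut(edges, 3) == 2
```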
Review: The Class NP
The classical definition of the class NP is:
A language L ⊆ {0, 1}* belongs to the class NP if there exists a Turing machine VL [referred to as a verifier] such that
X ∈ L ⟺ there exists a witness Y such that VL(X, Y) accepts, in time |X|^O(1)
That is, VL can verify a membership-proof of X in L in time polynomial in the length of X
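As a concrete instance of this definition, here is a sketch of a verifier VL for L = SAT: the witness Y is a truth assignment, and the check runs in time polynomial in the formula size. The clause encoding (signed integers, negative meaning negated) is a common convention, assumed here for illustration.

```python
# A verifier V_L for L = SAT: accept iff the witness assignment
# satisfies every clause. Runs in time linear in the formula size.
def sat_verifier(cnf, witness):
    """cnf: list of clauses, each a list of literals (negative = negated).
    witness: dict mapping variable index -> bool."""
    return all(
        any(witness[abs(lit)] == (lit > 0) for lit in clause)
        for clause in cnf
    )

# (x1 v ~x2) ^ (x2 v x3)
cnf = [[1, -2], [2, 3]]
assert sat_verifier(cnf, {1: True, 2: False, 3: True})
assert not sat_verifier(cnf, {1: False, 2: True, 3: False})
```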
Review: NP-Hardness
• A language L is said to be NP-hard if an efficient (polynomial-time) procedure for L can be utilized to obtain an efficient procedure for any NP language
• This definition allows the more general, Cook reductions. An efficient algorithm translating any instance of an NP problem to a single instance of L - thereby showing L to be NP-hard - is referred to as a Karp reduction
Review: Characterizing NP
Thm [Cook, Levin]: For every L ∈ NP there's an algorithm that, on input X, constructs, in time |X|^O(1), a set of local constraints (Boolean functions) ΦL,X = { φ1, ..., φl } over variables y1, ..., ym s.t.:
• each of φ1, ..., φl depends on O(1) variables
• X ∈ L ⟺ there exists an assignment A: { y1, ..., ym } → { 0, 1 } satisfying all of ΦL,X
[ note that m and l must be at most polynomial in |X| ]
[Figure: NP characterization - local tests φ1, ..., φl over variables y1, ..., ym; if X ∈ L, all of the local tests are satisfied]
Approximation - Some Definitions
Def: g-approximation
A g-approximation of a maximization (similarly for minimization) function f is an algorithm that, on input X, outputs f'(X) such that: f'(X) ≥ f(X)/g(|X|).
Def: PTAS (poly-time approximation scheme)
We say that a maximization function f has a PTAS if, for every g > 1, there is a polynomial pg and a g-approximation for f whose running time is pg(|X|)
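A classical example of a g-approximation in the above sense, with g = 2, is the maximal-matching algorithm for (the minimization problem) Vertex Cover, which reappears later in these notes. This sketch is illustrative and not part of the lecture:

```python
# Factor-2 approximation for minimum Vertex Cover: greedily pick an
# uncovered edge and take BOTH endpoints. The chosen edges form a
# matching, and any cover must contain one endpoint of each, so
# |cover| = 2 * |matching| <= 2 * OPT.
def vertex_cover_2approx(edges):
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update([u, v])   # both endpoints of an uncovered edge
    return cover

edges = [(0, 1), (1, 2), (2, 3)]       # path on 4 vertices, OPT = 2
cover = vertex_cover_2approx(edges)
assert all(u in cover or v in cover for u, v in edges)  # valid cover
assert len(cover) <= 4                                  # <= 2 * OPT
```

The guarantee holds because every edge of the matching forces at least one distinct vertex into any optimal cover.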
Approximation - NP-hard?
• We know that by using Cook/Karp reductions, we can show many decision problems to be NP-hard.
• Can an approximation problem be NP-hard?
• One can easily show that if there is any g for which there is a g-approximation for TSP, then P = NP.
Characterization of NP - the PCP Theorem [AS, ALMSS]
Thm [Cook, Levin] (restated): For every L ∈ NP there's an algorithm that, on input X, constructs, in time |X|^O(1), a set of local constraints (Boolean functions) ΦL,X = { φ1, ..., φl } over variables y1, ..., ym s.t.:
• each of φ1, ..., φl depends on O(1) variables
• X ∈ L ⟹ there exists an assignment A: { y1, ..., ym } → { 0, 1 } satisfying all of ΦL,X
The PCP strengthening [AS, ALMSS]:
• X ∉ L ⟹ every assignment A: { y1, ..., ym } → { 0, 1 } satisfies < ½ fraction of ΦL,X
[Figure: PCP characterization - if X ∉ L, at least half of the local tests aren't satisfied!]
Probabilistically Checkable Proofs
• Hence, the Cook-Levin theorem states that a verifier can efficiently verify membership-proofs for any NP language
• The PCP characterization of NP, in contrast, states that a membership-proof can be verified probabilistically, by
• choosing randomly one local constraint,
• accessing the small set of variables it depends on, and
• accepting or rejecting accordingly,
• erroneously accepting a non-member only with small probability
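The bullets above can be sketched as code. Constraints are modelled as (variable-tuple, predicate) pairs, an assumed encoding for illustration; each trial reads only the O(1) proof positions the chosen constraint depends on, and repeating k independent trials drives the error below 2^-k.

```python
# Sketch of probabilistic proof checking: sample one local constraint,
# query only the few proof positions it depends on, accept/reject.
import random

def pcp_verify(constraints, proof, trials=1):
    """constraints: list of (variable-tuple, predicate) pairs.
    proof: dict mapping variable -> value. Each trial reads O(1) bits."""
    for _ in range(trials):
        vars_, pred = random.choice(constraints)
        if not pred(*(proof[v] for v in vars_)):
            return False           # caught an unsatisfied constraint
    return True

# Two inequality constraints; the first proof satisfies both, the
# second satisfies neither, so it is rejected on every trial.
constraints = [((0, 1), lambda a, b: a != b),
               ((1, 2), lambda a, b: a != b)]
assert pcp_verify(constraints, {0: 0, 1: 1, 2: 0}, trials=10)
assert not pcp_verify(constraints, {0: 0, 1: 0, 2: 0}, trials=10)
```

If X ∉ L, fewer than half the constraints are satisfied, so a single trial rejects with probability > ½ and k trials erroneously accept with probability < 2^-k.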
Gap Problems
• A gap problem is a maximization (or minimization) problem ƒ(X, Y), with two thresholds t1 > t2:
X must be accepted if maxY[ ƒ(X, Y) ] ≥ t1
X must be rejected if maxY[ ƒ(X, Y) ] ≤ t2
other X's may be accepted or rejected (don't care)
(almost a decision problem; relates to approximation)
Reducing gap-Problems to Approximation Problems
• Using an efficient approximation algorithm for ƒ(X, Y) to within a factor g, one can efficiently solve the corresponding gap problem gap-ƒ(X, Y), as long as t1/t2 > g
• Simply run the approximation algorithm. The outcome clearly determines which side of the gap the given input falls in. (Hence, proving a gap problem NP-hard translates to NP-hardness of its approximation version, for appropriate factors.)
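The two-line argument above, written out as a sketch; `approx` stands for any g-approximation algorithm (the names and calling convention are assumptions for illustration):

```python
# Deciding gap-f with a g-approximation `approx`, assuming t1/t2 > g.
# A g-approximation guarantees f(X)/g <= approx(X) <= f(X).
def decide_gap(approx, X, t1, t2, g):
    assert t1 / t2 > g
    # Yes-instances: max f >= t1, so approx(X) >= t1/g > t2 -> accept.
    # No-instances:  max f <= t2, so approx(X) <= t2       -> reject.
    return approx(X) > t2

# Illustration with an exact algorithm (a g-approximation for any g >= 1):
assert decide_gap(lambda X: X, 10, t1=10, t2=4, g=2) is True
assert decide_gap(lambda X: X, 3, t1=10, t2=4, g=2) is False
```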
gap-SAT
• Def: gap-SAT[D, V, ε] is as follows:
• Instance: a set Φ = { φ1, ..., φl } of Boolean functions (local constraints) over variables y1, ..., ym of range 2^V
• Locality: each of φ1, ..., φl depends on at most D variables
• Let the maximum-satisfied-fraction be the largest fraction of Φ satisfied by any assignment A: { y1, ..., ym } → 2^V; if this fraction is
• = 1, accept
• < ε, reject
• D, V and ε may be functions of l
The PCP Hierarchy
Def: L ∈ PCP[ D, V, ε ] if L is efficiently reducible to gap-SAT[ D, V, ε ]
• Thm [AS, ALMSS]: NP ⊆ PCP[ O(1), 1, ½ ] [ the PCP characterization theorem above ]
• Thm [RaSa]: NP ⊆ PCP[ O(1), m, 2^-m ] for m = log^c n, for some c > 0
• Thm [DFKRS]: NP ⊆ PCP[ O(1), m, 2^-m ] for m = log^c n, for any c < 1
• Conjecture [BGLR]: NP ⊆ PCP[ O(1), m, 2^-m ] for m = log n
Optimal Characterization
• One cannot expect the error probability to be less than exponentially small in the number of bits each local test reads
• since a random assignment would already satisfy such a fraction of the local tests
• One cannot hope for an error probability smaller than polynomially small
• since that would imply fewer than one satisfied local test; hence each local test, being rather easy to compute, would completely determine the outcome
[ the BGLR conjecture is hence optimal in that respect ]
Approximating Max-IS is NP-hard
We will reduce gap-SAT to gap-Independent-Set.
Given an instance Φ = { φ1, ..., φl } of Boolean functions over variables y1, ..., ym of range 2^V, where each of φ1, ..., φl depends on at most D variables, we must determine whether all the functions can be satisfied, or only a fraction less than ε.
We will construct a graph G that has an independent set of size r ⟺ there exists an assignment to y1, ..., ym satisfying r of the local constraints.
(q,r)-co-partite Graph
G = (Q×R, E)
• comprises q = |Q| cliques of size r = |R|:
E ⊇ { (<i,j1>, <i,j2>) | i ∈ Q, j1 ≠ j2 ∈ R }
Gap Independent-Set
Instance: a (q,r)-co-partite graph G = (Q×R, E)
Problem: distinguish between
• Good: IS(G) = q
• Bad: every set I ⊆ V s.t. |I| > εq contains an edge
Thm: gap-IS(r, ε) is NP-hard as long as r ≤ ( 1/ε )^c for some constant c
gap-SAT → gap-IS
Construct a graph G that has:
• 1 clique for each local constraint φi,
• in which there is 1 vertex for each satisfying assignment for φi
• Two vertices are connected if the assignments they represent are inconsistent
gap-SAT → gap-IS
Lemma: α(G) ≥ k (G has an independent set of size k) ⟺ there is an assignment that satisfies k of the clauses
• Consider an assignment A satisfying k clauses. For each satisfied clause φi, consider A's restriction to φi's variables. The corresponding k vertices form an independent set in G
• Conversely, any independent set of size k in G implies an assignment satisfying k of φ1, ..., φl
Hence gap-IS is NP-hard, and IS is NP-hard to approximate!
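The reduction of the last few slides can be sketched directly (this is the FGLSS-style construction; the (variable-tuple, predicate) encoding of constraints is an assumption for illustration):

```python
# gap-SAT -> gap-IS: one vertex per (constraint, satisfying partial
# assignment) pair; edges inside each constraint's clique and between
# inconsistent partial assignments.
from itertools import product

def gap_sat_to_gap_is(constraints):
    """constraints: list of (variable-tuple, predicate) pairs."""
    vertices = []
    for i, (vars_, pred) in enumerate(constraints):
        for vals in product([0, 1], repeat=len(vars_)):
            if pred(*vals):
                vertices.append((i, dict(zip(vars_, vals))))
    edges = set()
    for a in range(len(vertices)):
        for b in range(a + 1, len(vertices)):
            ia, assign_a = vertices[a]
            ib, assign_b = vertices[b]
            # Same constraint (clique edge), or disagreement on a
            # shared variable (inconsistency edge).
            if ia == ib or any(assign_a[v] != assign_b[v]
                               for v in assign_a.keys() & assign_b.keys()):
                edges.add((a, b))
    return vertices, edges

# Two contradictory constraints on y0: alpha(G) = 1 = max #satisfiable.
constraints = [((0,), lambda a: a == 1), ((0,), lambda a: a == 0)]
vertices, edges = gap_sat_to_gap_is(constraints)
assert len(vertices) == 2 and (0, 1) in edges
```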
Hardness of approximation of Max-IS
Each of the following theorems gives a hardness-of-approximation result for Max-IS:
• Thm [AS, ALMSS]: NP ⊆ PCP[ O(1), 1, ½ ]
• Thm [RaSa]: NP ⊆ PCP[ O(1), m, 2^-m ] for m = log^c n, for some c > 0
• Thm [DFKRS]: NP ⊆ PCP[ O(1), m, 2^-m ] for m = log^c n, for any c < 1
• Conjecture [BGLR]: NP ⊆ PCP[ O(1), m, 2^-m ] for m = log n
Hardness of approximation for Max-3SAT
Assuming the PCP theorem, we will show that if P ≠ NP, Max-3SAT does not have a PTAS:
Theorem: There is a constant c > 0 such that computing (1+c)-approximations to Max-3SAT is NP-hard
Hardness of approximation for Max-3SAT
Given an instance of gap-SAT, Φ = { φ1, ..., φl }, we will transform each of the φi's into a 3SAT expression ψi.
[Figure: each local constraint φi over y1, ..., ym becomes an equivalent 3SAT formula, a conjunction of clauses C1, ..., Ck]
Hardness of approximation for Max-3SAT
Given an instance of gap-SAT, Φ = { φ1, ..., φl }, there are l = O(n) functions φi, each depending on at most D = O(1) variables.
• Hence each function φi can be represented as a CNF formula ψi: a conjunction of at most 2^D clauses, each of size at most D
Note that the number of clauses in each ψi is still constant.
Overall, we build a CNF formula: the conjunction of the ψi (one for each local test).
Hardness of approximation for Max-3SAT
Now rewrite every D-clause as a group of 3-clauses to obtain a 3-CNF.
Note that this is still only a constant blow-up in the number of clauses.
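The D-clause-to-3-clause rewriting above is the standard one with fresh auxiliary variables: (x1 ∨ x2 ∨ ... ∨ xD) becomes (x1 ∨ x2 ∨ z1) ∧ (¬z1 ∨ x3 ∨ z2) ∧ ... ∧ (¬z_{D-3} ∨ x_{D-1} ∨ xD), i.e. D−2 clauses. A sketch, with literals encoded as signed integers (an assumed convention):

```python
# Split one wide clause into equisatisfiable 3-clauses using fresh
# auxiliary variables numbered from `next_var` upward.
def clause_to_3cnf(clause, next_var):
    """clause: list of literals (negative = negated).
    Returns (list of 3-clauses, next unused variable number)."""
    if len(clause) <= 3:
        return [clause], next_var
    z = next_var
    out = [clause[:2] + [z]]            # (x1 v x2 v z1)
    rest = clause[2:]
    while len(rest) > 2:
        out.append([-z, rest[0], z + 1])  # (~z_i v x_j v z_{i+1})
        rest, z = rest[1:], z + 1
    out.append([-z] + rest)             # (~z_last v x_{D-1} v x_D)
    return out, z + 1

# A 5-clause becomes 5 - 2 = 3 three-clauses:
clauses, nxt = clause_to_3cnf([1, 2, 3, 4, 5], next_var=6)
assert clauses == [[1, 2, 6], [-6, 3, 7], [-7, 4, 5]] and nxt == 8
```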
Hardness of approximation for Max-3SAT
In case Φ is NOT satisfiable, some constant fraction of the φi are not satisfied, and for each such φi, at least one clause in ψi isn't satisfied.
Hardness of approximation for Max-3SAT
Conclusion: In case the original SAT instance isn't satisfiable, a constant fraction of the 3SAT formulas ψi are not satisfied, and for each at least one clause isn't satisfied. Because each ψi contains a constant number of clauses, altogether a constant fraction of the clauses in the resulting 3SAT formula aren't satisfied. This provides a gap, and hence Max-3SAT cannot be approximated to within some constant unless P = NP!
More Results Related to PCP
The PCP theorem has ushered in a new era of hardness-of-approximation results. Here we list a few:
• We showed that Max-Clique (and equivalently Max-Independent-Set) does not have a PTAS. It is known, in addition, that approximating it within a factor of n^(1-ε) is hard unless co-RP = NP.
• Chromatic Number - NP-hard to approximate within a factor of n^(1-ε) unless co-RP = NP. There is a simple reduction from Max-Clique which shows that it is NP-hard to approximate within a factor of n^ε.
• Chromatic Number of 3-colorable graphs - NP-hard to approximate within a factor of 5/3 − ε (i.e., to distinguish between 4 and 3 colors). Can be approximated within O(n log^O(1) n).
More Results Related to PCP
• Vertex Cover - very easy to approximate within a factor of 2; NP-hard to approximate within a factor of 4/3.
• Max-3SAT - known to be approximable within a factor of 8/7; NP-hard to approximate within a factor of 8/7 − ε for every ε > 0.
• Set Cover - NP-hard to approximate within a factor of ln n; more precisely, cannot be approximated within a factor of (1−ε) ln n unless NP ⊆ DTIME(n^(log log n)).
More Results Related to PCP
• Maximum Satisfying Linear Sub-System - the problem: given a linear system Ax = b (A an n × m matrix) over a field F, find the largest number of equations that can be satisfied by some x.
• If all equations can be satisfied, the problem is in P.
• If F = Q: NP-hard to approximate within a factor of m; can be approximated within O(m / log m).
• If F = GF(q): can be approximated within a factor of q (even a random assignment gives such a factor in expectation); NP-hard to approximate within q − ε. Also NP-hard for equations with only 3 variables.
• For equations with only 2 variables: NP-hard to approximate within 1.0909, but can be approximated within 1.383.