If NP languages are hard on the worst-case then it is easy to find their hard instances
Danny Gutfreund, Hebrew U. · Ronen Shaltiel, Haifa U. · Amnon Ta-Shma, Tel-Aviv U.
Pseudo-P [IW98, Kab-01]
D = {D_n} is samplable if there exists S ∈ P s.t. S(1^n, U_p(n)) = D_n.
L ∈ Pseudo_p-P if there exists a polynomial-time algorithm A = A_L s.t.: for every samplable distribution {D_n} and every input length n,
Pr_{x←D_n} [ A(x) = L(x) ] > p
Distributional complexity (Levin)
Def: L ∈ Avg_p(n)P if for every samplable distribution D there exists A = A_{L,D} ∈ P s.t.
Pr_{x←D_n} [ A(x) = L(x) ] ≥ p(n)
Heuristica vs. Super-Heuristica
Heuristica: every (avg-case) solution to some hard problem is bound to a specific distribution; if the distribution changes we need to come up with a new solution. Natural for cryptography (and lower bounds).
Super-Heuristica: once a good heuristic for some NP-complete problem is developed, every new problem just needs to be reduced to it. Natural for algorithms and for complexity; naturally appears in derandomization [IW98, K01, ...].
Connection to cryptography
The right hardness for cryptography. E.g., a standard assumption in cryptography is that (FACTORING, D) ∉ AvgBPP, where D is the samplable distribution obtained by sampling primes p, q and outputting N = pq.
A remark about reductions
• For the distributional setting one needs to define "approximation preserving reductions" [Levin].
• [L86, BDCGL90, Gur90, Gur91] showed complete problems in DistNP.
• [IL90] showed a reduction to the uniform distribution.
• For Pseudo-classes any (Cook/Karp) reduction is good: if L reduces to L' via R then for every samplable D, R(D) is samplable. So SAT ∈ Pseudo-P ⟹ NP ⊆ Pseudo-P.
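As a small illustration (a minimal sketch with placeholder sampler and reduction, not taken from the paper), composing a sampler with a Karp reduction immediately yields a sampler for the reduced distribution:

import random

def S(n, random_bits):
    # Placeholder sampler for D_n: maps p(n) = n uniform bits to an instance of L.
    return ''.join(random_bits)

def R(x):
    # Placeholder poly-time Karp reduction from L to L' (e.g., to SAT).
    return 'reduced:' + x

def sampler_for_R_of_D(n):
    # The composition R(S(...)) still runs in polynomial time,
    # so R(D) is a samplable distribution over L'-instances.
    bits = [random.choice('01') for _ in range(n)]   # stands in for U_{p(n)}
    return R(S(n, bits))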
Our main result
NP ⊄ P ⟹ NP ⊄ Pseudo_{2/3+ε}P
(worst-case hardness ⟹ weak average-case hardness)
Also, NP ⊄ BPP ⟹ NP ⊄ Pseudo_{2/3+ε}BPP.
Compare with the open problem: NP ⊄ BPP ⟹? NP ⊄ Avg_{1-1/p(n)}BPP
In words • Super-Heuristica does not exist: if we do well on every samplable distribution we do well on every input. • Heuristics for NP-complete problems will always have to be bound to specific samplable distributions (unless NP is easy on the worst-case).
Main Lemma
Given a description of a poly-time algorithm that fails to solve SAT, we can efficiently produce, on input 1^n, up to 3 formulas (of length n) s.t. at least one is hard for the algorithm.
We also have a probabilistic version.
Proof - main lemma • We are given a description of DSAT, and we know that DSAT fails to solve SAT. • The idea is to use DSAT to find instances on which it fails. Think of DSAT as an adversary.
First question to DSAT: can you solve SAT on the worst-case?
• Write as a formula: Φ(n) = ∃φ ∈ {0,1}^n [ SAT(φ) ≠ DSAT(φ) ]
• Problem - not an NP formula:
Φ(n) = ∃φ ∈ {0,1}^n [ ( (∃σ) σ(φ)=true ∧ DSAT(φ)=0 ) ∨ ( (∀σ) σ(φ)=false ∧ DSAT(φ)=1 ) ]
Search to decision
E.g., starting with a SAT sentence φ(x1,…,xn) that DSAT claims is satisfiable: for each variable, try setting xi=0 and xi=1. If DSAT says neither restriction is satisfiable, we found a contradiction. Otherwise, choose xi so that DSAT says the restricted formula is satisfiable. At the end, check that the assignment is satisfying.
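A hedged Python sketch of this self-reduction (a minimal illustration under simplifying assumptions, not the paper's implementation): dsat stands for the given poly-time decision procedure, and formulas are represented, for illustration only, as lists of clauses of signed integer literals.

def assign(cnf, var, value):
    # Substitute x_var := value in a CNF given as a list of clauses of signed
    # literals (e.g., [-1, 3] stands for  NOT x1 OR x3).
    new_cnf = []
    for clause in cnf:
        if (var if value else -var) in clause:
            continue                               # clause is already satisfied
        new_cnf.append([l for l in clause if abs(l) != var])  # may become empty (false)
    return new_cnf

def search_with_dsat(cnf, num_vars, dsat):
    # Self-reduction: the caller has already checked that dsat(cnf) is True.
    # Trust dsat's answers while fixing variables one by one.  Returns either
    # ('assignment', a) or ('contradiction', formulas) -- formulas on which
    # dsat's answers are mutually inconsistent, so at least one answer is wrong.
    assignment, current = {}, cnf
    for var in range(1, num_vars + 1):
        cnf0 = assign(current, var, False)
        cnf1 = assign(current, var, True)
        if dsat(cnf1):
            assignment[var], current = True, cnf1
        elif dsat(cnf0):
            assignment[var], current = False, cnf0
        else:
            # dsat claimed 'current' satisfiable, yet rejected both restrictions.
            return ('contradiction', [current, cnf0, cnf1])
    if current == []:                              # all clauses satisfied
        return ('assignment', assignment)
    return ('contradiction', [current])            # dsat accepted a falsified formula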
Can SSAT solve SAT on the worst-case?
• SSAT has one-sided error: it can't say "yes" on an unsatisfiable formula.
• Φ(n) = ∃φ ∈ {0,1}^n [ (∃σ) σ(φ)=true ∧ SSAT(φ)=no ]
Notice that Φ(n) ∈ SAT. Notice that we use the code of DSAT.
First question to DSAT: can SSAT solve SAT on the worst-case?
• If DSAT(Φ(n)) = false, output Φ(n). [Note that Φ(n) ∈ SAT]
• Otherwise, run the search algorithm on Φ(n) with DSAT.
• Case 1: the search algorithm fails. Output the three contradicting statements:
DSAT(Φ(n)[σ1…σi x_{i+1}…x_m]) = true,
DSAT(Φ(n)[σ1…σi 0…x_m]) = false, and
DSAT(Φ(n)[σ1…σi 1…x_m]) = false;
or DSAT(Φ(n)[σ1…σm]) = false.
• Case 2: the search algorithm succeeds. We hold φ ∈ SAT such that SSAT(φ) = false.
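The case analysis above can be summarized in a short Python sketch (hypothetical and heavily simplified: build_Phi and decode_formula stand for the Cook-Levin-style encoding and decoding steps, which are not implemented here; search_with_dsat is the routine sketched after the search-to-decision slide):

def produce_hard_instances(n, dsat, build_Phi, decode_formula):
    # Returns up to 3 formulas, at least one of which dsat answers incorrectly
    # (for input lengths n on which the search procedure SSAT fails).
    Phi, Phi_vars = build_Phi(n, dsat)    # hypothetical: CNF asserting
                                          # "some satisfiable phi of length n has SSAT(phi)=no"
    if not dsat(Phi):
        return [Phi]                      # Phi(n) is satisfiable, so dsat is wrong on it
    kind, payload = search_with_dsat(Phi, Phi_vars, dsat)
    if kind == 'contradiction':
        return payload                    # up to three mutually inconsistent answers
    # Otherwise the search found a satisfying assignment of Phi(n); it encodes a
    # formula phi that is satisfiable yet rejected by SSAT.  "Now work with phi":
    phi, phi_vars = decode_formula(payload)   # hypothetical decoding of the witness
    if not dsat(phi):
        return [phi]                      # phi is satisfiable, so dsat is wrong on it
    kind2, payload2 = search_with_dsat(phi, phi_vars, dsat)
    return payload2                       # must be a contradiction, since SSAT(phi)=no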
Are we done?
• We hold φ on which SSAT is wrong (φ ∈ SAT but SSAT(φ) = false).
• What we need is a sentence on which DSAT is wrong.
Now work with φ
• If DSAT(φ) = false, output φ. [Note that φ ∈ SAT]
• Otherwise, run the search algorithm on φ with DSAT.
• Case 1: the search algorithm fails. Output the three contradicting statements.
• Case 2: the search algorithm succeeds, i.e., SSAT finds a satisfying assignment for φ. But case 2 never happens, since SSAT(φ) = false.
Comments about the reduction
Our reduction is
• non-black-box (because we use the description of the TM, and the search-to-decision reduction), and
• adaptive (even if we use parallel search-to-decision reductions [BDCGL90]).
So it does not fall into the categories ruled out by [Vio03, BT03] (for average-case classes).
Dealing with probabilistic algorithms
• If we proceed as before we get: Φ(n) = ∃φ ∈ {0,1}^n [ (∃σ) σ(φ)=true ∧ Pr_r[SSAT(φ,r)=1] < 2/3 ]
• Problem: Φ(n) is an MA statement. We do not know how to derandomize it without unproven assumptions.
• Solution: derandomize using Adleman (BPP ⊆ P/poly).
Back to the proof
• We replace the formula Φ(n) = ∃φ ∈ {0,1}^n [ (∃σ) σ(φ)=true ∧ Pr_r[SSAT(φ,r)=1] < 2/3 ] with a distribution over formulas: Φ(n,r') = ∃φ ∈ {0,1}^n [ (∃σ) σ(φ)=true ∧ SSAT(φ,r')=0 ], where r' is fixed randomness.
• With very high probability over r', SSAT(·,r') behaves like SSAT and the argument continues as before.
A weak Avg version
The distribution is more complicated than the algorithms it's hard for.
Thm: Assume NP ⊄ RP. Let f(n) = n^ω(1). Then there exists a distribution D samplable in time f(n), such that for every NP-complete language L, (L,D) ∉ Avg_{1-1/n^3}BPP.
Remark: the corresponding assertion for deterministic classes can be proved directly by diagonalization.
Why are worst-case to avg-case reductions hard?
Here are two possible explanations:
• An exponential search space.
• A weak machine has to beat stronger machines.
Thm 2 says that the first is not the problem.
Proof of Theorem 2 – cont.
K_m - the set of all probabilistic TMs of description length at most m.
We define the distribution D = {D_n} as follows:
• On input 1^n choose uniformly a machine M from K_log(n) and run it for (say) n^log(n) steps.
• If it didn't halt, output 0^n; otherwise, output the output of M (trimmed or padded to length n).
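A hedged Python sketch of this sampler (assumptions: run_machine is a hypothetical simulator for probabilistic TMs given as bit strings, and picking a random description of length at most log n stands in for a uniform choice from K_log(n)):

import math
import random

def sample_D_n(n, run_machine):
    # run_machine(description, step_bound) is a hypothetical simulator returning
    # the machine's output string, or None if it does not halt within the bound.
    m = max(1, int(math.log2(n)))
    length = random.randint(1, m)
    description = ''.join(random.choice('01') for _ in range(length))
    out = run_machine(description, n ** m)         # roughly n^(log n) steps
    if out is None:
        return '0' * n                             # the machine did not halt in time
    return out[:n].ljust(n, '0')                   # trim or pad the output to length n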
Proof of Theorem 2 – cont.
• By Thm 1, for every algorithm A there exists a samplable distribution that outputs hard instances for it.
• With probability at least n^{-2} we choose the machine that generates this distribution, and then with probability > 1/3 we get a hard instance for A.
Hardness amplification for Pseudo-classes
Reminiscent of hardness amplification for AvgBPP, but:
• Many old techniques don't work. E.g., many reconstruction proofs don't work, because the reconstructing algorithm cannot sample the distribution.
• Some techniques work better. E.g., boosting: if for every samplable distribution the algorithm can find a witness for a non-negligible fraction of inputs, then it finds a witness for almost all inputs in any samplable distribution.
We proved NP ⊄ BPP ⟹ P^{||,NP} ⊄ Pseudo_{1/2+ε}BPP,
using: [VV, BDCGL90] parallel search-to-decision reduction, error correcting codes, boosting.
The first two are common ingredients in hardness amplification. Boosting is a kind of replacement for the hard-core lemma.
Open problem: NP ⊄ BPP ⟹ NP ⊄ Pseudo_{1/2+ε}BPP