Complexity Theory Lecture 11
Lecturer: Moni Naor
Recap
Last week:
• Statistical zero-knowledge
• AM protocol for VC dimension
• Hardness and Randomness
This week:
• Hardness and Randomness
• Semi-Random Sources
• Extractors
Derandomization
A major research question: how to make the construction of
• small sample spaces `resembling' the large one
• hitting sets
efficient.
Successful approach: randomness from hardness
• (Cryptographic) pseudo-random generators
• Complexity-oriented pseudo-random generators
Recall: Derandomization I
Theorem: any f ∈ BPP has a polynomial size circuit
Simulating large sample spaces:
• Want to find a small collection of strings on which the PTM behaves similarly to the large collection: if the PTM errs with probability at most δ (over truly random strings), then it should err on at most a δ+ε fraction of the small collection, on ALL inputs
• Choose m random strings
• For input x, let A_x be the event that more than a (δ+ε) fraction of the m strings fail the PTM
• Chernoff: Pr[A_x] ≤ e^{−2ε²m} < 2^{−2n} for m = Θ(n/ε²)
• Union bound: Pr[∪_x A_x] ≤ ∑_x Pr[A_x] < 2^n · 2^{−2n} = 2^{−n} < 1
So some collection of m strings is good for all 2^n inputs, and it can be hard-wired into a circuit.
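Spelling out the parameter choice behind the two inequalities (a standard calculation; m ≥ n/ε² is one sufficient choice, not necessarily the lecture's exact constant):

```latex
\Pr[A_x] \le e^{-2\varepsilon^2 m} \le e^{-2n} < 2^{-2n}
  \quad\text{for } m \ge n/\varepsilon^2,
\qquad
\Pr\Bigl[\bigcup_{x \in \{0,1\}^n} A_x\Bigr]
  \le \sum_{x} \Pr[A_x] < 2^n \cdot 2^{-2n} = 2^{-n} < 1 .
```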
Pseudo-random generators
• Would like to stretch a short secret (seed) into a long one
• The resulting long string should be usable in any case where a long string is needed
• In particular: in cryptographic applications, as a one-time pad
• Important notion: indistinguishability, i.e. two probability distributions that cannot be told apart
• Statistical indistinguishability: the distance between the probability distributions is small
• New notion: computational indistinguishability
Computational Indistinguishability
Definition: two sequences of distributions {D_n} and {D'_n} on {0,1}^n are computationally indistinguishable if for every polynomial p(n), every probabilistic polynomial time adversary A and sufficiently large n: if A receives input y ∈ {0,1}^n and tries to decide whether y was generated by D_n or D'_n, then
|Prob[A=`0' | D_n] − Prob[A=`0' | D'_n]| < 1/p(n)
(the quantity on the left is A's advantage).
Without the restriction to probabilistic polynomial time tests this is equivalent to the variation distance being negligible:
∑_{β ∈ {0,1}^n} |Prob[D_n = β] − Prob[D'_n = β]| < 1/p(n)
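To make the unrestricted notion concrete, here is a minimal Python sketch (not from the lecture) computing the variation (L1) distance between two explicitly given distributions:

```python
def variation_distance(p, q):
    """L1 distance between two distributions, each given as a dict
    mapping outcomes to probabilities."""
    support = set(p) | set(q)
    return sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in support)

# a fair bit vs. a slightly biased bit: distance 0.02
print(variation_distance({"0": 0.5, "1": 0.5}, {"0": 0.51, "1": 0.49}))
```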
Pseudo-random generators
Definition: a function g: {0,1}* → {0,1}* is said to be a (cryptographic) pseudo-random generator if
• It is polynomial time computable
• It stretches the input: |g(x)| > |x|; denote by ℓ(n) the length of the output on inputs of length n
• If the input (seed) is random, then the output is indistinguishable from random: for any probabilistic polynomial time adversary A that receives input y of length ℓ(n) and tries to decide whether y = g(x) or y is a random string from {0,1}^ℓ(n), for any polynomial p(n) and sufficiently large n
|Prob[A=`rand' | y=g(x)] − Prob[A=`rand' | y ∈_R {0,1}^ℓ(n)]| < 1/p(n)
Want to use the output of a pseudo-random generator whenever long random strings are needed.
"Anyone who considers arithmetical methods of producing random numbers is, of course, in a state of sin." – J. von Neumann
Pseudo-random generators
The same definition, with two important issues to note:
• Why is the adversary bounded by polynomial time?
• Why is the indistinguishability not perfect?
Pseudo-Random Generators and Derandomization
A pseudo-random generator maps the k-bit strings to n-bit strings. Run the algorithm on the generator's output for all possible seeds: any input should see roughly the same fraction of accepts and rejects as under truly random strings. The result is a derandomization of a BPP algorithm by taking the majority answer.
Complexity of Derandomization
• Need to go over all 2^k possible seeds
• Need to compute the pseudo-random generator on each of them
• The generator has to be secure against non-uniform distinguishers: the actual distinguisher is the combination of the algorithm and its input, and if we want the derandomization to work for all inputs we get non-uniformity
Construction of pseudo-random generators: randomness from hardness
• Idea: given any one-way function, there must be a hard decision problem hidden in it
• If it is balanced enough: it looks random
• Such a problem is a hardcore predicate
• Possibilities:
  • Last bit
  • First bit
  • Inner product
Hardcore Predicate
Definition: let f: {0,1}* → {0,1}* be a function. We say that h: {0,1}* → {0,1} is a hardcore predicate for f if
• It is polynomial time computable
• For any probabilistic polynomial time adversary A that receives input y = f(x) and tries to compute h(x), for any polynomial p(n) and sufficiently large n
|Prob[A(y)=h(x)] − 1/2| < 1/p(n)
where the probability is over the choice of y and the random coins of A
• Sources of hardcoreness:
  • not enough information about x (not of interest for generating pseudo-randomness)
  • enough information about x, but hard to compute it
Single bit expansion
• Let f: {0,1}^n → {0,1}^n be a one-way permutation
• Let h: {0,1}^n → {0,1} be a hardcore predicate for f
Consider g: {0,1}^n → {0,1}^{n+1} where g(x) = (f(x), h(x))
Claim: g is a pseudo-random generator
Proof idea: a distinguisher for g can be used to guess h(x), since it must behave differently on (f(x), h(x)) and (f(x), 1−h(x)).
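A toy sketch of this single-bit expansion, instantiating h with the inner-product predicate from the list above (which takes an extra public random string r, in the spirit of Goldreich-Levin); the permutation is a stand-in and is of course not one-way:

```python
def inner_product_bit(x: int, r: int) -> int:
    """Candidate hardcore predicate: <x, r> mod 2."""
    return bin(x & r).count("1") % 2

def toy_permutation(x: int, n: int) -> int:
    """Stand-in for a one-way permutation on {0,1}^n.
    x -> 5x + 1 mod 2^n is a permutation, but NOT one-way."""
    return (5 * x + 1) % (1 << n)

def g(x: int, r: int, n: int):
    """Single-bit expansion: g(x) = (f(x), h(x)); n bits in, n+1 out."""
    return toy_permutation(x, n), inner_product_bit(x, r)

print(g(x=0b101101, r=0b110010, n=6))
```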
From single bit expansion to many bit expansion
Construction: pick a random x and a random r; iterate the permutation to obtain the internal configuration x, f(x), f^{(2)}(x), …, f^{(m)}(x), and output the bits
h(x,r), h(f(x),r), h(f^{(2)}(x),r), …, h(f^{(m−1)}(x),r)
• Can make r and f^{(m)}(x) public, but not any other internal state
• Can make m as large as needed
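A minimal runnable sketch of this iterated construction (the Blum-Micali paradigm), again with an illustrative permutation that is not actually one-way:

```python
def many_bit_generator(x: int, r: int, n: int, m: int) -> list:
    """Output h(x,r), h(f(x),r), ..., h(f^(m-1)(x),r)."""
    f = lambda z: (5 * z + 1) % (1 << n)     # toy permutation of {0,1}^n
    h = lambda z: bin(z & r).count("1") % 2  # inner-product bit <z, r>
    out = []
    for _ in range(m):
        out.append(h(x))  # emit a hardcore bit of the current state
        x = f(x)          # advance; only the final f^(m)(x) may go public
    return out

# 8 pseudo-random bits from a 16-bit state and public r (toy parameters)
print(many_bit_generator(x=0x3A7B, r=0x9C41, n=16, m=8))
```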
Two important techniques for showing pseudo-randomness • Hybrid argument • Next-bit prediction and pseudo-randomness
Hybrid argument
To prove that two distributions D and D' are indistinguishable:
• suggest a collection of distributions D = D_0, D_1, …, D_k = D'
• If D and D' can be distinguished, then there is a pair D_i and D_{i+1} that can be distinguished
• Advantage ε in distinguishing between D and D' means advantage ε/k between some D_i and D_{i+1}
• Use a distinguisher for the pair D_i and D_{i+1} to derive a contradiction
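The averaging step, written out (a standard triangle-inequality calculation): for any test T,

```latex
\varepsilon \le \bigl|\Pr[T(D_0)=1]-\Pr[T(D_k)=1]\bigr|
  \le \sum_{i=0}^{k-1}\bigl|\Pr[T(D_i)=1]-\Pr[T(D_{i+1})=1]\bigr| ,
```

so at least one of the k summands is at least ε/k.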
Next-bit Test
Definition: a function g: {0,1}* → {0,1}* is said to pass the next-bit test if:
• It is polynomial time computable
• It stretches the input: |g(x)| > |x|; denote by ℓ(n) the length of the output on inputs of length n
• If the input (seed) is random, then the output passes the next-bit test: for any prefix length 0 ≤ i < ℓ(n), for any probabilistic polynomial time adversary A that is a predictor (receives the first i bits of y = g(x) and tries to guess the next bit), for any polynomial p(n) and sufficiently large n
|Prob[A(y_1, y_2, …, y_i) = y_{i+1}] − 1/2| < 1/p(n)
Theorem: a function g: {0,1}* → {0,1}* passes the next-bit test if and only if it is a pseudo-random generator
Landmark results in the theory of cryptographic pseudo-randomness
Theorem: if pseudo-random generators stretching by a single bit exist, then pseudo-random generators stretching by any polynomial factor exist
Theorem: if one-way permutations exist, then pseudo-random generators exist
A more difficult theorem to prove:
Theorem [HILL] (Håstad, Impagliazzo, Levin, Luby): one-way functions exist iff pseudo-random generators exist
Complexity Oriented Pseudo-Random Generators
Cryptography:
• Only a crude upper bound on the time of the `user' (the distinguisher)
• The generator has less computational power than the distinguisher
Derandomization:
• When derandomizing an algorithm you have a much better idea about the resources, in particular the run time
• The generator may have more computational power, possibly from a higher complexity class
(Note the quantifier order switch between the two settings.)
Ideas for getting better pseudo-random generators for derandomization
• The generator need not be so efficient. For derandomizing:
  • a parallel algorithm, the generator may be more sequential; example: to derandomize AC0 circuits the generator can compute parities
  • a low-memory algorithm, the generator may use more space
In particular we can depart from the one-way function assumption (easy to compute in one direction, hard in the other)
• The (in)distinguishing probability need not be so small, since we are going to take a majority at the end
Parameters of a complexity oriented pseudo-random generator
All functions of n:
• Seed length t
• Output length m
• Running time n^c
• Fooling circuits of size s
• Error ε
Any circuit family {C_n} of size s(n) that tries to distinguish outputs of the generator from random strings in {0,1}^{m(n)} has at most ε(n) advantage
Hardness Assumption: Unapproximable Functions
Definition: E = ∪_k DTIME(2^{kn})
Definition: a family of functions f = {f_n}, f_n: {0,1}^n → {0,1}, is said to be s(n)-unapproximable if for every family of circuits {C_n} of size s(n):
Pr_x[C_n(x) = f_n(x)] ≤ ½ + 1/s(n)
Note that s(n) is both the circuit size and the bound on the advantage; this is an average hardness notion.
Example: if g is a one-way permutation and h is a hardcore function strong against s(n)-adversaries, then f(y) = h(g^{-1}(y)) is s(n)-unapproximable
One bit expansion
Assumption: f = {f_n} is
• s(n)-unapproximable, for s(n) = 2^{Ω(n)}
• in E
Claim: G = {G_n}, where G_n(y) = y ∘ f_{log n}(y), is a single bit expansion generator family
Proof: suppose not; then there exists a predictor that computes f_{log n} with probability better than ½ + 1/s(log n) on a random input
Parameters: seed length t = log n, output length m = log n + 1, fooling circuits of size s = n^δ, running time n^c, error ε = 1/n^δ < 1/m
Getting Many Bits Simultaneously
Try outputting many evaluations of f on various parts of the seed: let b_i(y) be a projection of y and consider
G(y) = f(b_1(y)) ∘ f(b_2(y)) ∘ … ∘ f(b_m(y))
• It seems that a predictor must evaluate f(b_i(y)) to predict the i-th bit
• But: the predictor might use correlations without having to compute f
• Turns out: it is sufficient to decorrelate the b_i(y)'s in a pairwise manner
Notation: if |y| = t and S ⊆ {1…t}, we denote by y|_S the sequence of bits of y whose index is in S
Nearly-Disjoint Subsets
Definition: a collection of subsets S_1, S_2, …, S_m ⊆ {1…t} is an (h, a)-design if:
• Fixed size: for all i, |S_i| = h
• Small intersection: for all i ≠ j, |S_i ∩ S_j| ≤ a
Parameters: (m, t, h, a)
(Picture: subsets S_1, S_2, S_3 of {1…t}, each of size h, with each pairwise intersection of size ≤ a.)
Nearly-Disjoint Subsets
Lemma: for every ε > 0 and m < n we can construct in poly(n) time a collection of subsets S_1, S_2, …, S_m ⊆ {1…t} which is an (h, a)-design with:
• h = log n
• a = ε log n
• t = O(log n), where the constant in the big O depends on ε
The argument gives both a proof of existence and a sequential construction (method of conditional probabilities).
Nearly-Disjoint Subsets
Proof: construction in a greedy manner; repeat m times:
• pick a random (h = log n)-subset of {1…t}
• set t = O(log n) so that the expected overlap with a fixed S_i is ½ ε log n, and the probability that the overlap with S_i is larger than ε log n is at most 1/m
• one way to achieve this: pick a single element independently from each of t' buckets; for S_i the event A_i is that the intersection is larger than (½ + ε)t', and by Chernoff Pr[A_i] ≤ e^{−2ε²t'} < 2^{−log m} = 1/m
• union bound: with positive probability some h-subset has the desired small overlap with all the S_i picked so far
• find the good h-subset by exhaustive search (a code sketch of the greedy loop follows below)
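A direct (unoptimized) Python sketch of the greedy loop promised above: resample a random h-subset until it has small overlap with every subset chosen so far. The parameters in the usage line are illustrative only:

```python
import random

def greedy_design(m: int, t: int, h: int, a: int, max_tries: int = 10**5):
    """Greedily build S_1..S_m within {0..t-1}, each of size h, with
    pairwise intersections of size <= a (an (h, a)-design).
    The lemma guarantees a good subset exists for suitable t = O(log n);
    here we simply retry random subsets up to max_tries times."""
    design = []
    for _ in range(m):
        for _ in range(max_tries):
            s = set(random.sample(range(t), h))
            if all(len(s & prev) <= a for prev in design):
                design.append(s)
                break
        else:
            raise RuntimeError("parameters too tight; increase t")
    return design

# toy usage: 10 subsets of {0..39}, each of size 6, pairwise overlap <= 2
print(greedy_design(m=10, t=40, h=6, a=2))
```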
Other constructions of designs
• Based on error correcting codes
• Simple construction: based on polynomials
The NW generator (Nisan-Wigderson)
Need:
• f ∈ E that is s(n)-unapproximable, for s(n) = 2^{δn}
• a collection S_1, …, S_m ⊆ {1…t} which is an (h, a)-design with h = log n, a = δ log n / 3 and t = O(log n)
G_n(y) = f_{log n}(y|_{S_1}) ∘ f_{log n}(y|_{S_2}) ∘ … ∘ f_{log n}(y|_{S_m})
(Picture: the seed y; each output bit applies f_{log n} to the bits of y indexed by one subset S_i.)
The NW generator
Theorem: G = {G_n} is a complexity oriented pseudo-random generator with:
• seed length t = O(log n)
• output length m = n^{δ/3}
• running time n^c for some constant c
• fooling circuits of size s = m
• error ε = 1/m
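An illustrative Python sketch of the generator itself, given a design and a predicate f; parity stands in for the hard function purely to make the code run (parity is computable by small circuits, so it is emphatically not unapproximable):

```python
import random

def nw_generator(y_bits, design, f):
    """Nisan-Wigderson style generator: one output bit per subset,
    obtained by applying f to the seed bits indexed by that subset."""
    out = []
    for s in design:
        restricted = [y_bits[i] for i in sorted(s)]  # y restricted to S_i
        out.append(f(restricted))
    return out

# toy usage: size-3 subsets with pairwise overlap <= 1, 6-bit seed
toy_design = [{0, 1, 2}, {2, 3, 4}, {4, 5, 0}]
seed = [random.randint(0, 1) for _ in range(6)]
print(nw_generator(seed, toy_design, f=lambda bits: sum(bits) % 2))
```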
The NW generator
Proof:
• assume G = {G_n} does not ε-pass a statistical test C = {C_m} of size s:
|Pr_x[C(x) = 1] − Pr_y[C(G_n(y)) = 1]| > ε
• we can transform this distinguisher into a predictor A of size s' = s + O(m):
Pr_y[A(G_n(y)_{1…i−1}) = G_n(y)_i] > ½ + ε/m
• just as in the next-bit test, using a hybrid argument
Proof of the NW generator
G_n(y) = f_{log n}(y|_{S_1}) ∘ f_{log n}(y|_{S_2}) ∘ … ∘ f_{log n}(y|_{S_m})
Pr_y[A(G_n(y)_{1…i−1}) = G_n(y)_i] > ½ + ε/m
• fix the bits outside of S_i to preserve the advantage:
Pr_{y'}[A(G_n(y')_{1…i−1}) = G_n(y')_i] > ½ + ε/m
where y' is random on S_i and agrees elsewhere with the assignment to {1…t} ∖ S_i that maximizes the advantage of A
Proof of the NW generator
G_n(y) = f_{log n}(y|_{S_1}) ∘ f_{log n}(y|_{S_2}) ∘ … ∘ f_{log n}(y|_{S_m})
• G_n(y')_i is exactly f_{log n}(y')
• for j ≠ i, as y' varies, y'|_{S_j} varies over only 2^a values! This follows from the small intersection property
• so to compute G_n(y')_j we need only a lookup table of size 2^a
• hard-wire (up to) m−1 tables of 2^a values each to provide all of G_n(y')_{1…i−1}
The Circuit for computing f
From A we obtain a small circuit that approximates f: given y', the hardwired tables supply bits 1…i−1 of G_n(y'), and A outputs a guess for f_{log n}(y').
Properties of the circuit:
• size: s + O(m) + (m−1)2^a < s(log n) = n^δ
• advantage: ε/m = 1/m² > 1/s(log n) = n^{−δ}
Extending the result
Theorem: if E contains 2^{Ω(n)}-unapproximable functions, then BPP = P.
• The assumption is an average case one
• Based on non-uniformity
Improvement:
Theorem: if E contains functions that require circuits of size 2^{Ω(n)} (for the worst case), then E contains 2^{Ω(n)}-unapproximable functions.
Corollary: if E requires exponential size circuits, then BPP = P.
Extracting Randomness from defective sources
Suppose that we have an imperfect source of randomness:
• a physical source: biased, correlated
• a collection of events in a computer: /dev/rand
• information leak
Can we:
• Extract good random bits from the source?
• Use the source for various tasks requiring randomness, e.g. probabilistic algorithms?
Imperfect Sources
Biased coins: X_1, X_2, …, X_n, where each bit is independently chosen so that Pr[X_i = 1] = p. How do we get unbiased coins?
Von Neumann's procedure: flip the coin twice.
• If it comes up `0' followed by `1', call the outcome `0'.
• If it comes up `1' followed by `0', call the outcome `1'.
• Otherwise (two `0's or two `1's occurred), repeat the process.
Claim: the procedure generates an unbiased result, no matter how the coin was biased; it works for all p simultaneously.
Two questions:
• Can we get a better rate of generating bits?
• What about more involved or insidious models?
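A direct transcription of the procedure in Python (the `1 0' pair yields `1', the `0 1' pair yields `0'):

```python
import random

def von_neumann(biased_bit, max_pairs=10**6):
    """Extract one unbiased bit from a biased coin: flip twice,
    output 0 on 01, 1 on 10, and retry on 00 or 11.
    Pr[01] = Pr[10] = p(1-p), so the output is unbiased for every p."""
    for _ in range(max_pairs):
        a, b = biased_bit(), biased_bit()
        if a != b:
            return a  # a=1,b=0 -> 1 ; a=0,b=1 -> 0
    raise RuntimeError("no unequal pair observed")

# usage: a coin with bias p = 0.8; the empirical mean is close to 0.5
coin = lambda: 1 if random.random() < 0.8 else 0
print(sum(von_neumann(coin) for _ in range(10000)) / 10000)
```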
Shannon Entropy
Let X be a random variable over alphabet Σ with distribution P. The Shannon entropy of X is
H(X) = − ∑_x P(x) log P(x)
where we take 0 log 0 to be 0.
Interpretation: H(X) represents how much we can compress X under the best encoding.
Examples
• If X = 0 (constant) then H(X) = 0; this is the only case where H(X) = 0, in all other cases H(X) > 0
• If X ∈ {0,1} with Prob[X=0] = p and Prob[X=1] = 1−p, then H(X) = −p log p − (1−p) log(1−p) ≡ H(p)
• If X ∈ {0,1}^n is uniformly distributed, then H(X) = − ∑_{x ∈ {0,1}^n} 2^{−n} log 2^{−n} = 2^n · (n/2^n) = n
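A few lines of Python confirming these examples:

```python
import math

def shannon_entropy(probs):
    """H(X) = -sum p log2 p, with 0 log 0 taken to be 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([1.0]))         # 0.0 : constant variable
print(shannon_entropy([0.3, 0.7]))    # H(0.3) ~ 0.881 : biased bit
print(shannon_entropy([2**-4] * 16))  # 4.0 : uniform on {0,1}^4
```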
Properties of Entropy
• Entropy is bounded: H(X) ≤ log |Σ|, with equality only if X is uniform over Σ
• For any function f: {0,1}* → {0,1}*, H(f(X)) ≤ H(X)
• H(X) is an upper bound on the number of bits we can deterministically extract from X
Does High Entropy Suffice for extraction?
If we have a source X ∈ {0,1}^n where X has high entropy (say H(X) ≥ n/2), how many bits can we guarantee to extract?
Consider:
• Pr[X = 0^n] = 1/2
• For any x ∈ 1{0,1}^{n−1}: Pr[X = x] = 1/2^n
Then H(X) = n/2 + 1/2, but we cannot guarantee more than a single bit in the extraction.
Another Notion: Min Entropy
Let X be a random variable over alphabet Σ with distribution P_X. The min entropy of X is
H_min(X) = − log max_x P(x)
The min entropy is determined by the most likely value of X:
• min-entropy k implies that no string has probability mass more than 2^{−k}
Property: H_min(X) ≤ H(X). Why?
Would like to extract k bits from a min entropy k source. This is (approximately) possible if:
• we know the source
• we have unlimited computational power
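Computing both notions on the source from the "Does High Entropy Suffice" slide makes the gap concrete (n = 8 here, so H(X) = n/2 + 1/2 while H_min(X) = 1):

```python
import math

def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def min_entropy(probs):
    """H_min(X) = -log2 of the largest point probability."""
    return -math.log2(max(probs))

# Pr[0^n] = 1/2; each of the 2^(n-1) strings starting with 1 has mass 2^-n
n = 8
probs = [0.5] + [2**-n] * 2**(n - 1)
print(shannon_entropy(probs))  # 4.5 = n/2 + 1/2
print(min_entropy(probs))      # 1.0 : only one near-uniform bit guaranteed
```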
The Semi-Random Model (Santha-Vazirani)
Definition: a source emitting a sequence of bits X_1, X_2, …, X_n is an SV source with bias δ if for all 1 ≤ i ≤ n and b_1, b_2, …, b_i ∈ {0,1}^i:
½ − δ/2 ≤ Pr[X_i = b_i | X_1 = b_1, X_2 = b_2, …, X_{i−1} = b_{i−1}] ≤ ½ + δ/2
So the next bit has bias at most δ; a clear generalization of a biased coin.
Motivation:
• physical measurements where there are correlations with the history
• distributed imperfect coin generation
An SV source has high min entropy: for any string b_1, b_2, …, b_n,
Pr[X_1 = b_1, X_2 = b_2, …, X_n = b_n] ≤ (½ + δ/2)^n
Impossibility of extracting a single bit from SV sources
Would like a procedure similar to von Neumann's for SV sources: a function f: {0,1}^n → {0,1} such that for any SV source with bias δ, f(X_1, X_2, …, X_n) is more or less balanced.
Theorem: for all δ ∈ (0,1], all n and all functions f: {0,1}^n → {0,1}, there is an SV source of bias δ such that f(X_1, X_2, …, X_n) has bias at least δ.
Proof of impossibility of extraction: strong SV sources
Definition: a source emitting a sequence of bits X_1, X_2, …, X_n is a strong SV source with bias δ if for all 1 ≤ i ≤ n and b_1, …, b_{i−1}, b_{i+1}, …, b_n ∈ {0,1}^{n−1}, the bias of X_i given that X_1 = b_1, …, X_{i−1} = b_{i−1}, X_{i+1} = b_{i+1}, …, X_n = b_n is at most δ.
This is a restriction: every strong SV source is also an SV source. Here even the future does not help you to bias X_i too much.
Proof of impossibility of extraction: δ-imbalanced sources
Definition: a source emitting a sequence of n bits X = X_1, X_2, …, X_n is δ-imbalanced if for all x, y ∈ {0,1}^n we have Pr[X=x]/Pr[X=y] ≤ (1+δ)/(1−δ).
Lemma: every δ-imbalanced source is a strong SV source with bias δ.
Proof: for all 1 ≤ i ≤ n and b_1, …, b_{i−1}, b_{i+1}, …, b_n ∈ {0,1}^{n−1}, the δ-imbalanced property implies
(1−δ)/(1+δ) ≤ Pr[X_1 = b_1, …, X_{i−1} = b_{i−1}, X_i = 0, X_{i+1} = b_{i+1}, …, X_n = b_n] / Pr[X_1 = b_1, …, X_{i−1} = b_{i−1}, X_i = 1, X_{i+1} = b_{i+1}, …, X_n = b_n] ≤ (1+δ)/(1−δ)
which implies that the bias is at most δ.
Proof of impossibility of extraction
Lemma: for every function f: {0,1}^n → {0,1} there is a δ-imbalanced source such that f(X_1, X_2, …, X_n) has bias at least δ.
Proof: there exists a set S ⊆ {0,1}^n of size 2^{n−1} such that f is constant on S (take S inside f^{−1}(b), where b is the majority value of f). Consider the source X:
• with probability ½ + δ/2, pick a random element of S
• with probability ½ − δ/2, pick a random element of {0,1}^n ∖ S
Recall: a source is δ-imbalanced if for all x, y ∈ {0,1}^n we have Pr[X=x]/Pr[X=y] ≤ (1+δ)/(1−δ); here the ratio between any two point probabilities is at most (½+δ/2)/(½−δ/2) = (1+δ)/(1−δ).
Extractors
• So if extraction from SV sources is impossible, should we simply give up?
• No: use randomness! Just make sure you are using much less randomness than you are getting out.
Extractor
Extractor: a universal procedure for purifying an imperfect source:
• the function Ext(x, y) should be efficiently computable
• a truly random seed serves as a "catalyst"
• Parameters: (n, k, m, t, ε)
(Picture: Ext takes a source string x ∈ {0,1}^n drawn from a distribution over 2^k strings (min-entropy k) together with a truly random t-bit seed y, and outputs m near-uniform bits.)
Extractor: Definition
(k, ε)-extractor: for all random variables X with min-entropy k:
• the output fools all tests T: |Pr_z[T(z) = 1] − Pr_{y ∈_R {0,1}^t, x ∼ X}[T(Ext(x, y)) = 1]| ≤ ε
• equivalently, the distributions Ext(X, U_t) and U_m are ε-close (L1 distance ≤ 2ε), where U_m is the uniform distribution on {0,1}^m
Comparison to pseudo-random generators:
• the output of a PRG should fool all efficient tests
• the output of an extractor should fool all tests
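The lecture does not fix a construction here, but a classical example fitting this definition is hashing with a random GF(2) linear map: by the leftover hash lemma it is a (k, ε)-extractor whenever m ≤ k − 2 log₂(1/ε). The seed below encodes a full m×n matrix, so t = mn is far from optimal; this is only a sketch of the definition in action:

```python
import secrets

def hash_extractor(x_bits, seed_bits, m):
    """Ext(x, y): interpret the seed as an m x n 0/1 matrix A and
    output A·x over GF(2) (each output bit is an inner product)."""
    n = len(x_bits)
    assert len(seed_bits) == m * n
    out = []
    for i in range(m):
        row = seed_bits[i * n:(i + 1) * n]
        out.append(sum(a & b for a, b in zip(row, x_bits)) % 2)
    return out

# toy usage: a 16-bit sample from a weak source, extract m = 4 bits
x = [secrets.randbits(1) for _ in range(16)]         # stand-in weak source
seed = [secrets.randbits(1) for _ in range(4 * 16)]  # truly random catalyst
print(hash_extractor(x, seed, m=4))
```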