Complexity Theory Lecture 11
Lecturer: Moni Naor
Recap
Last week:
• Statistical zero-knowledge
• AM protocol for VC dimension
• Hardness and Randomness
This week:
• Hardness and Randomness
• Semi-Random Sources
• Extractors
Derandomization
A major research question: how to make the construction of
• small sample spaces `resembling' the large one
• hitting sets
efficient.
Successful approach: randomness from hardness
• (Cryptographic) pseudo-random generators
• Complexity-oriented pseudo-random generators
Recall: Derandomization I
Theorem: any f ∈ BPP has a polynomial size circuit
Simulating large sample spaces:
• Want to find a small collection of strings on which the PTM behaves similarly to the large collection: if the PTM errs with probability at most δ (over truly random strings), then it should err on at most a δ+ε fraction of the small collection, on ALL inputs
• Choose m random strings
• For input x, let A_x be the event that more than a (δ+ε) fraction of the m strings fail the PTM
• Chernoff: Pr[A_x] ≤ e^{−2ε²m} < 2^{−2n} for m = Θ(n/ε²)
• Union bound: Pr[∪_x A_x] ≤ ∑_x Pr[A_x] < 2^n · 2^{−2n} = 2^{−n} < 1
So some collection of m strings is good for all 2^n inputs, and it can be hard-wired into a circuit.
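Spelling out the parameter choice behind the two inequalities (a standard calculation; m ≥ n/ε² is one sufficient choice, not necessarily the lecture's exact constant):

```latex
\Pr[A_x] \le e^{-2\varepsilon^2 m} \le e^{-2n} < 2^{-2n}
  \quad\text{for } m \ge n/\varepsilon^2,
\qquad
\Pr\Bigl[\bigcup_{x \in \{0,1\}^n} A_x\Bigr]
  \le \sum_{x} \Pr[A_x] < 2^n \cdot 2^{-2n} = 2^{-n} < 1 .
```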
Pseudo-random generators
• Would like to stretch a short secret (seed) into a long one
• The resulting long string should be usable in any case where a long string is needed
• In particular: in cryptographic applications, as a one-time pad
• Important notion: indistinguishability, i.e. two probability distributions that cannot be told apart
• Statistical indistinguishability: the distance between the probability distributions is small
• New notion: computational indistinguishability
Computational Indistinguishability
Definition: two sequences of distributions {D_n} and {D'_n} on {0,1}^n are computationally indistinguishable if for every polynomial p(n), every probabilistic polynomial time adversary A and sufficiently large n: if A receives input y ∈ {0,1}^n and tries to decide whether y was generated by D_n or D'_n, then
|Prob[A=`0' | D_n] − Prob[A=`0' | D'_n]| < 1/p(n)
(the quantity on the left is A's advantage).
Without the restriction to probabilistic polynomial time tests this is equivalent to the variation distance being negligible:
∑_{β ∈ {0,1}^n} |Prob[D_n = β] − Prob[D'_n = β]| < 1/p(n)
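To make the unrestricted notion concrete, here is a minimal Python sketch (not from the lecture) computing the variation (L1) distance between two explicitly given distributions:

```python
def variation_distance(p, q):
    """L1 distance between two distributions, each given as a dict
    mapping outcomes to probabilities."""
    support = set(p) | set(q)
    return sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in support)

# a fair bit vs. a slightly biased bit: distance 0.02
print(variation_distance({"0": 0.5, "1": 0.5}, {"0": 0.51, "1": 0.49}))
```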
Pseudo-random generators
Definition: a function g: {0,1}* → {0,1}* is said to be a (cryptographic) pseudo-random generator if
• It is polynomial time computable
• It stretches the input: |g(x)| > |x|; denote by ℓ(n) the length of the output on inputs of length n
• If the input (seed) is random, then the output is indistinguishable from random: for any probabilistic polynomial time adversary A that receives input y of length ℓ(n) and tries to decide whether y = g(x) or y is a random string from {0,1}^ℓ(n), for any polynomial p(n) and sufficiently large n
|Prob[A=`rand' | y=g(x)] − Prob[A=`rand' | y ∈_R {0,1}^ℓ(n)]| < 1/p(n)
Want to use the output of a pseudo-random generator whenever long random strings are needed.
"Anyone who considers arithmetical methods of producing random numbers is, of course, in a state of sin." – J. von Neumann
Pseudo-random generators
The same definition, with two important issues to note:
• Why is the adversary bounded by polynomial time?
• Why is the indistinguishability not perfect?
Pseudo-Random Generators and Derandomization
A pseudo-random generator maps the k-bit strings to n-bit strings. Run the algorithm on the generator's output for all possible seeds: any input should see roughly the same fraction of accepts and rejects as under truly random strings. The result is a derandomization of a BPP algorithm by taking the majority answer.
Complexity of Derandomization
• Need to go over all 2^k possible seeds
• Need to compute the pseudo-random generator on each of them
• The generator has to be secure against non-uniform distinguishers: the actual distinguisher is the combination of the algorithm and its input, and if we want the derandomization to work for all inputs we get non-uniformity
Construction of pseudo-random generators: randomness from hardness
• Idea: given any one-way function, there must be a hard decision problem hidden in it
• If it is balanced enough: it looks random
• Such a problem is a hardcore predicate
• Possibilities:
  • Last bit
  • First bit
  • Inner product
Hardcore Predicate
Definition: let f: {0,1}* → {0,1}* be a function. We say that h: {0,1}* → {0,1} is a hardcore predicate for f if
• It is polynomial time computable
• For any probabilistic polynomial time adversary A that receives input y = f(x) and tries to compute h(x), for any polynomial p(n) and sufficiently large n
|Prob[A(y)=h(x)] − 1/2| < 1/p(n)
where the probability is over the choice of y and the random coins of A
• Sources of hardcoreness:
  • not enough information about x (not of interest for generating pseudo-randomness)
  • enough information about x, but hard to compute it
Single bit expansion
• Let f: {0,1}^n → {0,1}^n be a one-way permutation
• Let h: {0,1}^n → {0,1} be a hardcore predicate for f
Consider g: {0,1}^n → {0,1}^{n+1} where g(x) = (f(x), h(x))
Claim: g is a pseudo-random generator
Proof idea: a distinguisher for g can be used to guess h(x), since it must behave differently on (f(x), h(x)) and (f(x), 1−h(x)).
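A toy sketch of this single-bit expansion, instantiating h with the inner-product predicate from the list above (which takes an extra public random string r, in the spirit of Goldreich-Levin); the permutation is a stand-in and is of course not one-way:

```python
def inner_product_bit(x: int, r: int) -> int:
    """Candidate hardcore predicate: <x, r> mod 2."""
    return bin(x & r).count("1") % 2

def toy_permutation(x: int, n: int) -> int:
    """Stand-in for a one-way permutation on {0,1}^n.
    x -> 5x + 1 mod 2^n is a permutation, but NOT one-way."""
    return (5 * x + 1) % (1 << n)

def g(x: int, r: int, n: int):
    """Single-bit expansion: g(x) = (f(x), h(x)); n bits in, n+1 out."""
    return toy_permutation(x, n), inner_product_bit(x, r)

print(g(x=0b101101, r=0b110010, n=6))
```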
From single bit expansion to many bit expansion
Construction: pick a random x and a random r; iterate the permutation to obtain the internal configuration x, f(x), f^{(2)}(x), …, f^{(m)}(x), and output the bits
h(x,r), h(f(x),r), h(f^{(2)}(x),r), …, h(f^{(m−1)}(x),r)
• Can make r and f^{(m)}(x) public, but not any other internal state
• Can make m as large as needed
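A minimal runnable sketch of this iterated construction (the Blum-Micali paradigm), again with an illustrative permutation that is not actually one-way:

```python
def many_bit_generator(x: int, r: int, n: int, m: int) -> list:
    """Output h(x,r), h(f(x),r), ..., h(f^(m-1)(x),r)."""
    f = lambda z: (5 * z + 1) % (1 << n)     # toy permutation of {0,1}^n
    h = lambda z: bin(z & r).count("1") % 2  # inner-product bit <z, r>
    out = []
    for _ in range(m):
        out.append(h(x))  # emit a hardcore bit of the current state
        x = f(x)          # advance; only the final f^(m)(x) may go public
    return out

# 8 pseudo-random bits from a 16-bit state and public r (toy parameters)
print(many_bit_generator(x=0x3A7B, r=0x9C41, n=16, m=8))
```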
Two important techniques for showing pseudo-randomness • Hybrid argument • Next-bit prediction and pseudo-randomness
Hybrid argument
To prove that two distributions D and D' are indistinguishable:
• suggest a collection of distributions D = D_0, D_1, …, D_k = D'
• If D and D' can be distinguished, then there is a pair D_i and D_{i+1} that can be distinguished
• Advantage ε in distinguishing between D and D' means advantage ε/k between some D_i and D_{i+1}
• Use a distinguisher for the pair D_i and D_{i+1} to derive a contradiction
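The averaging step, written out (a standard triangle-inequality calculation): for any test T,

```latex
\varepsilon \le \bigl|\Pr[T(D_0)=1]-\Pr[T(D_k)=1]\bigr|
  \le \sum_{i=0}^{k-1}\bigl|\Pr[T(D_i)=1]-\Pr[T(D_{i+1})=1]\bigr| ,
```

so at least one of the k summands is at least ε/k.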
Next-bit Test
Definition: a function g: {0,1}* → {0,1}* is said to pass the next-bit test if:
• It is polynomial time computable
• It stretches the input: |g(x)| > |x|; denote by ℓ(n) the length of the output on inputs of length n
• If the input (seed) is random, then the output passes the next-bit test: for any prefix length 0 ≤ i < ℓ(n), for any probabilistic polynomial time adversary A that is a predictor (receives the first i bits of y = g(x) and tries to guess the next bit), for any polynomial p(n) and sufficiently large n
|Prob[A(y_1, y_2, …, y_i) = y_{i+1}] − 1/2| < 1/p(n)
Theorem: a function g: {0,1}* → {0,1}* passes the next-bit test if and only if it is a pseudo-random generator
Landmark results in the theory of cryptographic pseudo-randomness
Theorem: if pseudo-random generators stretching by a single bit exist, then pseudo-random generators stretching by any polynomial factor exist
Theorem: if one-way permutations exist, then pseudo-random generators exist
A more difficult theorem to prove:
Theorem [HILL] (Håstad, Impagliazzo, Levin, Luby): one-way functions exist iff pseudo-random generators exist
Complexity Oriented Pseudo-Random Generators
Cryptography:
• Only a crude upper bound on the time of the `user' (the distinguisher)
• The generator has less computational power than the distinguisher
Derandomization:
• When derandomizing an algorithm you have a much better idea about the resources, in particular the run time
• The generator may have more computational power, possibly from a higher complexity class
(Note the quantifier order switch between the two settings.)
Ideas for getting better pseudo-random generators for derandomization
• The generator need not be so efficient. For derandomizing:
  • a parallel algorithm, the generator may be more sequential; example: to derandomize AC0 circuits the generator can compute parities
  • a low-memory algorithm, the generator may use more space
In particular we can depart from the one-way function assumption (easy to compute in one direction, hard in the other)
• The (in)distinguishing probability need not be so small, since we are going to take a majority at the end
Parameters of a complexity oriented pseudo-random generator
All functions of n:
• Seed length t
• Output length m
• Running time n^c
• Fooling circuits of size s
• Error ε
Any circuit family {C_n} of size s(n) that tries to distinguish outputs of the generator from random strings in {0,1}^{m(n)} has at most ε(n) advantage
Hardness Assumption: Unapproximable Functions
Definition: E = ∪_k DTIME(2^{kn})
Definition: a family of functions f = {f_n}, f_n: {0,1}^n → {0,1}, is said to be s(n)-unapproximable if for every family of circuits {C_n} of size s(n):
Pr_x[C_n(x) = f_n(x)] ≤ ½ + 1/s(n)
Note that s(n) is both the circuit size and the bound on the advantage; this is an average hardness notion.
Example: if g is a one-way permutation and h is a hardcore function strong against s(n)-adversaries, then f(y) = h(g^{-1}(y)) is s(n)-unapproximable
One bit expansion
Assumption: f = {f_n} is
• s(n)-unapproximable, for s(n) = 2^{Ω(n)}
• in E
Claim: G = {G_n}, where G_n(y) = y ∘ f_{log n}(y), is a single bit expansion generator family
Proof: suppose not; then there exists a predictor that computes f_{log n} with probability better than ½ + 1/s(log n) on a random input
Parameters: seed length t = log n, output length m = log n + 1, fooling circuits of size s = n^δ, running time n^c, error ε = 1/n^δ < 1/m
Getting Many Bits Simultaneously
Try outputting many evaluations of f on various parts of the seed: let b_i(y) be a projection of y and consider
G(y) = f(b_1(y)) ∘ f(b_2(y)) ∘ … ∘ f(b_m(y))
• It seems that a predictor must evaluate f(b_i(y)) to predict the i-th bit
• But: the predictor might use correlations without having to compute f
• Turns out: it is sufficient to decorrelate the b_i(y)'s in a pairwise manner
Notation: if |y| = t and S ⊆ {1…t}, we denote by y|_S the sequence of bits of y whose index is in S
Nearly-Disjoint Subsets
Definition: a collection of subsets S_1, S_2, …, S_m ⊆ {1…t} is an (h, a)-design if:
• Fixed size: for all i, |S_i| = h
• Small intersection: for all i ≠ j, |S_i ∩ S_j| ≤ a
Parameters: (m, t, h, a)
(Picture: subsets S_1, S_2, S_3 of {1…t}, each of size h, with each pairwise intersection of size ≤ a.)
Nearly-Disjoint Subsets
Lemma: for every ε > 0 and m < n we can construct in poly(n) time a collection of subsets S_1, S_2, …, S_m ⊆ {1…t} which is an (h, a)-design with:
• h = log n
• a = ε log n
• t = O(log n), where the constant in the big O depends on ε
The argument gives both a proof of existence and a sequential construction (method of conditional probabilities).
Nearly-Disjoint Subsets
Proof: construction in a greedy manner; repeat m times:
• pick a random (h = log n)-subset of {1…t}
• set t = O(log n) so that the expected overlap with a fixed S_i is ½ ε log n, and the probability that the overlap with S_i is larger than ε log n is at most 1/m
• one way to achieve this: pick a single element independently from each of t' buckets; for S_i the event A_i is that the intersection is larger than (½ + ε)t', and by Chernoff Pr[A_i] ≤ e^{−2ε²t'} < 2^{−log m} = 1/m
• union bound: with positive probability some h-subset has the desired small overlap with all the S_i picked so far
• find the good h-subset by exhaustive search (a code sketch of the greedy loop follows below)
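A direct (unoptimized) Python sketch of the greedy loop promised above: resample a random h-subset until it has small overlap with every subset chosen so far. The parameters in the usage line are illustrative only:

```python
import random

def greedy_design(m: int, t: int, h: int, a: int, max_tries: int = 10**5):
    """Greedily build S_1..S_m within {0..t-1}, each of size h, with
    pairwise intersections of size <= a (an (h, a)-design).
    The lemma guarantees a good subset exists for suitable t = O(log n);
    here we simply retry random subsets up to max_tries times."""
    design = []
    for _ in range(m):
        for _ in range(max_tries):
            s = set(random.sample(range(t), h))
            if all(len(s & prev) <= a for prev in design):
                design.append(s)
                break
        else:
            raise RuntimeError("parameters too tight; increase t")
    return design

# toy usage: 10 subsets of {0..39}, each of size 6, pairwise overlap <= 2
print(greedy_design(m=10, t=40, h=6, a=2))
```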
Other constructions of designs
• Based on error correcting codes
• Simple construction: based on polynomials
The NW generator (Nisan-Wigderson)
Need:
• f ∈ E that is s(n)-unapproximable, for s(n) = 2^{δn}
• a collection S_1, …, S_m ⊆ {1…t} which is an (h, a)-design with h = log n, a = δ log n / 3 and t = O(log n)
G_n(y) = f_{log n}(y|_{S_1}) ∘ f_{log n}(y|_{S_2}) ∘ … ∘ f_{log n}(y|_{S_m})
(Picture: the seed y; each output bit applies f_{log n} to the bits of y indexed by one subset S_i.)
The NW generator
Theorem: G = {G_n} is a complexity oriented pseudo-random generator with:
• seed length t = O(log n)
• output length m = n^{δ/3}
• running time n^c for some constant c
• fooling circuits of size s = m
• error ε = 1/m
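An illustrative Python sketch of the generator itself, given a design and a predicate f; parity stands in for the hard function purely to make the code run (parity is computable by small circuits, so it is emphatically not unapproximable):

```python
import random

def nw_generator(y_bits, design, f):
    """Nisan-Wigderson style generator: one output bit per subset,
    obtained by applying f to the seed bits indexed by that subset."""
    out = []
    for s in design:
        restricted = [y_bits[i] for i in sorted(s)]  # y restricted to S_i
        out.append(f(restricted))
    return out

# toy usage: size-3 subsets with pairwise overlap <= 1, 6-bit seed
toy_design = [{0, 1, 2}, {2, 3, 4}, {4, 5, 0}]
seed = [random.randint(0, 1) for _ in range(6)]
print(nw_generator(seed, toy_design, f=lambda bits: sum(bits) % 2))
```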
The NW generator
Proof:
• assume G = {G_n} does not ε-pass a statistical test C = {C_m} of size s:
|Pr_x[C(x) = 1] − Pr_y[C(G_n(y)) = 1]| > ε
• we can transform this distinguisher into a predictor A of size s' = s + O(m):
Pr_y[A(G_n(y)_{1…i−1}) = G_n(y)_i] > ½ + ε/m
• just as in the next-bit test, using a hybrid argument
Proof of the NW generator
G_n(y) = f_{log n}(y|_{S_1}) ∘ f_{log n}(y|_{S_2}) ∘ … ∘ f_{log n}(y|_{S_m})
Pr_y[A(G_n(y)_{1…i−1}) = G_n(y)_i] > ½ + ε/m
• fix the bits outside of S_i to preserve the advantage:
Pr_{y'}[A(G_n(y')_{1…i−1}) = G_n(y')_i] > ½ + ε/m
where y' is random on S_i and agrees elsewhere with the assignment to {1…t} ∖ S_i that maximizes the advantage of A
Proof of the NW generator
G_n(y) = f_{log n}(y|_{S_1}) ∘ f_{log n}(y|_{S_2}) ∘ … ∘ f_{log n}(y|_{S_m})
• G_n(y')_i is exactly f_{log n}(y')
• for j ≠ i, as y' varies, y'|_{S_j} varies over only 2^a values! This follows from the small intersection property
• so to compute G_n(y')_j we need only a lookup table of size 2^a
• hard-wire (up to) m−1 tables of 2^a values each to provide all of G_n(y')_{1…i−1}
The Circuit for computing f
From A we obtain a small circuit that approximates f: given y', the hardwired tables supply bits 1…i−1 of G_n(y'), and A outputs a guess for f_{log n}(y').
Properties of the circuit:
• size: s + O(m) + (m−1)2^a < s(log n) = n^δ
• advantage: ε/m = 1/m² > 1/s(log n) = n^{−δ}
Extending the result
Theorem: if E contains 2^{Ω(n)}-unapproximable functions, then BPP = P.
• The assumption is an average case one
• Based on non-uniformity
Improvement:
Theorem: if E contains functions that require circuits of size 2^{Ω(n)} (for the worst case), then E contains 2^{Ω(n)}-unapproximable functions.
Corollary: if E requires exponential size circuits, then BPP = P.
Extracting Randomness from defective sources
Suppose that we have an imperfect source of randomness:
• a physical source: biased, correlated
• a collection of events in a computer: /dev/rand
• information leak
Can we:
• Extract good random bits from the source?
• Use the source for various tasks requiring randomness, e.g. probabilistic algorithms?
Imperfect Sources
Biased coins: X_1, X_2, …, X_n, where each bit is independently chosen so that Pr[X_i = 1] = p. How do we get unbiased coins?
Von Neumann's procedure: flip the coin twice.
• If it comes up `0' followed by `1', call the outcome `0'.
• If it comes up `1' followed by `0', call the outcome `1'.
• Otherwise (two `0's or two `1's occurred), repeat the process.
Claim: the procedure generates an unbiased result, no matter how the coin was biased; it works for all p simultaneously.
Two questions:
• Can we get a better rate of generating bits?
• What about more involved or insidious models?
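A direct transcription of the procedure in Python (the `1 0' pair yields `1', the `0 1' pair yields `0'):

```python
import random

def von_neumann(biased_bit, max_pairs=10**6):
    """Extract one unbiased bit from a biased coin: flip twice,
    output 0 on 01, 1 on 10, and retry on 00 or 11.
    Pr[01] = Pr[10] = p(1-p), so the output is unbiased for every p."""
    for _ in range(max_pairs):
        a, b = biased_bit(), biased_bit()
        if a != b:
            return a  # a=1,b=0 -> 1 ; a=0,b=1 -> 0
    raise RuntimeError("no unequal pair observed")

# usage: a coin with bias p = 0.8; the empirical mean is close to 0.5
coin = lambda: 1 if random.random() < 0.8 else 0
print(sum(von_neumann(coin) for _ in range(10000)) / 10000)
```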
Shannon Entropy
Let X be a random variable over alphabet Σ with distribution P. The Shannon entropy of X is
H(X) = − ∑_x P(x) log P(x)
where we take 0 log 0 to be 0.
Interpretation: H(X) represents how much we can compress X under the best encoding.
Examples
• If X = 0 (constant) then H(X) = 0; this is the only case where H(X) = 0, in all other cases H(X) > 0
• If X ∈ {0,1} with Prob[X=0] = p and Prob[X=1] = 1−p, then H(X) = −p log p − (1−p) log(1−p) ≡ H(p)
• If X ∈ {0,1}^n is uniformly distributed, then H(X) = − ∑_{x ∈ {0,1}^n} 2^{−n} log 2^{−n} = 2^n · (n/2^n) = n
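A few lines of Python confirming these examples:

```python
import math

def shannon_entropy(probs):
    """H(X) = -sum p log2 p, with 0 log 0 taken to be 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([1.0]))         # 0.0 : constant variable
print(shannon_entropy([0.3, 0.7]))    # H(0.3) ~ 0.881 : biased bit
print(shannon_entropy([2**-4] * 16))  # 4.0 : uniform on {0,1}^4
```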
Properties of Entropy
• Entropy is bounded: H(X) ≤ log |Σ|, with equality only if X is uniform over Σ
• For any function f: {0,1}* → {0,1}*, H(f(X)) ≤ H(X)
• H(X) is an upper bound on the number of bits we can deterministically extract from X
Does High Entropy Suffice for extraction?
If we have a source X ∈ {0,1}^n where X has high entropy (say H(X) ≥ n/2), how many bits can we guarantee to extract?
Consider:
• Pr[X = 0^n] = 1/2
• For any x ∈ 1{0,1}^{n−1}: Pr[X = x] = 1/2^n
Then H(X) = n/2 + 1/2, but we cannot guarantee more than a single bit in the extraction.
Another Notion: Min Entropy
Let X be a random variable over alphabet Σ with distribution P_X. The min entropy of X is
H_min(X) = − log max_x P(x)
The min entropy is determined by the most likely value of X:
• min-entropy k implies that no string has probability mass more than 2^{−k}
Property: H_min(X) ≤ H(X). Why?
Would like to extract k bits from a min entropy k source. This is (approximately) possible if:
• we know the source
• we have unlimited computational power
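Computing both notions on the source from the "Does High Entropy Suffice" slide makes the gap concrete (n = 8 here, so H(X) = n/2 + 1/2 while H_min(X) = 1):

```python
import math

def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def min_entropy(probs):
    """H_min(X) = -log2 of the largest point probability."""
    return -math.log2(max(probs))

# Pr[0^n] = 1/2; each of the 2^(n-1) strings starting with 1 has mass 2^-n
n = 8
probs = [0.5] + [2**-n] * 2**(n - 1)
print(shannon_entropy(probs))  # 4.5 = n/2 + 1/2
print(min_entropy(probs))      # 1.0 : only one near-uniform bit guaranteed
```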
The Semi-Random Model (Santha-Vazirani)
Definition: a source emitting a sequence of bits X_1, X_2, …, X_n is an SV source with bias δ if for all 1 ≤ i ≤ n and b_1, b_2, …, b_i ∈ {0,1}^i:
½ − δ/2 ≤ Pr[X_i = b_i | X_1 = b_1, X_2 = b_2, …, X_{i−1} = b_{i−1}] ≤ ½ + δ/2
So the next bit has bias at most δ; a clear generalization of a biased coin.
Motivation:
• physical measurements where there are correlations with the history
• distributed imperfect coin generation
An SV source has high min entropy: for any string b_1, b_2, …, b_n,
Pr[X_1 = b_1, X_2 = b_2, …, X_n = b_n] ≤ (½ + δ/2)^n
Impossibility of extracting a single bit from SV sources
Would like a procedure similar to von Neumann's for SV sources: a function f: {0,1}^n → {0,1} such that for any SV source with bias δ, f(X_1, X_2, …, X_n) is more or less balanced.
Theorem: for all δ ∈ (0,1], all n and all functions f: {0,1}^n → {0,1}, there is an SV source of bias δ such that f(X_1, X_2, …, X_n) has bias at least δ.
Proof of impossibility of extraction: strong SV sources
Definition: a source emitting a sequence of bits X_1, X_2, …, X_n is a strong SV source with bias δ if for all 1 ≤ i ≤ n and b_1, …, b_{i−1}, b_{i+1}, …, b_n ∈ {0,1}^{n−1}, the bias of X_i given that X_1 = b_1, …, X_{i−1} = b_{i−1}, X_{i+1} = b_{i+1}, …, X_n = b_n is at most δ.
This is a restriction: every strong SV source is also an SV source. Here even the future does not help you to bias X_i too much.
Proof of impossibility of extraction: δ-imbalanced sources
Definition: a source emitting a sequence of n bits X = X_1, X_2, …, X_n is δ-imbalanced if for all x, y ∈ {0,1}^n we have Pr[X=x]/Pr[X=y] ≤ (1+δ)/(1−δ).
Lemma: every δ-imbalanced source is a strong SV source with bias δ.
Proof: for all 1 ≤ i ≤ n and b_1, …, b_{i−1}, b_{i+1}, …, b_n ∈ {0,1}^{n−1}, the δ-imbalanced property implies
(1−δ)/(1+δ) ≤ Pr[X_1 = b_1, …, X_{i−1} = b_{i−1}, X_i = 0, X_{i+1} = b_{i+1}, …, X_n = b_n] / Pr[X_1 = b_1, …, X_{i−1} = b_{i−1}, X_i = 1, X_{i+1} = b_{i+1}, …, X_n = b_n] ≤ (1+δ)/(1−δ)
which implies that the bias is at most δ.
Proof of impossibility of extraction
Lemma: for every function f: {0,1}^n → {0,1} there is a δ-imbalanced source such that f(X_1, X_2, …, X_n) has bias at least δ.
Proof: there exists a set S ⊆ {0,1}^n of size 2^{n−1} such that f is constant on S (take S inside f^{−1}(b), where b is the majority value of f). Consider the source X:
• with probability ½ + δ/2, pick a random element of S
• with probability ½ − δ/2, pick a random element of {0,1}^n ∖ S
Recall: a source is δ-imbalanced if for all x, y ∈ {0,1}^n we have Pr[X=x]/Pr[X=y] ≤ (1+δ)/(1−δ); here the ratio between any two point probabilities is at most (½+δ/2)/(½−δ/2) = (1+δ)/(1−δ).
Extractors
• So if extraction from SV sources is impossible, should we simply give up?
• No: use randomness! Just make sure you are using much less randomness than you are getting out.
Extractor
Extractor: a universal procedure for purifying an imperfect source:
• the function Ext(x, y) should be efficiently computable
• a truly random seed serves as a "catalyst"
• Parameters: (n, k, m, t, ε)
(Picture: Ext takes a source string x ∈ {0,1}^n drawn from a distribution over 2^k strings (min-entropy k) together with a truly random t-bit seed y, and outputs m near-uniform bits.)
Extractor: Definition
(k, ε)-extractor: for all random variables X with min-entropy k:
• the output fools all tests T: |Pr_z[T(z) = 1] − Pr_{y ∈_R {0,1}^t, x ∼ X}[T(Ext(x, y)) = 1]| ≤ ε
• equivalently, the distributions Ext(X, U_t) and U_m are ε-close (L1 distance ≤ 2ε), where U_m is the uniform distribution on {0,1}^m
Comparison to pseudo-random generators:
• the output of a PRG should fool all efficient tests
• the output of an extractor should fool all tests
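The lecture does not fix a construction here, but a classical example fitting this definition is hashing with a random GF(2) linear map: by the leftover hash lemma it is a (k, ε)-extractor whenever m ≤ k − 2 log₂(1/ε). The seed below encodes a full m×n matrix, so t = mn is far from optimal; this is only a sketch of the definition in action:

```python
import secrets

def hash_extractor(x_bits, seed_bits, m):
    """Ext(x, y): interpret the seed as an m x n 0/1 matrix A and
    output A·x over GF(2) (each output bit is an inner product)."""
    n = len(x_bits)
    assert len(seed_bits) == m * n
    out = []
    for i in range(m):
        row = seed_bits[i * n:(i + 1) * n]
        out.append(sum(a & b for a, b in zip(row, x_bits)) % 2)
    return out

# toy usage: a 16-bit sample from a weak source, extract m = 4 bits
x = [secrets.randbits(1) for _ in range(16)]         # stand-in weak source
seed = [secrets.randbits(1) for _ in range(4 * 16)]  # truly random catalyst
print(hash_extractor(x, seed, m=4))
```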