440 likes | 592 Views
Randomization. Seminar on Communication Complexity. Carmella Kroitoru. What ’ s new?. Until now, Alice and Bob were all powerful, but deterministic . But what if Alice and Bob could act in a randomized fashion? What’s randomized fashion??
E N D
Randomization Seminar on Communication Complexity Carmella Kroitoru
What’s new? Until now, Alice and Bob were all powerful, but deterministic. But what if Alice and Bob could act in a randomized fashion? What’s randomized fashion?? Well, just flipping a coin, for instance..
What now?? Now players “toss coins”. The coins determine the protocol. The communication over an input (x, y) is not fixed anymore, it’s a random variable. And so is f(x,y).
So, How can we define the success of the protocol? • The conservative way - Las Vegas protocols Success = the protocol always gives the right f(x,y). • The liberal option – Monte Carlo protocols Success = a high probability that the protocol will give the right f(x,y).
Where do we get all the coins? Alice gets x Bob gets y Alice gets x Bob gets y Alice has r(A) Bob has r(B) F(x,y) depends on x, y, r(A), r(B) F(x,y) depends only on x and y Denote – r(I) is a pseudo random string of arbitrary length
So how does the decision tree looks?? Before Randomization After Randomization
But What if we get the ‘wrong’ r(I)? Does it mean the f(x,y) will be wrong? Yes! For The same input (x,y), f(x,y) will differ, depending on r(A) and r(B). So how will we get the right answer? Through the magic of probability.
Definitions: P – a randomized protocol • Zero error – for every (x,y) Pr [P(x,y) = f(x,y)] = 1 • ε error – for every (x,y) Pr [P(x,y) = f(x,y)] ≥ 1- ε • One sided ε error - for every (x,y) if f(x,y) = 0 then Pr [P(x,y) = 0] = 1 if f(x,y) = 1 then Pr [P(x,y) = 1] ≥ 1- ε
Now f(x,y) can vary depending on r(A) and r(B) What about D(f)?? Is it constant for some (x,y) input? No! So how will we measure the running time?
Running time – first method • The worst case running time of a randomized protocol p is • The worst case cost of p is
Running time – second method • The average case running time of a randomized protocol p is • The average case cost of p is
Average? r(A) and r(B) are chosen independently, according to some probability distribution. Should we consider the distribution of (x,y)? No ‘average input’!
Now what’s all that ‘probability’ talk?Didn’t I pass that course already? Let’s do a quick review..
Binomial Distribution A binomial experiment, also known as Bernoulli trial, is a statistical experiment with the properties: • n Independent trials – yi’s. • Success or failure (denoted 0/1, T/F) • Probability of success, p, is the same on every trial. • Denote: S – number of successes. Example: coin toss. Success = ‘Heads’.
Exercise: Given a box with red and blue ball, suppose red balls are at least δ fraction of total. Probability that draws don’t see any red balls? But what if 1/3 isn’t good enough? What if we want <α? Probability that draws don’t see any red balls?
Expectation • Linearity • E( a ) = a • E( a*X + b) = a*E(X) + b • E( X+Y ) = E(X) + E(Y) • Non-multiplicativity E( X*Y ) = E(X) * E(Y) only if X and Y independent
What is the probability that a random variable deviates from its expectation? • Focus on sums/averages of n bounded variables: • Note: can get similar bounds for other bounded variables
Don’t know anything about the yi’s : If yi is positive, but we don’t know anything else, then we can use Markov’sinequalityto bound S: Some intuition - no more than 1/5th of the population can have more than 5 times the average income Example: – toss 100 coins, Pr (# of heads ≥ 60) ≤ 5/6 (E(Y) = 50, t = 6/5) So without knowing if yi is bound, or if the yi ‘s are independent we got an upper bound. Although it’s not a great bound, and we don’t know the lower bound.
An exact result: Using the Cumulative binomial probability. Refers to the probability that the binomial random variable falls within a specified range Example: toss 100 coins, Pr (# of heads ≥ 60) = ? Solution: To solve this problem, we compute 40 individual probabilities, using the binomial formula. The sum of all these probabilities is the answer we seek. Denote: k = # of heads Pr(k ≥ 60; n = 100, p = 0.5) = Pr(k = 60; 100, 0.5) + Pr(k = 61; 100, 0.5) + . . . + Pr(k = 100; 100, 0.5) Pr(k ≥ 60; n = 100, p = 0.5) = 0.028
More generally: We can use Chernoff’s inequalityto bound S: Then for any δ > 0 Example: toss 100 coins independently Pr(# of heads ≥ 60) ≤ (t = 1/5) • Dramatically better bounds then Markov • Worse then Cumulative Probability but much easier and works for more cases. • Bounds for all S (bigger or smaller then E) Back to running time
3 types of errors - 3 complexity measures (1) Let f : x × y → {0,1} Lets define complexity measures for a randomized protocol p: is the minimum average case cost of p, that computes f with zero error.
3 types of errors - 3 complexity measures (2) For 0< ε <1, is the minimum worst case cost of p that computes f with error ε. For 0< ε <1, is the minimum worst case cost of p that computes f with one sided error ε.
Wait! What’s the meaning of ‘average’ for f with zero error? And the meaning of ‘worst’ for f with error?
Worst case = Θ (Average case) Reminder: DAVG(f) – the average running time over all random vectors r(A), r(B),maxed over all possible inputs x,y. A protocol ‘A’ makes an error ε/2 andDAVG(A) = t Define A’: ~ Execute Aas long as at most 2t/ ε bits are exchanged ~ If A finished, return it’s answer ~ Else return 0 And #{bits exchenged} of A’ in the worst case is DWORST(A’) = 2t/ε So we found k= 2/ε s.t. DAVG(A) = k * DWORST(A’)
But is A and A’ have the same errors? • A has error of ε/2 . • What’s the error of A’? • A’ can return the answer A output. That has a chance of ε/2 to be wrong. • Or A’ can output 0 because A wasn’t done after 2t/ εbits. What’s the chance of that? We’ll use Markov: Pr[A exchanged more then 2t/ ε bits] = Pr[ #{bits exchenged in A}> (2/ε)*t ] ≤ 1/(2/ε) = ε/2 • So A’ has error of at most ε/2 + ε/2 = ε, meaning 2*Aerror = A’error , both one sided.
What if ε = 0? For zero error protocols, using the worst case cost gives exactly the deterministic communication complexity. How come? A deterministic protocol can just fix some r(A) and r(B) and proceed. So for zero error protocols, we only care about the average case cost.
Exercise (1): A: Calculate f(x, y) for some fixed x,y, with some protocol with one sided error ε. A’: Execute A t times. Denote: ~ fi – result of execution #i ~ if for some i fi=1 then res(A’)= 1 , else res(A’)= 0. Pr [res(A’) ≠ f(x, y)] = ? Solution: If res(A’) = 1, that’s the right answer, 100% (by definition). If res(A’) = 0, it might be the right result, or we got the wrong result for t rounds. What’s the chance of that? Pr[ mistake in A] < ε. Pr[ mistake in A’] < ε^t.
Exercise (2): A: Calculate f(x, y) for some fixed x,y, with some protocol with error ε. A’: Execute A t times. Denote: fi – result of execution #i res(A’) = maj {fi} What’s the possibility that A’ didn’t get a result we trust? Solution(1): Define a Bernoulli trial: yi= 1 if fi ≠ f(x, y). To get res(A’) wrong we need to get more than half yi‘s wrong. E[∑ yi ] = E[S] = εt. So what’s the chance of S >t/2 ? Hint: Use Chernoff for that.
Solution (2): Let’s fix ε ≤ ¼ (can be generalized for ε < ½) and take δ = 1. What if we want to bound the possibility of error with α smaller than that? Then we need to take t = 12*ln[1/ α] So the error probability can be reduced if we’ll take bigger t, meaning, enlarge the communication complexity by a small penalty.
Field A non-empty set F with two binary operation + (addition) and * (multiplication) is called a field if it has: • Closure of F under + and * • Associativity of + and * • Commutativity of + and * • Distributivity of * over + • + and * identity (0, 1) • + and * inverses examples: rational numbers, GF[7] = {0, 1, 2, 3, 4, 5, 6}
Example: EQ Denote: Alice’s input A = a0a1…an-1 Bob’s input B = b0b1…bn-1 Let’s think of these inputs as polynomials over GF[p] where n² < p < 2n², p is prime. That is A(x) = a0 + a1x + a2x² + … + an-1x^(n-1) mod p and B(x) = b0 + b1x + b2x² + … + bn-1x^(n-1) mod p Alice picks at random t in GF[p] and sends Bob both t and A(t). Bob outputs 1 if A(t) = B(t) and 0 otherwise. #{bits exchanged} = O(log p) = O(log n)
Correctness Note that if A=B then A(t) = B(t) for all t, so f(A,B) = 1. If A≠B then we have 2 distinct polynomials of degree n-1. Such polynomials can be equal on at most n-1 (out of p) elements of the field (since their difference is a non zero polynomial of degree ≤ n-1, and has at most n-1 roots). Hence the probability of error is at most (n-1)/p ≤ n/n² = 1/n So we have shown that R(EQ) = O(log n), and in fact and In contrast to D(NE) = D(EQ) = n+1
Exercise Prove that the following protocol for EQ achieves similar performance: Alice and Bob view their inputs A and B as n-bit integers (between 1 and 2^n). Alice chooses a random prime number p between the n first primes. (If n = 3 then p is 2, 3 or 5) She sends p and (A mod p) to Bob. Bob outputs 1 if A mod p = B mod p, and 0 otherwise.
Solution First note that if A = B, then of course A mod p = B mod p. If A ≠ B and Bob accepts, it means that A = B (mod p). Define BADA,B = {p | p is in the first n primes and A = B (mod p)} BAD 5,8 = {3} Claim (without proof): | BADA,B | ≤ ½n. Then the probability that B accepts is Can repeat this for 100 times to get error of 2^(-100). So we got since max {first n primes} < n^3 and O(log n) = O(log n^3)
So how much better are bounds of randomization protocols then deterministic ones? Lemma: We will prove a more delicate statement:
Proof 1: Let’s present a deterministic simulation of a given randomized protocol p. For each leaf l of the protocol p, Alice will send Bob p(A, l) – the probability that given x, she will respond in a way leading to the l. Bob will compute p(B, l) - the probability that given the y, he will respond in a way leading to the l. Bob will then compute p(l) = p(A, l) * p(B, l) – the probability to reach l.
Proof 2: Bob and Alice will do this for every leaf, thus calculating all leafs. Bob will check which of the values has probability 1-ε, and that is the right f(x,y). What’s the problem?? We need to send probability values, that means real numbers. But we can use precision of bits.
Proof 3: This guarantees deviation in values of at most This implies that p(l), is at most far from the true p(l) (since p(B, l) ≤ 1). So the total error over all the leaves is at most ½-ε. Therefore Bob only needs to check which of the values (0 or 1) has probability of more then ½ and that is the correct f(x,y) value. Q.E.D.