Mastering Probability Theory: Concepts and Applications

Great Theoretical Ideas In Computer Science Probability Theory: Great Expectations Lecture 19 CS 15-251

Two Edged Sword • Reasoning in terms of weighted averages is the source of many probability pitfalls, but it is also the source of very powerful mathematical tools.

Finite Probability Distribution • A (finite) probability distribution D is a finite set S of elements, where each element $x \in S$ has a positive real weight, proportion, or probability $p(x)$. The weights must satisfy: $\sum_{x \in S} p(x) = 1$

Finite Probability Distribution • A (finite) probability distribution D is a finite set S of elements, where each element $x \in S$ has a positive real weight, proportion, or probability $p(x)$. • S is often called the sample space.

Finite Probability Distribution • A (finite) probability distribution D is a finite set S of elements, where each element $x \in S$ has a positive real weight, proportion, or probability $p(x)$. • Any set $E \subseteq S$ is called an event. The probability of event E is defined to be $\Pr_D[E] = \sum_{x \in E} p(x)$

Uniform Distribution • A (finite) probability distribution D is a finite set S of elements, where each element $x \in S$ has a positive real probability $p(x)$. • If each element has equal probability, the distribution is said to be uniform.

Functions From Distributions To Distributions • Let D be a probability distribution on a set S. Let $f: S \to T$ be a function. • f(D) denotes a probability distribution on the set T satisfying: • $\forall y \in T,\ p(y) = \Pr_D[\{x \in S \mid f(x) = y\}]$

2 Fair Flips • D: the uniform distribution on {00, 01, 10, 11}, so each string has probability ¼ • f: {00,01,10,11} -> {0,1,2} counts the number of 1’s • X = f(D): 0 has probability ¼, 1 has probability ½, 2 has probability ¼
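The construction of f(D) can be sketched in a few lines of Python. This is only an illustration of the definition, not code from the lecture: the helper name `pushforward` and the dictionary representation of a distribution are assumptions made here.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(dist, f):
    """Return f(D): each y receives the total weight of the x with f(x) = y."""
    image = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for x, p in dist.items():
        image[f(x)] += p
    return dict(image)

# D: the uniform distribution on the four 2-bit strings.
D = {s: Fraction(1, 4) for s in ("00", "01", "10", "11")}

# f counts the number of 1's in a string.
X = pushforward(D, lambda s: s.count("1"))
print(X)
```

Exact fractions are used so the resulting weights can be compared with ¼, ½, ¼ without rounding error.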

Random Variables • Let D be a probability distribution on a set S. Let $f: S \to T$ be a function. • f(D) denotes a probability distribution on the set T satisfying: • $\forall y \in T,\ p(y) = \Pr_D[\{x \in S \mid f(x) = y\}]$ • If $T \subseteq \mathbb{R}$, then we say that f(D) is a random variable resulting from the action of f on the underlying distribution D.

Fair Coin Flips • If $T \subseteq \mathbb{R}$, then we say that f(D) is a random variable resulting from the action of f on D. • Let D be the uniform distribution on n-bit strings. Let $f: \{0,1\}^n \to \mathbb{N}$ be a function returning the number of 1’s. • f(D) is a distribution on {0,1,2,…,n} where the probability of x is $\binom{n}{x} / 2^n$
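As a sanity check, the distribution of the number of 1's can be computed by brute force for a small n and compared with the closed form (n choose x)/2^n. The function name `ones_distribution` and the choice n = 5 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import comb

def ones_distribution(n):
    """Distribution of the number of 1's in a uniformly random n-bit string."""
    counts = [0] * (n + 1)
    for bits in product((0, 1), repeat=n):  # all 2**n equally likely strings
        counts[sum(bits)] += 1
    return [Fraction(c, 2 ** n) for c in counts]

n = 5
dist = ones_distribution(n)
formula = [Fraction(comb(n, x), 2 ** n) for x in range(n + 1)]
print(dist == formula)
```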

2 Fair Flips • D: the uniform distribution on {00, 01, 10, 11}, so each string has probability ¼ • f: {00,01,10,11} -> {0,1,2} counts the number of 1’s • X = f(D): 0 has probability ¼, 1 has probability ½, 2 has probability ¼

“Let X be a random variable measuring the height of a randomly selected person in the room.” • Is shorthand for: • Let D be the uniform distribution on people in the room. Let f be the function taking a person to his/her height. • X = f(D)

We require that random variables be distributions on real numbers so that we can combine and summarize random variables in mathematically natural ways.

Let X = f(D) and Y = g(D). Define Z = X + Y to be a new random variable h(D), where h(x) = f(x) + g(x). When two random variables are based on the same distribution, we can sum them to obtain a new random variable.

Example: Let D be the uniform distribution of people in the USA. Let f return the length in inches of a person’s left arm. Let g return the length in inches of a person’s right arm. Let X = f(D) and Y= g(D). Z = X + Y is a random variable measuring the combined arm lengths of a random person in the USA.

Example: Let D be the uniform distribution of refrigerators in Pittsburgh kitchens. Let f return the number of apples in the fridge. Let g return the number of oranges in the fridge. Let X = f(D) and Y= g(D). Z = X + Y is a random variable measuring the total apples and oranges in a random Pittsburgh fridge.

More generally, for any two random variables X and Y on the same distribution, we can create a new random variable Z = h(X,Y) for any function h from pairs of Reals to the Reals. Mostly, we will be adding random variables.
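A minimal sketch of combining two random variables defined on the same distribution. The helper `combine` and the choice of f and g (first and second bit of a 2-bit string) are assumptions made for illustration; with h(a, b) = a + b the result is the familiar count of 1's.

```python
from collections import defaultdict
from fractions import Fraction

# D: the uniform distribution on 2-bit strings.
D = {s: Fraction(1, 4) for s in ("00", "01", "10", "11")}

f = lambda s: int(s[0])  # X = f(D): value of the first bit
g = lambda s: int(s[1])  # Y = g(D): value of the second bit

def combine(dist, f, g, h):
    """Distribution of Z = h(X, Y), evaluated outcome by outcome on the shared sample space."""
    out = defaultdict(Fraction)
    for x, p in dist.items():
        out[h(f(x), g(x))] += p
    return dict(out)

Z = combine(D, f, g, lambda a, b: a + b)
print(Z)
```

The key point is that both f and g are evaluated on the same outcome x, which is what "based on the same distribution" means.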

EXPECTATION of X=f(D) on the sample space S The expectation of a random variable is defined to be the average of its values, each value weighted by its probability of occurring. $E[X] = \sum_{a \in f(S)} a \Pr[X=a]$

2 Fair Flips • D: the uniform distribution on S = {00, 01, 10, 11}, so each string has probability ¼ • X = f(D): 0 has probability ¼, 1 has probability ½, 2 has probability ¼ • $E[X] = \sum_{a \in f(S)} a \Pr[X=a] = 0(\tfrac14) + 1(\tfrac12) + 2(\tfrac14) = 1$

EXPECTATION of X=f(D) The expectation of a random variable is defined by a weighted average as follows: each $d \in S$ contributes f(d) with weight p(d). $E[X] = \sum_{d \in S} f(d)\, p(d)$

2 Fair Flips • D on S: each of 00, 01, 10, 11 has probability ¼ • X = f(D): 0 has probability ¼, 1 has probability ½, 2 has probability ¼ • $E[X] = \sum_{d \in S} f(d)\, p(d) = f(00)(\tfrac14) + f(01)(\tfrac14) + f(10)(\tfrac14) + f(11)(\tfrac14) = 1$

EXPECTATION of X=f(D) with sample space S $E[X] = \sum_{d \in S} f(d)\, p(d) = \sum_{a \in f(S)} a \Pr[X=a]$
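The two formulas can be checked against each other on the 2-flip example. This is a small sketch; the function names `expectation_by_outcome` and `expectation_by_value` are made up here to label the two sums.

```python
from fractions import Fraction

D = {s: Fraction(1, 4) for s in ("00", "01", "10", "11")}
f = lambda s: s.count("1")  # number of 1's

def expectation_by_outcome(dist, f):
    """Sum over outcomes d in S of f(d) * p(d)."""
    return sum(f(d) * p for d, p in dist.items())

def expectation_by_value(dist, f):
    """Sum over values a in f(S) of a * Pr[X = a]."""
    total = Fraction(0)
    for a in {f(d) for d in dist}:
        pr_a = sum(p for d, p in dist.items() if f(d) == a)
        total += a * pr_a
    return total

print(expectation_by_outcome(D, f), expectation_by_value(D, f))
```

The second sum simply groups the terms of the first by the value f(d), which is why the two always agree.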

Example: X is a random variable defined by counting the number of heads when n fair, independent coins are flipped. E[X] = ?

Example: X is a random variable defined by counting the number of heads when n fair, independent coins are flipped. $E[X] = \sum_{a \in f(S)} a \Pr[X=a] = \sum_{a=0}^{n} a \binom{n}{a} / 2^n$ =

Example: X is a random variable defined by counting the number of heads when n fair, independent coins are flipped. $E[X] = \sum_{a \in f(S)} a \Pr[X=a] = \sum_{a=0}^{n} a \binom{n}{a} / 2^n = (\tfrac12)^n [n\, 2^{n-1}] = n/2$
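The slide uses the fact that the sum of a (n choose a) over a from 0 to n equals n 2^(n-1) without proof. One standard way to see it, sketched here rather than taken from the lecture, is the absorption identity for binomial coefficients:

```latex
\begin{align*}
\sum_{a=0}^{n} a \binom{n}{a}
  &= \sum_{a=1}^{n} n \binom{n-1}{a-1}
     && \text{since } a\binom{n}{a} = n\binom{n-1}{a-1} \text{ for } a \ge 1 \\
  &= n \sum_{b=0}^{n-1} \binom{n-1}{b}
     && \text{substituting } b = a - 1 \\
  &= n \, 2^{n-1}
     && \text{a set of size } n-1 \text{ has } 2^{n-1} \text{ subsets.}
\end{align*}
```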

Don’t Always Expect The Expected • Let D be the uniform distribution on {0,1,9,10}. • E[D] = 5 • The probability that you ever see a sample close to 5 is zero.

Example: X is a random variable defined by counting the number of heads when n independent coins, each with bias p, are flipped. $E[X] = \sum_{a \in f(S)} a \Pr[X=a] = \sum_{a=0}^{n} a \binom{n}{a} p^a (1-p)^{n-a}$ =

Example: X is a random variable defined by counting the number of heads when n independent coins, each with bias p, are flipped. $E[X] = \sum_{a \in f(S)} a \Pr[X=a] = \sum_{a=0}^{n} a \binom{n}{a} p^a (1-p)^{n-a}$ = Ugh! There has to be a better way!

IMPORTANT: E[X+Y] = E[X] + E[Y]

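E[X+Y] = E[X] + E[Y] Proof: $E[X] = \sum_{d \in S} f(d)\, p(d)$ and $E[Y] = \sum_{d \in S} g(d)\, p(d)$, so $E[X+Y] = \sum_{d \in S} (f(d)+g(d))\, p(d) = \sum_{d \in S} f(d)\, p(d) + \sum_{d \in S} g(d)\, p(d) = E[X] + E[Y]$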

By induction . . . $E[X_1 + X_2 + \dots + X_n] = E[X_1] + E[X_2] + \dots + E[X_n]$

The expectation of the sum is the sum of the expectations.

We will now explain a powerful way to compute expectations. This is called the indicator variable method.

Example: X is a random variable defined by counting the number of heads when n fair, independent coins are flipped. $E[X] = \sum_{a \in f(S)} a \Pr[X=a]$ The method of indicator variables will solve this problem with almost no calculation.

Example: X is a random variable defined by counting the number of heads when n fair, independent coins are flipped. E[X] = ? Define n indicator variables: $X_k = 0$ if the kth coin is tails, and $X_k = 1$ if the kth coin is heads. $X_k$ indicates whether the kth flip is heads. By design, the sum of the indicator variables is X.

Example: X is a random variable defined by counting the number of heads when n fair, independent coins are flipped. E[X] = ? Define n indicator variables: $X_k = 0$ if the kth coin is tails, and $X_k = 1$ if the kth coin is heads. $\sum_k X_k = X$

Example: X is a random variable defined by counting the number of heads when n fair, independent coins are flipped. E[X] = ? Define n indicator variables: $X_k = 0$ if the kth coin is tails, and $X_k = 1$ if the kth coin is heads. $E[\sum_k X_k] = E[X]$

$E[X] = E[\sum_k X_k] = E[X_1] + \dots + E[X_n]$ The expectation of the sum is the sum of the expectations.

$E[X] = E[\sum_k X_k] = E[X_1] + \dots + E[X_n]$ Each individual $E[X_k]$ is trivial to calculate: $E[X_k] = (\tfrac12)\, 0 + (\tfrac12)\, 1 = \tfrac12$

$E[X] = E[\sum_k X_k] = E[X_1] + \dots + E[X_n] = \tfrac12 + \tfrac12 + \dots + \tfrac12 = n/2$

Example: X is a random variable defined by counting the number of heads when n fair, independent coins are flipped. E[X] = ? Define n indicator variables: $X_k = 0$ if the kth coin is tails, and $X_k = 1$ if the kth coin is heads. $E[X] = E[\sum_k X_k] = n/2$

Example: X is a random variable defined by counting the number of heads when n independent coins, each with bias p, are flipped. E[X] = ? Define n indicator variables: $X_k = 0$ if the kth coin is tails, and $X_k = 1$ if the kth coin is heads. $E[X] = E[\sum_k X_k] = {?}$

$E[X] = E[\sum_k X_k] = E[X_1] + \dots + E[X_n]$ Each individual $E[X_k]$ is trivial to calculate: $E[X_k] = (1-p)\, 0 + (p)\, 1 = p$

Example: X is a random variable defined by counting the number of heads when n independent coins, each with bias p, are flipped. E[X] = ? Define n indicator variables: $X_k = 0$ if the kth coin is tails, and $X_k = 1$ if the kth coin is heads. $E[X] = E[\sum_k X_k] = pn$
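The indicator-variable answer pn can be compared numerically with the direct weighted sum over the number of heads. The sketch below uses exact fractions; the function names and the sample values n = 10, p = 1/3 are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import comb

def expected_heads_direct(n, p):
    """Direct definition: sum over a of a * Pr[X = a] for n coins of bias p."""
    return sum(a * comb(n, a) * p ** a * (1 - p) ** (n - a) for a in range(n + 1))

def expected_heads_indicator(n, p):
    """Indicator method: n indicators, each with expectation p."""
    return n * p

n, p = 10, Fraction(1, 3)
print(expected_heads_direct(n, p), expected_heads_indicator(n, p))
```

This confirms the value for one choice of n and p; it is a check, not a derivation.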

Exercise Go back to the painful-looking sum $\sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k}$ and solve it using the fact that E[X] = pn.

The method of indicator variables is even more powerful than we are letting on! The indicator variables do not have to be independent for the method to work. Additivity of expectations does not require independence.

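E[X+Y] = E[X] + E[Y] Proof: $E[X] = \sum_{d \in S} f(d)\, p(d)$ and $E[Y] = \sum_{d \in S} g(d)\, p(d)$, so $E[X+Y] = \sum_{d \in S} (f(d)+g(d))\, p(d) = \sum_{d \in S} f(d)\, p(d) + \sum_{d \in S} g(d)\, p(d) = E[X] + E[Y]$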
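To see that the proof never uses independence, here is a small check with two random variables that are as dependent as possible. The example is an assumption chosen for illustration: a single fair die, with X the value shown and Y its square, so Y is completely determined by X.

```python
from fractions import Fraction

# D: uniform distribution on the faces of one fair die.
D = {d: Fraction(1, 6) for d in range(1, 7)}

f = lambda d: d      # X = f(D): the value shown
g = lambda d: d * d  # Y = g(D): the square of the value, fully determined by X

def E(h):
    """Expectation of h(D) as the sum over outcomes d of h(d) * p(d)."""
    return sum(h(d) * p for d, p in D.items())

lhs = E(lambda d: f(d) + g(d))  # E[X + Y]
rhs = E(f) + E(g)               # E[X] + E[Y]
print(lhs, rhs)
```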

The indicator variables do not have to be independent for the method to work. Application: When computing the expected running time of a randomized algorithm, we can compute the expectation of each piece of the program and sum the results.

Dependent indicator variables arise in the birthday example. Suppose we have k people each with a uniformly chosen birthday from 1 to 365. X=number of pairs of people with the same birthday. E[X] = ?
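The slide leaves E[X] as a question. One way to approach it with the indicator method: use one indicator per pair of people, note that any fixed pair shares a birthday with probability 1/365, and add up the (k choose 2) expectations to get (k choose 2)/365. The sketch below checks that reasoning by brute force on a shrunken calendar; the function names and the `days` parameter are assumptions introduced here so that exhaustive enumeration stays feasible.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def expected_pairs_formula(k, days=365):
    """Indicator method: one indicator per pair, each with expectation 1/days."""
    return Fraction(comb(k, 2), days)

def expected_pairs_bruteforce(k, days):
    """Average the number of matching pairs over all days**k equally likely assignments."""
    total = 0
    for birthdays in product(range(days), repeat=k):
        total += sum(
            1 for i, j in combinations(range(k), 2) if birthdays[i] == birthdays[j]
        )
    return Fraction(total, days ** k)

# Small case that can be enumerated exhaustively: 4 people, 5 possible days.
print(expected_pairs_bruteforce(4, 5) == expected_pairs_formula(4, 5))

# The formula for 30 people and a 365-day year.
print(float(expected_pairs_formula(30)))
```

The pair indicators are dependent (if A matches B and B matches C, then A matches C), yet the sum of their expectations still gives E[X].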
