Discrete Math CS 2800
Prof. Bart Selman selman@cs.cornell.edu
Module Probability --- Part d)
1) Probability Distributions
2) Markov and Chebyshev Bounds
Discrete Random variable • Discrete random variable • Takes on one of a finite (or at least countable) number of different values. • X = 1 if heads, 0 if tails • Y = 1 if male, 0 if female (phone survey) • Z = # of spots on face of thrown die
Continuous Random Variable • Continuous random variable (r.v.) • Takes on one in an infinite range of different values • W = % GDP grows (shrinks?) this year • V = hours until light bulb fails • For a discrete r.v., we have Prob(X=x), i.e., the probability that r.v. X takes on a given value x. • What is the probability that a continuous r.v. takes on a specific value? E.g. Prob(X_light_bulb_fails = 3.14159265 hrs) = ?? • Individual values have probability 0. • However, ranges of values can have non-zero probability. • E.g. Prob(3 hrs <= X_light_bulb_fails <= 4 hrs) = 0.1
Probability Distribution • The probability distribution is a complete probabilistic description of a random variable. • All other statistical concepts (expectation, variance, etc.) are derived from it. • Once we know the probability distribution of a random variable, we know everything we can learn about it from statistics.
Probability Distribution • Probability function • One form in which the probability distribution of a discrete random variable may be expressed. • Expresses the probability that X takes the value x as a function of x (as we saw before): f(x) = P(X = x)
Probability Distribution • The probability function • May be tabular:
Probability Distribution • The probability function • May be graphical: (bar chart of probabilities over the values of x omitted)
Probability Distribution • The probability function • May be formulaic:
Probability Distribution: Fair die • f(x) = 1/6 for x = 1, 2, …, 6 (bar chart with six equal bars of height 1/6 ≈ .17 over x = 1, …, 6 omitted)
Probability Distribution • The probability function, properties: • f(x) >= 0 for every value x • Σ_x f(x) = 1 (the probabilities sum to 1)
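As a quick illustration (added here, not part of the original slides), a minimal Python sketch that stores a probability function as a table and checks the two properties; the particular pmf is a hypothetical example.

    # A probability function stored as a table: value -> probability.
    # Hypothetical example: a loaded 3-sided spinner.
    pmf = {1: 1/6, 2: 1/3, 3: 1/2}

    # Property 1: every probability is nonnegative (and at most 1).
    assert all(0 <= p <= 1 for p in pmf.values())

    # Property 2: the probabilities sum to 1 (allow tiny floating-point error).
    assert abs(sum(pmf.values()) - 1.0) < 1e-9

    for x, p in sorted(pmf.items()):
        print(f"P(X = {x}) = {p:.3f}")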
Cumulative Probability Distribution • Cumulative probability distribution • The cdf is a function which describes the probability that a random variable does not exceed a given value: F(x) = P(X <= x). • Does this make sense for a continuous r.v.? Yes!
Cumulative Probability Distribution • Cumulative probability distribution • The relationship between the cdf and the probability function: F(x) = P(X <= x) = Σ_{t <= x} f(t)
Cumulative Probability Distribution • Die-throwing example, in tabular and graphical form (table of F(x) = x/6 for x = 1, …, 6 and the corresponding step plot rising from 1/6 to 1 omitted)
Cumulative Probability Distribution • The cumulative distribution function • May be formulaic (die-throwing): F(x) = 0 for x < 1, F(x) = ⌊x⌋/6 for 1 <= x < 6, and F(x) = 1 for x >= 6
Cumulative Probability Distribution • The cdf, properties: • 0 <= F(x) <= 1 • F is non-decreasing • F(x) → 0 as x → -∞ and F(x) → 1 as x → +∞
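A small Python sketch (my own illustration, not from the slides) that builds the cdf of a fair die from its probability function and checks these properties.

    # Probability function of a fair die.
    pmf = {x: 1/6 for x in range(1, 7)}

    def cdf(x):
        # F(x) = P(X <= x) = sum of f(t) over all t <= x.
        return sum(p for t, p in pmf.items() if t <= x)

    for x in range(0, 8):
        print(f"F({x}) = {cdf(x):.3f}")   # 0.000, 0.167, ..., 1.000

    # Properties: F is non-decreasing and goes from 0 to 1.
    values = [cdf(x) for x in range(0, 8)]
    assert all(a <= b for a, b in zip(values, values[1:]))
    assert values[0] == 0.0 and abs(values[-1] - 1.0) < 1e-9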
Example CDFs (plots omitted): • of a discrete probability distribution • of a continuous probability distribution • of a distribution which has both a continuous part and a discrete part.
Functions of a random variable • It is possible to calculate expectations and variances of functions of random variables: for a function g, E[g(X)] = Σ_x g(x) · P(X = x).
Functions of a random variable • Example • You are paid a number of dollars equal to the square root of the number of spots on a die. • What is a fair bet to get into this game? Compute the expected payoff E[√X]:

x      √x       P(X = x)    √x · P(X = x)
1      1.000    1/6         0.167
2      1.414    1/6         0.236
3      1.732    1/6         0.289
4      2.000    1/6         0.333
5      2.236    1/6         0.373
6      2.449    1/6         0.408
Total                       1.805

A fair price to enter is about $1.80, the expected payoff.
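A quick Python check of the computation in the table (an added illustration, not part of the original slides).

    import math

    # Fair die: each face has probability 1/6.
    pmf = {x: 1/6 for x in range(1, 7)}

    # E[g(X)] = sum over x of g(x) * P(X = x), here with g(x) = sqrt(x).
    expected_payoff = sum(math.sqrt(x) * p for x, p in pmf.items())
    print(f"E[sqrt(X)] = {expected_payoff:.3f}")   # about 1.805 dollars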
Functions of a random variable • Linear functions • If a and b are constants and X is a random variable • It can be shown that: E[aX + b] = a·E[X] + b and V(aX + b) = a²·V(X). Intuitively, why does b not appear in the variance? And, why a²?
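As a minimal added sketch (not from the slides), the two identities can be verified numerically for one hypothetical choice of a, b, and X (a fair die).

    # Fair-die pmf, plus exact E[X] and V(X) computed from it.
    pmf = {x: 1/6 for x in range(1, 7)}

    def expectation(g=lambda x: x):
        return sum(g(x) * p for x, p in pmf.items())

    def variance(g=lambda x: x):
        mu = expectation(g)
        return expectation(lambda x: (g(x) - mu) ** 2)

    a, b = 3.0, 5.0                               # arbitrary constants for the check
    print(expectation(lambda x: a * x + b),
          a * expectation() + b)                  # equal: E[aX+b] = aE[X] + b
    print(variance(lambda x: a * x + b),
          a ** 2 * variance())                    # equal: V(aX+b) = a^2 V(X)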
The Most Common Discrete Probability Distributions (some discussed before):
1) Bernoulli distribution
2) Binomial
3) Geometric
4) Poisson
Bernoulli distribution • The Bernoulli distribution is the “coin flip” distribution. • X is Bernoulli if its probability function is: P(X = 1) = p, P(X = 0) = 1 - p. • X = 1 is usually interpreted as a “success.” E.g.: • X = 1 for heads in a coin toss • X = 1 for male in a survey (phone survey) • X = 1 for defective in a test of product • X = 1 for “made the sale” tracking performance
Bernoulli distribution • Expectation: E[X] = 1·p + 0·(1 - p) = p • Variance: V(X) = E[X²] - (E[X])² = p - p² = p(1 - p)
Binomial distribution • The binomial distribution is just n independent Bernoullis added up. • It is the number of “successes” in n trials. • If Z1, Z2, …, Zn are independent Bernoulli random variables, then X = Z1 + Z2 + … + Zn is binomial.
Binomial distribution • The binomial distribution is just n independent Bernoullis added up. Example: testing for defects “with replacement.” • Have many light bulbs • Pick one at random, test for defect, put it back • Pick one at random, test for defect, put it back • and so on • If there are many light bulbs, we do not even have to replace them: sampling without replacement is then essentially the same.
Binomial distribution • Let’s figure out a binomial r.v.’s probability function. Suppose we are looking at a binomial with n = 3.
• We want P(X=0): can happen one way: 000 → (1-p)(1-p)(1-p) = (1-p)³
• We want P(X=1): can happen three ways: 100, 010, 001 → p(1-p)(1-p) + (1-p)p(1-p) + (1-p)(1-p)p = 3p(1-p)²
• We want P(X=2): can happen three ways: 110, 011, 101 → pp(1-p) + (1-p)pp + p(1-p)p = 3p²(1-p)
• We want P(X=3): can happen one way: 111 → ppp = p³
Binomial distribution • So, a binomial r.v.’s probability function is: P(X = x) = C(n, x) · p^x · (1 - p)^(n - x), for x = 0, 1, …, n, where C(n, x) = n! / (x!(n - x)!) is the binomial coefficient.
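A short Python sketch of this probability function (an added illustration; the function name binomial_pmf is mine).

    from math import comb

    def binomial_pmf(x, n, p):
        # P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)
        return comb(n, x) * p ** x * (1 - p) ** (n - x)

    # Reproduce the n = 3 case worked out above, e.g. with p = 0.4.
    for x in range(4):
        print(x, round(binomial_pmf(x, 3, 0.4), 4))

    # Sanity check: the probabilities sum to 1.
    assert abs(sum(binomial_pmf(x, 3, 0.4) for x in range(4)) - 1) < 1e-9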
Binomial distribution • Typical shape of the binomial (histogram omitted): roughly bell-shaped and symmetric (exactly symmetric when p = 0.5).
Binomial distribution • Expectation: E[X] = np • Variance: V(X) = np(1 - p) • Aside: if V(X) = V(Y), what, if anything, does that tell us about the two distributions? Hmm…
Binomial distribution • A salesman claims that he closes the deal 40% of the time. • This month, he closed 1 out of 10 deals. • How likely is it that he did 1/10 or worse given his claim?
Binomial distribution • With X ~ Binomial(n = 10, p = 0.4): P(X <= 1) = P(X = 0) + P(X = 1) = (0.6)^10 + 10 · (0.4) · (0.6)^9 ≈ 0.046. • Less than 5%, or about 1 in 20. So, it’s unlikely that his true success rate is 0.4.
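A one-off Python check of this calculation (added here, not from the slides).

    from math import comb

    n, p = 10, 0.4
    # P(X <= 1) = P(X = 0) + P(X = 1) for X ~ Binomial(10, 0.4)
    prob = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in (0, 1))
    print(f"P(X <= 1) = {prob:.4f}")   # about 0.0464, i.e. less than 5%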
Binomial and normal / Gaussian distribution • The normal distribution is a good approximation to the binomial distribution B(n, p) (“large” n, small skew). • Prob. density function: f(x) = (1 / (σ√(2π))) · e^(-(x - μ)² / (2σ²)), with μ = np and σ² = np(1 - p) for the approximating normal.
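A small added Python sketch comparing the exact binomial probabilities with this normal density, using the standard choices μ = np and σ² = np(1 - p); the specific n and p are hypothetical.

    from math import comb, exp, pi, sqrt

    n, p = 50, 0.5                       # a "large" n, no skew
    mu, sigma = n * p, sqrt(n * p * (1 - p))

    def binom(x):
        return comb(n, x) * p ** x * (1 - p) ** (n - x)

    def normal_density(x):
        return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

    for x in (20, 25, 30):
        # The two values are close near the mean.
        print(x, round(binom(x), 4), round(normal_density(x), 4))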
Geometric Distribution • A geometric distribution is usually interpreted as the number of time periods until a failure occurs. • Imagine a sequence of coin flips, where the random variable X is the flip number on which the first tails occurs. • The probability of a head (a success) is p.
Geometric • Let’s find the probability function for the geometric distribution: • P(X = 1) = 1 - p (tails on the first flip) • P(X = 2) = p(1 - p) (a head, then tails) • P(X = 3) = p · p · (1 - p) = p²(1 - p) • etc. So, P(X = x) = p^(x-1) · (1 - p) (x is a positive integer)
Geometric • Notice, there is no upper limit on how large X can be. • Let’s check that these probabilities add to 1: Σ_{x >= 1} p^(x-1)(1 - p) = (1 - p)(1 + p + p² + …) = (1 - p) · 1/(1 - p) = 1, using the formula for a geometric series.
Geometric • Expectation: E[X] = 1/(1 - p) • Variance: V(X) = p/(1 - p)² • To derive E[X], start from the geometric series Σ_{k >= 0} p^k = 1/(1 - p) and differentiate both sides w.r.t. p. See Rosen page 158, example 17.
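An added numerical sanity check in Python for these geometric formulas (the series is truncated at a large cutoff, and p = 0.7 is an arbitrary choice).

    p = 0.7                              # probability of heads (a success)
    q = 1 - p                            # probability of tails on any flip

    # P(X = x) = p^(x-1) * (1 - p): x-1 heads followed by the first tails.
    pmf = {x: p ** (x - 1) * q for x in range(1, 2000)}   # truncate the tail

    total = sum(pmf.values())
    mean = sum(x * pr for x, pr in pmf.items())
    var = sum((x - mean) ** 2 * pr for x, pr in pmf.items())

    print(round(total, 6))               # ~ 1, probabilities sum to 1
    print(round(mean, 6), 1 / q)         # ~ E[X] = 1/(1-p)
    print(round(var, 6), p / q ** 2)     # ~ V(X) = p/(1-p)^2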
Poisson distribution • The Poisson distribution is typical of random variables which represent counts. • Number of murders in Ithaca next year. • Number of requests to a server in 1 hour. • Number of sick days in a year for an employee. ?!
The Poisson distribution is derived from the following underlying arrival-time model: • The probability of a unit arriving is uniform through time. • Two units never arrive at exactly the same time. • Arrivals are independent --- the arrival of one unit does not make the next unit more or less likely to arrive quickly.
Poisson distribution • The probability function for the Poisson distribution with parameter λ is: P(X = x) = e^(-λ) · λ^x / x!, for x = 0, 1, 2, … • λ is like the arrival rate --- a higher λ means more/faster arrivals
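A minimal Python sketch of this probability function (an added illustration; lam stands for λ, and the rate of 4 per hour is hypothetical).

    from math import exp, factorial

    def poisson_pmf(x, lam):
        # P(X = x) = e^(-lambda) * lambda^x / x!
        return exp(-lam) * lam ** x / factorial(x)

    lam = 4.0                                  # e.g. 4 requests per hour on average
    for x in range(8):
        print(x, round(poisson_pmf(x, lam), 4))

    # The probabilities over all counts sum to 1 (checked up to a large cutoff).
    assert abs(sum(poisson_pmf(x, lam) for x in range(200)) - 1) < 1e-9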
Poisson distribution • Shape (histograms omitted for low, medium, and high λ: as λ grows the distribution shifts right and spreads out).
Often, you don’t know the exact probability distribution of a random variable. • We would still like to say something about probabilities involving that random variable… • E.g., what is the probability of X being larger (or smaller) than some given value? • We often can, by bounding the probability of events based on partial information about the underlying probability distribution: • Markov and Chebyshev bounds.
Theorem (Markov Inequality). Let X be a nonnegative random variable with E[X] = μ. Then, for any t > 0, P(X >= t) <= μ / t. • Note: relates the cumulative distribution to the expected value. • Hmm. What if t <= μ? Sure! Then μ/t >= 1 and the bound holds trivially. • Intuition: “can’t have too much prob. to the right of E[X]” --- if P(X >= t) were larger than μ/t, the mass at values >= t alone would push the mean above μ.
Proof. For a discrete random variable X:
E[X] = Σ_x x · P(X = x)
= Σ_{x < t} x · P(X = x) + Σ_{x >= t} x · P(X = x)
>= Σ_{x >= t} x · P(X = x)
>= Σ_{x >= t} t · P(X = x)
= t · P(X >= t).
I.e., P(X >= t) <= E[X] / t. • Where did we use X >= 0? In the 3rd line: dropping the sum over x < t can only decrease the total because every term there is nonnegative.
Alt. proof of the Markov Inequality • Define a discrete random variable Y by: Y = t if X >= t, and Y = 0 otherwise. Then Y <= X always (this uses X >= 0), so E[Y] <= E[X]. But E[Y] = t · P(X >= t), hence t · P(X >= t) <= E[X], i.e., P(X >= t) <= E[X] / t.
Example: • Consider a system with mean time to failure of 100 hours. • Use the Markov inequality to bound the reliability of the system, R(t), for t = 90, 100, 110, 200. • X is the time to failure of the system; E[X] = 100. • R(t) = P[X > t], with t = 90, 100, 110, 200. • By Markov, R(t) <= 100 / t: R(90) <= 1.11 (vacuous, since any probability is <= 1), R(100) <= 1, R(110) <= 0.91, R(200) <= 0.5. • The Markov inequality is somewhat crude, since only the mean is assumed to be known.
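The same bounds in a few lines of Python (an added sketch, not from the slides).

    mean_time_to_failure = 100.0          # E[X], in hours

    # Markov: P(X > t) <= P(X >= t) <= E[X] / t for a nonnegative X.
    for t in (90, 100, 110, 200):
        bound = min(1.0, mean_time_to_failure / t)   # a probability is never > 1
        print(f"R({t}) = P(X > {t}) <= {bound:.2f}")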
Theorem Chebyshev's Inequality • Assume that the mean μ = E[X] and the variance σ² = V(X) are given. • We can obtain a better estimate of the probability of events of interest by using Chebyshev’s inequality: for any t > 0, P(|X - μ| >= t) <= σ² / t².
Theorem Chebyshev's Inequality • Proof: Markov Ineq. applied to the nonnegative r.v. Y = (X - μ)²: P(|X - μ| >= t) = P((X - μ)² >= t²) <= E[(X - μ)²] / t² = σ² / t².
Chebyshev inequality: Alternate forms • Yet two other forms of Chebyshev’s inequality: P(|X - μ| >= kσ) <= 1/k² and, equivalently, P(|X - μ| < kσ) >= 1 - 1/k². • These say something about the probability of being “k standard deviations from the mean”.
Theorem Chebyshev's Inequality • Facts: for any distribution with finite variance, at least 1 - 1/4 = 75% of the probability lies within 2 standard deviations of the mean, at least 1 - 1/9 ≈ 88.9% within 3, and at least 1 - 1/16 = 93.75% within 4.
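To see how conservative Chebyshev can be, here is an added Python comparison against an exact binomial distribution (the parameters n = 100, p = 0.5 are a hypothetical example).

    from math import comb, sqrt

    n, p = 100, 0.5
    mu, sigma = n * p, sqrt(n * p * (1 - p))   # mean 50, std dev 5

    def pmf(x):
        return comb(n, x) * p ** x * (1 - p) ** (n - x)

    for k in (2, 3, 4):
        # Exact probability of landing within k standard deviations of the mean.
        inside = sum(pmf(x) for x in range(n + 1) if abs(x - mu) < k * sigma)
        chebyshev = 1 - 1 / k ** 2
        print(f"k={k}: Chebyshev guarantees >= {chebyshev:.3f}, actual = {inside:.3f}")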