Probability Theory

Probability Theory Rosen 5th ed., Chapter 5

Random Experiment - Terminology • A random (or stochastic) experiment is an experiment whose result is not certain in advance. • An outcome is the result of a random experiment. • The sample spaceS of the experiment is the set of possible outcomes. • An eventE is a subset of the sample space.

Examples • Experiment: roll two dice • Possible Outcomes: (1,2), (5,6) etc. • Sample space: {(1,1) (1,2) … (6,6)} (36 possible outcomes) • Event: sum is 7 {(1,6) (6,1) (2,5) (5,2) (3,4) (4,3)}

Probability • The probabilityp = Pr[E]  [0,1] of an event E is a real number representing our degree of certainty that E will occur. • If Pr[E] = 1, then E is absolutely certain to occur, • If Pr[E] = 0, then E is absolutely certain notto occur,. • If Pr[E] = ½, then we are completely uncertain about whether E will occur; • What about other cases?

Four Definitions of Probability • Several alternative definitions of probability are commonly encountered: • Frequentist, Bayesian, Laplacian, Axiomatic • They have different strengths & weaknesses. • Fortunately, they coincide and work well with each other in most cases.

Probability: Laplacian Definition • First, assume that all outcomes in the sample space are equally likely. • Then, the probability of event E is given by: Pr[E] = |E| / |S|

Example • Experiment: roll two dice • Sample space S: {(1,1) (1,2) … (6,6)} (36 possible outcomes) • Event E: sum is 7 {(1,6) (6,1) (2,5) (5,2) (3,4) (4,3)} P(E) = |E| / |S| = 6 / 36 = 1/6

Another Example • What is the probability of wining the lottery assuming that you have to pick correctly 6 numbers out of 40 numbers?

Probability of Complementary Events • Let E be an event in a sample space S. • Then, E represents the complementary event: Pr[E] = 1-Pr[E] • Proof: Pr[E]=(|S|-|E|)/|S| = 1 – |E|/|S| = 1 − Pr[E]

Example • A sequence of ten bits is randomly generated. What is the probability that at least one of these bits is 0?

Probability of Unions of Events • Let E1,E2 S, then: Pr[E1 E2] = Pr[E1] + Pr[E2] − Pr[E1E2] • By the inclusion-exclusion principle.

Example • What is the probability that a positive integer selected at random from the set of positive integers not exceeding 100 is divisible by either 2 or 5?

Mutually Exclusive Events • Two events E1, E2 are called mutually exclusive if they are disjoint: E1E2 =  • Note that two mutually exclusive events cannot both occur in the same instance of a given experiment. • If event E1 happens, then event E2 cannot, or vice-versa. • For mutually exclusive events, Pr[E1  E2] = Pr[E1] + Pr[E2].

Example • Experiment: Rolling a die once: • Sample space S = {1,2,3,4,5,6} • E1 = 'observe an odd number' = {1,3,5} • E2 = 'observe an even number' = {2,4,6} • E1E2 = , so E1 and E2 are mutually exclusive.

Probability: Axiomatic Definition • How to study probabilities of experiments where outcomes may not be equally likely? • Let S be the sample space of an experiment with a finite or countable number of outcomes. • Define a function p:S→[0,1], called a probability distribution. • Assign a probability p(s) to each outcome s where p(s)= (# times s occurs) / (# times the experiment is performed) as # of times  ∞

Probability: Axiomatic Definition • Two conditions must be met: • 0 ≤ p(s) ≤ 1 for all outcomes sS. • ∑ p(s) = 1 (sum is over all sS). • The probability of any event ES is just:

Example • Suppose a die is biased (or loaded) so that 3 appears twice as often as each other number but that the other five outcomes are equally likely. What is the probability that an odd number appears when we roll the die?

Properties (same as before …) Pr[E] = 1 − Pr[E] Pr[E1 E2] = Pr[E1] + Pr[E2] − Pr[E1E2] • For mutually exclusive events, Pr[E1  E2] = Pr[E1] + Pr[E2].

Independent Events • Two events E,F are independent if the outcome of event E, has no effect on the outcome of event F. Pr[EF] = Pr[E]·Pr[F]. • Example: Flip a coin, and roll a die. Pr[ quarter is heads  die is 1 ] = Pr[quarter is heads] × Pr[die is 1]

Example • Are the events E=“a family with three children has children of both sexes”, and F=“a family with three children has at most one boy” independent?

Mutually Exclusive vs Independent Events • If A and B are mutually exclusive, they cannot be independent. • If A and B are independent, they cannot be mutually exclusive.

Conditional Probability • Then, the conditional probabilityof E given F, written Pr[E|F], is defined as Pr[E|F] = Pr[EF]/Pr[F] • P(Cavity) = 0.1 - In the absence of any other information, there is a 10% chance that somebody has a cavity. • P(Cavity/Toothache) = 0.8 - There is a 80% chance that somebody has a cavity given that he has a toothache.

Example • A bit string of length four is generated at random so that each of the 16 bit strings of length four is equally likely. What is the probability that it contains at least two consecutive 0s, given that its first bit is a 0?

Conditional Probability (cont’d) • Note that if E and F are independent, then Pr[E|F] = Pr[E] • Using Pr[E|F] = Pr[EF]/Pr[F] we have: Pr[EF] = Pr[E|F] Pr[F] Pr[EF] = Pr[F|E] Pr[E] • Combining them we get the Bayes rule:

Bayes’s Theorem • Useful for computing the probability that a hypothesis H is correct, given data D: • Extremely useful in artificial intelligence apps: • Data mining, automated diagnosis, pattern recognition, statistical modeling, evaluating scientific hypotheses.

Example • Consider the probability of Disease given Symptoms P(Disease/Symptoms) = P(Symptoms/Disease)P(Disease) / P(Symptoms)

Random Variable • A random variableVis a function (i.e., not a variable) that assigns a value to each event !

Why Using Random Variables? • It is easier to deal with a summary variable than with the original probability structure. • Example: in an opinion poll, we ask 50 people whether agree or disagree with a certain issue. • Suppose we record a "1" for agree and "0" for disagree. • The sample space for this experiment has possible elements. • Suppose we are only interested in the number of people who agree. • Define the variable X=“number of 1’s recorded out of 50” • Easier to deal with this sample space - has only 51 elements!

Random Variables • How is the probability function of a r.v. being defined from the probability function of the original sample space? • Suppose the original sample space is: S={ } • Suppose the range of the r.v. X is: { } • We will observe X= iff the outcome of the random experiment is an such that X( )=

Example • X = “sum of dice” • X=5 corresponds to E={(1,4) (4,1) (2,3) (3,2)} P(X=5)=P(E)= = P((1,4) + P((4,1)) + P((2,3)) + P((3,2)) = 4/36=1/9

Expectation Values • For a random variable X having a numeric domain, its expected value or weighted average value or arithmetic mean valueE[X] is defined as • Provides a central point for the distribution of values of the r.v.

Example • X=“outcome of rolling a die” E(X) = ∑ x·P(X=x) = 1· 1/6 + 2 ·1/6 + 3· 1/6 + 4 ·1/6 + 5 ·1/6 + 6· 1/6 = 3.5

Another Example • X=“sum of two dice” P(X=2)=P(X=12)=1/36 P(X=3)=P(X=11)=2/36 P(X=4)=P(X=10)=3/36 P(X=5)=P(X=9)=4/36 P(X=6)=P(X=8)=5/36 P(X=7)=6/36 E(X) = ∑ x·P(X=x) =2·1/36 + 3·2/36 + …. + 12· 1/36 = 7

Linearity of Expectation • Let X1, X2 be any two random variables derived from the same sample space, then: • E[X1+X2] = E[X1] + E[X2] • E[aX1 + b] = aE[X1] + b

Example • Find the expected value of the sum of the numbers that appear when a pair of fair dice is rolled. X1: “number appearing on first die” X2: “number appearing on second die” Y = X1 + X2 : “sum of dice” E(Y) = E(X1+X2)=E(X1)+E(X2) = 3.5+3.5=7

Variance • The varianceVar[X] = σ2(X) of a r.v. X is defined as: where • The standard deviation or root-mean-square (RMS) difference of X, σ(X) :≡ Var[X]1/2. • Indicates how spread out the values of the r.v. are.

Example • X=“outcome of rolling a die” E(X) = ∑ x·P(X=x) = 1· 1/6 + 2 ·1/6 + 3· 1/6 + 4 ·1/6 + 5 ·1/6 + 6· 1/6 = 3.5 =

Properties of Variance • Let X1, X2 be any two independent random variables derived from the same sample space, then: Var[X1+X2] = Var[X1] + Var[X2]

Example • Suppose two dice are rolled and X is a r.v. X = “sum of dice” • Find Var(X)

Probability Theory