Chapter 4 Probability: Studying Randomness
Randomness and Probability • Random: Process where the outcome in a particular trial is not known in advance, although a distribution of outcomes may be known for a long series of repetitions • Probability: The proportion of the time a particular outcome will occur in a long series of repetitions of a random process • Independence: When the outcome of one trial does not affect the probabilities of outcomes of subsequent trials
Probability Models • Probability Model: • Listing of possible outcomes • Probability corresponding to each outcome • Sample Space (S): Set of all possible outcomes of a random process • Event: Outcome or set of outcomes of a random process (subset of S) • Venn Diagram: Graphic description of a sample space and events
Rules of Probability • The probability of an event A, denoted P(A), must lie between 0 and 1: $0 \le P(A) \le 1$ • For the sample space S, P(S) = 1 • Disjoint events have no common outcomes. For 2 disjoint events A and B, P(A or B) = P(A) + P(B) • The complement of an event A is the event that A does not occur, denoted $A^c$: $P(A) + P(A^c) = 1$ • When the sample space is finite, the probability of any event A is the sum of the probabilities of the individual outcomes that make up the event
Assigning Probabilities to Events • Assign probabilities to each individual outcome and add up probabilities of all outcomes comprising the event • When each outcome is equally likely, count the number of outcomes corresponding to the event and divide by the total number of outcomes • Multiplication Rule: A and B are independent events if knowledge that one occurred does not affect the probability that the other has occurred. If A and B are independent, then P(A and B) = P(A)P(B) • Multiplication rule extends to any finite number of events
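The counting rule and the multiplication rule can be sketched in a few lines of Python. The two-dice sample space and the events "first die is even" and "second die is six" are assumptions chosen only for illustration; they do not come from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs from two fair six-sided dice (illustrative assumption).
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = (# outcomes in event) / (total # outcomes); valid for equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

def first_is_even(outcome):
    return outcome[0] % 2 == 0

def second_is_six(outcome):
    return outcome[1] == 6

p_a = prob(first_is_even)
p_b = prob(second_is_six)
p_a_and_b = prob(lambda o: first_is_even(o) and second_is_six(o))

# Independent events satisfy P(A and B) = P(A) P(B).
print(p_a, p_b, p_a_and_b, p_a_and_b == p_a * p_b)  # 1/2 1/6 1/12 True
```

Exact fractions are used so that the equality check is not disturbed by floating-point rounding.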
Example - Casualties at Gettysburg • Results from Battle of Gettysburg [Tables: Counts; Proportions] • Killed, Wounded, and Captured/Missing are considered casualties. What is the probability that a randomly selected Northern soldier was a casualty? A Southern soldier? • Obtain the distribution across armies
Random Variables • Random Variable (RV): Variable that takes on the value of a numeric outcome of a random process • Discrete RV: Can take on a finite (or countably infinite) set of possible outcomes • Probability Distribution: List of values a random variable can take on and their corresponding probabilities • Individual probabilities must lie between 0 and 1 • Probabilities sum to 1 • Notation: • Random variable: X • Values X can take on: x1, x2, …, xk • Probabilities: P(X=x1) = p1 … P(X=xk) = pk
Example: Wars Begun by Year (1482-1939) • Distribution of numbers of wars started by year • X = # of wars started in a randomly selected year • Levels: x1=0, x2=1, x3=2, x4=3, x5=4 • Probability Distribution:
Continuous Random Variables • Variable can take on any value along a continuous range of numbers (interval) • Probability distribution is described by a smooth density curve • Probabilities of ranges of values for X correspond to areas under the density curve • Curve must lie on or above the horizontal axis • Total area under the curve is 1 • Special case: Normal distributions
Means and Variances of Random Variables • Mean: Long-run average a random variable will take on (also the balance point of the probability distribution) • Expected Value is another term; however, we do not really expect that a realization of X will necessarily be close to its mean. Notation: E(X) • Mean of a discrete random variable: $\mu_X = E(X) = x_1 p_1 + \cdots + x_k p_k = \sum_{i=1}^{k} x_i p_i$
Examples - Wars & Masters Golf • Wars: $\mu = 0.67$ • Masters Golf (round 1): $\mu = 73.54$
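The mean of a discrete random variable is the probability-weighted sum of its values, E(X) = x1 p1 + … + xk pk. The sketch below computes it for a fair six-sided die, which is a hypothetical distribution chosen for illustration and not the wars or golf data. The helper name `discrete_mean` is likewise an assumption.

```python
from fractions import Fraction

def discrete_mean(values, probs):
    """Return E(X) = sum of x_i * p_i for a discrete probability distribution."""
    if any(p < 0 or p > 1 for p in probs):
        raise ValueError("each probability must lie between 0 and 1")
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))

# Hypothetical distribution (NOT the wars data): a fair six-sided die.
die_values = [1, 2, 3, 4, 5, 6]
die_probs = [Fraction(1, 6)] * 6  # exact fractions, so the sum-to-1 check is exact

die_mean = discrete_mean(die_values, die_probs)
print(die_mean)  # 7/2
```

The mean 7/2 = 3.5 is not a value the die can ever show, which illustrates why a single realization of X need not be close to E(X).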
Statistical Estimation/Law of Large Numbers • In practice we won't know $\mu$ but will want to estimate it • We can select a sample of individuals and observe the sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ • By selecting a large enough sample size we can be very confident that our sample mean will be arbitrarily close to the true parameter value • Margin of error: an upper bound (holding with a high level of confidence) on our sampling error. It decreases as the sample size increases
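A small simulation can illustrate the law of large numbers. This is a sketch under stated assumptions: the population is taken to be rolls of a fair six-sided die (true mean 3.5), and the sample sizes and seed are arbitrary choices, none of which come from the slides.

```python
import random

def sample_mean_of_die_rolls(n, rng):
    """Average of n simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is repeatable
true_mean = 3.5         # known here only because the population is simulated

for n in (10, 1_000, 100_000):
    xbar = sample_mean_of_die_rolls(n, rng)
    # Print sample size, sample mean, and absolute sampling error.
    print(n, round(xbar, 3), round(abs(xbar - true_mean), 3))
```

The sampling error typically shrinks as n grows, although any single small sample can land close to the true mean by chance.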
Rules for Means • Linear Transformations: a + bX (where a and b are constants): $E(a+bX) = \mu_{a+bX} = a + b\mu_X$ • Sums of random variables: X + Y (where X and Y are random variables): $E(X+Y) = \mu_{X+Y} = \mu_X + \mu_Y$ • Linear Functions of Random Variables: $E(a_1X_1 + \cdots + a_nX_n) = a_1\mu_1 + \cdots + a_n\mu_n$ where $E(X_i) = \mu_i$
Example: Masters Golf Tournament • Mean by Round (Note ordering): $\mu_1 = 73.54$, $\mu_2 = 73.07$, $\mu_3 = 73.76$, $\mu_4 = 73.91$ • Mean Score per hole (18) for round 1: $E((1/18)X_1) = (1/18)\mu_1 = (1/18)(73.54) = 4.09$ • Mean Score versus par (72) for round 1: $E(X_1 - 72) = \mu_{X_1 - 72} = 73.54 - 72 = +1.54$ (1.54 over par) • Mean Difference (Round 1 - Round 4): $E(X_1 - X_4) = \mu_1 - \mu_4 = 73.54 - 73.91 = -0.37$ • Mean Total Score: $E(X_1+X_2+X_3+X_4) = \mu_1 + \mu_2 + \mu_3 + \mu_4 = 73.54 + 73.07 + 73.76 + 73.91 = 294.28$ (6.28 over par)
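The rules for means can be checked numerically against the round means quoted on the slide. The function name `mean_of_linear_combination` is a hypothetical helper introduced for this sketch.

```python
# Round means taken from the Masters slide.
round_means = {1: 73.54, 2: 73.07, 3: 73.76, 4: 73.91}

def mean_of_linear_combination(coeffs, means, constant=0.0):
    """E(a + b1*X1 + ... + bn*Xn) = a + b1*mu1 + ... + bn*mun."""
    return constant + sum(b * m for b, m in zip(coeffs, means))

mu = [round_means[i] for i in (1, 2, 3, 4)]

per_hole = mean_of_linear_combination([1 / 18], [mu[0]])              # E(X1/18)
versus_par = mean_of_linear_combination([1], [mu[0]], constant=-72)   # E(X1 - 72)
round1_minus_round4 = mean_of_linear_combination([1, -1], [mu[0], mu[3]])
total = mean_of_linear_combination([1, 1, 1, 1], mu)

print(round(per_hole, 2), round(versus_par, 2),
      round(round1_minus_round4, 2), round(total, 2))
```

None of these results requires independence between rounds; the rules for means hold regardless of how the rounds are correlated.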
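Variance of a Random Variable • Variance: Measure of the spread of the probability distribution; the average squared deviation from the mean: $\sigma_X^2 = V(X) = E[(X-\mu_X)^2]$, which for a discrete random variable is $\sum_{i=1}^{k} (x_i-\mu_X)^2 p_i$ • Standard Deviation: (Positive) square root of the variance: $\sigma_X = \sqrt{\sigma_X^2}$ • Rules for Variances (X, Y random variables with correlation $\rho$; a, b constants): $V(a+bX) = b^2\sigma_X^2$ and $V(aX+bY) = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\rho\sigma_X\sigma_Y$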
Variance of a Random Variable • Special Cases: • X and Y are independent (outcome of one does not alter the distribution of the other): $\rho = 0$, last term drops out • a=b=1 and $\rho = 0$: $V(X+Y) = \sigma_X^2 + \sigma_Y^2$ • a=1, b=-1 and $\rho = 0$: $V(X-Y) = \sigma_X^2 + \sigma_Y^2$ • a=b=1 and $\rho \ne 0$: $V(X+Y) = \sigma_X^2 + \sigma_Y^2 + 2\rho\sigma_X\sigma_Y$ • a=1, b=-1 and $\rho \ne 0$: $V(X-Y) = \sigma_X^2 + \sigma_Y^2 - 2\rho\sigma_X\sigma_Y$
Wars & Masters (Round 1) Golf Scores • Wars: $\sigma^2 = 0.7362$, $\sigma = 0.8580$ • Masters (Round 1): $\sigma^2 = 9.47$, $\sigma = 3.08$
Masters Scores (Rounds 1 & 4) • $\mu_1 = 73.54$, $\mu_4 = 73.91$, $\sigma_1^2 = 9.48$, $\sigma_4^2 = 11.95$, $\rho = 0.24$ • Variance of Round 1 scores vs Par: $V(X_1 - 72) = \sigma_1^2 = 9.48$ • Variance of Sum and Difference of Round 1 and Round 4 Scores:
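The variance of the sum and of the difference can be computed from the figures on this slide using the standard rule V(aX + bY) = a²V(X) + b²V(Y) + 2ab·ρ·σ_X·σ_Y, which reduces to the special cases listed earlier. The sketch below is illustrative; the helper name is an assumption, and the results depend on the rounded inputs quoted above.

```python
import math

def variance_of_linear_combination(a, b, var_x, var_y, rho):
    """V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab * rho * sd(X) * sd(Y)."""
    covariance = rho * math.sqrt(var_x) * math.sqrt(var_y)
    return a**2 * var_x + b**2 * var_y + 2 * a * b * covariance

# Round 1 and round 4 variances and their correlation, as quoted on the slide.
var_1, var_4, rho = 9.48, 11.95, 0.24

var_sum = variance_of_linear_combination(1, 1, var_1, var_4, rho)    # V(X1 + X4)
var_diff = variance_of_linear_combination(1, -1, var_1, var_4, rho)  # V(X1 - X4)

print(round(var_sum, 2), round(var_diff, 2))  # 26.54 16.32
```

Because the rounds are positively correlated, the sum is more variable and the difference less variable than they would be under independence (21.43 in both cases).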
General Rules of Probability • Union of a set of events: Event that at least one of the events occurs • Disjoint events: Events that share no common sample points. If A, B, and C are pairwise disjoint, the probability of their union is: P(A)+P(B)+P(C) • Intersection of two (or more) events: The event that both (all) events occur. • Addition Rule: P(A or B) = P(A)+P(B)-P(A and B) • Conditional Probability: The probability B occurs given A has occurred: P(B|A) • Multiplication Rule (generalized to conditional prob): P(A and B)=P(A)P(B|A)=P(B)P(A|B)
Conditional Probability • Generally of interest when one event precedes another in time (though this is not necessary) • When P(A) > 0 (otherwise trivial): $P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$ • Contingency Table: Table that cross-classifies individuals or probabilities across 2 or more event classifications • Tree Diagram: Graphical description of cross-classification of 2 or more events
John Snow London Cholera Death Study • 2 Water Companies (Let D be the event of death): • Southwark & Vauxhall (S): 264913 customers, 3702 deaths • Lambeth (L): 171363 customers, 407 deaths • Overall: 436276 customers, 4109 deaths • Note that the probability of death is almost 6 times higher for S&V customers than for Lambeth customers (this was important in showing how cholera spread)
John Snow London Cholera Death Study Contingency Table with joint probabilities (in body of table) and marginal probabilities (on edge of table)
John Snow London Cholera Death Study • Tree diagram obtaining joint probabilities by the multiplication rule (water user branches first by company, then by death; joint probabilities in parentheses) • S&V (.6072): D .0140 (.0085), $D^c$ .9860 (.5987) • L (.3928): D .0024 (.0009), $D^c$ .9976 (.3919)
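The marginal, conditional, and joint probabilities in the cholera example can be recomputed from the customer and death counts given on the slides. This is a minimal sketch; the dictionary layout and variable names are assumptions made for illustration.

```python
# Counts from the John Snow slide.
customers = {"S&V": 264913, "Lambeth": 171363}
deaths = {"S&V": 3702, "Lambeth": 407}
total_customers = sum(customers.values())

# Marginal P(company), conditional P(D | company), and joint P(company and D).
p_company = {c: customers[c] / total_customers for c in customers}
p_death_given = {c: deaths[c] / customers[c] for c in customers}
p_joint = {c: p_company[c] * p_death_given[c] for c in customers}  # multiplication rule

p_death = sum(p_joint.values())  # marginal P(D), summing over the two companies
risk_ratio = p_death_given["S&V"] / p_death_given["Lambeth"]

for c in customers:
    print(c, round(p_company[c], 4), round(p_death_given[c], 4), round(p_joint[c], 4))
print(round(p_death, 4), round(risk_ratio, 2))
```

The joint probabilities correspond to the body of the contingency table, and the company probabilities and P(D) to its margins.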
Example: Florida lotto • You select 6 distinct numbers from 1 to 53 (no replacement) • State randomly draws 6 numbers from 1 to 53 • Probability you match all 6 numbers: • First state draw: P(match 1st) = 6/53 • Given you match 1st, you have 5 left and state has 52 left: P(match 2nd given matched 1st) = 5/52 • Process continues: P(match 3rd given 1&2) = 4/51 • P(match 4th given 1&2&3) = 3/50 • P(match 5th given 1&2&3&4) = 2/49 • P(match 6th given 1&2&3&4&5) = 1/48 • By the multiplication rule, P(match all 6) = (6/53)(5/52)(4/51)(3/50)(2/49)(1/48) = 1/22,957,480
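The chain of conditional probabilities in the lotto example can be multiplied out exactly. The sketch below does so and cross-checks the result against the number of ways to choose 6 numbers from 53; the function name and its default arguments are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def p_match_all(picks=6, pool=53):
    """Multiply the conditional probabilities 6/53 * 5/52 * ... * 1/48."""
    p = Fraction(1)
    for k in range(picks):
        # After k matches, (picks - k) of your numbers remain among (pool - k) undrawn numbers.
        p *= Fraction(picks - k, pool - k)
    return p

p_jackpot = p_match_all()
print(p_jackpot)                              # 1/22957480
print(p_jackpot == Fraction(1, comb(53, 6)))  # True
```

The agreement with 1/C(53, 6) reflects that each of the equally likely 6-number combinations has the same chance of being drawn.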
Bayes's Rule - Updating Probabilities • Let $A_1,\dots,A_k$ be a set of events that partition a sample space (mutually exclusive and exhaustive), such that: • each event has known $P(A_i) > 0$ (each event can occur) • for any 2 events $A_i$ and $A_j$, $P(A_i \text{ and } A_j) = 0$ (events are disjoint) • $P(A_1) + \cdots + P(A_k) = 1$ (each outcome belongs to one of the events) • If C is an event such that • 0 < P(C) < 1 (C can occur, but will not necessarily occur) • We know the probability that C will occur given each event $A_i$: $P(C \mid A_i)$ • Then we can compute the probability of $A_i$ given that C occurred: $P(A_i \mid C) = \frac{P(C \mid A_i)P(A_i)}{P(C \mid A_1)P(A_1) + \cdots + P(C \mid A_k)P(A_k)}$
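Bayes's rule can be sketched as a short function that turns prior probabilities P(Ai) and conditional probabilities P(C|Ai) into updated probabilities P(Ai|C). As an illustration it is applied to the cholera figures from the John Snow slides, with A1 = S&V customer, A2 = Lambeth customer, and C = death; applying the rule to that example is a choice made here, and the function name is hypothetical.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(A_i | C) for each i, given P(A_i) and P(C | A_i)."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 (the A_i partition the sample space)")
    joints = [p * l for p, l in zip(priors, likelihoods)]  # P(A_i and C)
    p_c = sum(joints)                                       # P(C), summed over the partition
    return [j / p_c for j in joints]

# Cholera counts from the John Snow slides: A_1 = S&V, A_2 = Lambeth, C = death.
priors = [264913 / 436276, 171363 / 436276]
likelihoods = [3702 / 264913, 407 / 171363]

posterior = bayes_posterior(priors, likelihoods)
print([round(p, 3) for p in posterior])  # [0.901, 0.099]
```

S&V supplied about 61% of customers but accounts for about 90% of deaths, which is the updating that the rule describes.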
Northern Army at Gettysburg • Regiments: partition of soldiers (A1,…,A9). Casualty: event C • P(Ai) = (size of regiment) / (total soldiers) = (Column 3)/95369 • P(C|Ai) = (# casualties) / (regiment size) = (Col 4)/(Col 3) • P(C|Ai) P(Ai) = P(Ai and C) = (Col 5)*(Col 6) • P(C)=sum(Col 7) • P(Ai|C) = P(Ai and C) / P(C) = (Col 7)/.2416
Independent Events • Two events A and B are independent if P(B|A)=P(B) and P(A|B)=P(A); otherwise they are dependent (not independent). • Cholera Example: P(D) = .0094, P(D|S) = .0140, P(D|L) = .0024. Not independent (which firm would you prefer?) • Union Army Example: P(C) = .2416, P(C|A1) = .6046, P(C|A5) = .0156. Not independent: almost 40 times higher risk for A1