Probability theory Tron Anders Moger September 5th 2007
Some definitions: • Sample space S=The set of all possible outcomes of a random experiment • Event A: Subset of outcomes in the sample space • Venn diagram:
Operations on events 1 • Complement: The complement of A consists of all outcomes in the sample space that are not in A, denoted Ā. • Union: The union of two events A and B consists of the outcomes included in A or B (or both), denoted A∪B.
Operations on events 2 • Intersection: The intersection of A and B consists of the outcomes included in both A and B, denoted A∩B. • Mutually exclusive: If A and B do not have any common outcomes, they are mutually exclusive. • Collectively exhaustive: If A and B together cover the whole sample space, A∪B=S, they are collectively exhaustive.
Probability • Probability is defined as the relative frequency with which an event A occurs if an experiment is repeated many times • The probabilities of all outcomes in the sample space sum to 1. • Probability 0: The event cannot occur • Probabilities have to be between 0 and 1!
Probability postulates 1 • The complement rule: P(A)+P(Ā)=1 • Rule of addition for mutually exclusive events: P(A∪B)=P(A)+P(B)
Probability postulates 2 • General rule of addition, for events that are not mutually exclusive: P(A∪B)=P(A)+P(B)-P(A∩B)
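The complement rule and the general rule of addition can be checked on a small example. The sketch below assumes a single throw of a fair die, with two events chosen purely for illustration (they are not from the slides).

```python
from fractions import Fraction

# Hypothetical example: one throw of a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}   # sample space
A = {2, 4, 6}            # event: even number
B = {4, 5, 6}            # event: at least 4


def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))


complement_A = S - A     # outcomes in S but not in A
union = A | B            # outcomes in A or B (or both)
intersection = A & B     # outcomes in both A and B

# Complement rule: P(A) + P(complement of A) = 1
assert prob(A) + prob(complement_A) == 1

# General rule of addition: P(A or B) = P(A) + P(B) - P(A and B)
assert prob(union) == prob(A) + prob(B) - prob(intersection)

print(prob(union))       # 2/3
```

Using Fraction keeps the arithmetic exact, so the rules hold as equalities rather than up to rounding.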
Conditional probability • If the event B already has occurred, the conditional probability of A given B is: P(A|B)=P(A∩B)/P(B), for P(B)>0 • Can be interpreted as follows: The knowledge that B has occurred limits the sample space to B. The relative probabilities are the same, but they are scaled up so that they sum to 1.
Probability postulates 3 • Multiplication rule: For general events A and B: P(A∩B)=P(A|B)P(B)=P(B|A)P(A) • Independence: A and B are statistically independent if P(A∩B)=P(A)P(B) • Implies that P(A|B)=P(A) and P(B|A)=P(B)
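A minimal sketch of conditional probability, the multiplication rule and the independence check, again assuming a fair die. The events A, B and C are hypothetical and chosen so that one pair is dependent and one pair is independent.

```python
from fractions import Fraction

# Hypothetical example: one throw of a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}        # even number
B = {4, 5, 6}        # at least 4
C = {1, 2, 3, 4}     # at most 4


def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))


def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); requires P(b) > 0."""
    return prob(a & b) / prob(b)


def independent(a, b):
    """True if P(a and b) = P(a) * P(b)."""
    return prob(a & b) == prob(a) * prob(b)


# Multiplication rule: P(A and B) = P(A|B) P(B) = P(B|A) P(A)
assert prob(A & B) == cond_prob(A, B) * prob(B) == cond_prob(B, A) * prob(A)

print(cond_prob(A, B))     # 2/3, differs from P(A) = 1/2
print(independent(A, B))   # False
print(independent(A, C))   # True, and P(A|C) equals P(A)
assert cond_prob(A, C) == prob(A)
```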
Probability postulates 4 • Assume that the events A1, A2, ..., An are independent. Then P(A1∩A2∩...∩An)=P(A1)P(A2)...P(An) • This rule is very handy when all P(Ai) are equal
Example: Doping tests • Let’s say a doping test has 0.2% probability of being positive when the athlete is not using steroids • The athlete is tested 50 times • What is the probability that at least one test is positive, even though the athlete is clean? • Define A=at least one test is positive • Complement rule: P(A)=1-P(Ā)=1-P(no test is positive) • Rule of independence: P(Ā)=0.998*0.998*...*0.998 (50 terms)=0.998^50≈0.905 • Hence P(A)≈1-0.905=0.095
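The doping calculation can be reproduced in a few lines. This sketch assumes, as the slide does, that the 50 tests are independent; the variable names are illustrative.

```python
p_false_positive = 0.002   # P(test positive | athlete is clean), from the slide
n_tests = 50

# Rule of independence: P(no positive test) is a product of 50 equal terms.
p_no_positive = (1 - p_false_positive) ** n_tests

# Complement rule: P(at least one positive) = 1 - P(no positive).
p_at_least_one = 1 - p_no_positive

print(round(p_at_least_one, 3))   # 0.095
```

So a clean athlete who is tested 50 times has roughly a 1 in 10 chance of at least one positive result.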
Example: Andy’s exams • Define A=Andy passes math • B=Andy passes chemistry • Let P(A)=0.4 P(B)=0.35 P(A∩B)=0.12 • Are A and B independent? 0.4*0.35=0.14≠0.12, no they are not • Probability that Andy fails in both subjects? • Complement rule: P(Ā∩B̄)=1-P(A∪B) • General rule of addition: P(A∪B)=0.4+0.35-0.12=0.63, so P(Ā∩B̄)=1-0.63=0.37
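The same steps for Andy's exams, written out as a short sketch with the numbers given on the slide (variable names are illustrative):

```python
import math

p_A = 0.40     # P(Andy passes math)
p_B = 0.35     # P(Andy passes chemistry)
p_AB = 0.12    # P(Andy passes both)

# Independence would require P(A and B) = P(A) * P(B) = 0.14.
independent = math.isclose(p_A * p_B, p_AB)

# General rule of addition: P(passes at least one subject).
p_union = p_A + p_B - p_AB

# Complement rule: failing both is the complement of passing at least one.
p_fail_both = 1 - p_union

print(independent)             # False
print(round(p_fail_both, 2))   # 0.37
```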
The law of total probability - twins • A=Twins have the same gender • B=Twins are monozygotic • B̄=Twins are dizygotic • What is P(A)? • The law of total probability: P(A)=P(A|B)P(B)+P(A|B̄)P(B̄) • For twins: P(B)=1/3 P(B̄)=2/3 P(A)=1*1/3+1/2*2/3=2/3
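A small sketch of the law of total probability for the twins example. The conditional probabilities P(A|B)=1 and P(A|B̄)=1/2 are the ones implied by the slide's arithmetic.

```python
from fractions import Fraction

p_B = Fraction(1, 3)               # P(monozygotic)
p_not_B = 1 - p_B                  # P(dizygotic) = 2/3
p_A_given_B = Fraction(1)          # monozygotic twins always share gender
p_A_given_not_B = Fraction(1, 2)   # dizygotic twins share gender half the time

# Law of total probability: split A over B and its complement.
p_A = p_A_given_B * p_B + p_A_given_not_B * p_not_B

print(p_A)   # 2/3
```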
Bayes theorem • P(B|A)=P(A|B)P(B)/[P(A|B)P(B)+P(A|B̄)P(B̄)] • Frequently used to estimate the probability that a patient is ill on the basis of a diagnostic test • Incorrect diagnoses are common for rare diseases
Example: Cervical cancer • B=Cervical cancer • A=Positive test • P(B)=0.0001 P(A|B)=0.9 P(A|B̄)=0.001 • P(B|A)=0.9*0.0001/(0.9*0.0001+0.001*0.9999)≈0.08 • Only 8% of women with positive tests are ill
Usefulness of test highly dependent on disease prevalence and quality of test (P(A|B)=0.9 throughout):
P(B)      P(A|B̄)    P(B|A)
0.0001    0.001     0.08
0.0001    0.0001    0.47
0.001     0.001     0.47
0.001     0.0001    0.90
0.01      0.001     0.90
0.01      0.0001    0.99
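The values of P(B|A) for different prevalences and false-positive rates can be recomputed with Bayes' theorem. The sketch below is one way to do it; the function name posterior and its parameter names are illustrative, and P(A|B)=0.9 is taken from the cervical cancer example.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(B|A) by Bayes' theorem.

    prior               = P(B), the disease prevalence
    sensitivity         = P(A|B), positive test given disease
    false_positive_rate = P(A|not B), positive test given no disease
    """
    # Law of total probability gives the denominator P(A).
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


sensitivity = 0.9
for prior in (0.0001, 0.001, 0.01):
    for fpr in (0.001, 0.0001):
        print(f"{prior:<8} {fpr:<8} {posterior(prior, sensitivity, fpr):.2f}")
```

Each printed row gives P(B), P(A|B̄) and the resulting P(B|A) rounded to two decimals. The first row is the cervical cancer case, where only about 8% of positive tests correspond to illness.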
Odds: • The odds for an event is the probability of the event divided by the probability of its complement • From horse racing: Odds 1:9 means that the horse wins in 1 out of 10 races; P(A)=0.1
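Converting between probability and odds is a one-line formula in each direction. A minimal sketch, with hypothetical helper names odds and prob_from_odds:

```python
from fractions import Fraction


def odds(p):
    """Odds for an event with probability p: p / (1 - p)."""
    return p / (1 - p)


def prob_from_odds(o):
    """Probability of an event with odds o: o / (1 + o)."""
    return o / (1 + o)


p_win = Fraction(1, 10)                  # the horse wins 1 race out of 10
print(odds(p_win))                       # 1/9, i.e. odds 1:9
print(prob_from_odds(Fraction(1, 9)))    # 1/10
```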
Random variables • A random variable takes on numerical values determined by the outcome of a random experiment. • A discrete random variable takes on a countable number of values, with a certain probability attached to each specific value. • Continuous random variables can take on any value in an interval; it is only meaningful to talk about the probability of intervals.
PDF and CDF • For discrete random variables, the probability density function (PDF) is simply the same as the probability function of each outcome, denoted P(x). • The cumulative distribution function (CDF) at a value x is the cumulative sum of the PDF for values up to and including x, F(x)=Σ P(y), summed over all y≤x. • Sum over all outcomes is always 1 (why?). • For a single die throw, the CDF at 4 is 1/6+1/6+1/6+1/6=4/6=2/3
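For a fair die, the probability function and its cumulative sum can be written down directly. This is a small sketch; the dictionary pmf and the function cdf are illustrative names.

```python
from fractions import Fraction

# Probability function of a single fair die throw: P(x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def cdf(x):
    """Cumulative sum of P(y) over all outcomes y <= x."""
    return sum(p for value, p in pmf.items() if value <= x)


# The probabilities of all outcomes sum to 1.
assert sum(pmf.values()) == 1

print(cdf(4))   # 2/3
print(cdf(6))   # 1
```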
Expected value • The expected value of a discrete random variable is defined as the following sum: E(X)=Σ x*P(x) • The sum is over all possible values/outcomes of the variable • For a single die throw, the expected value is E(X)=1*1/6+2*1/6+...+6*1/6=3.5
Properties of the expected value • We can construct a new random variable Y=aX+b from a random variable X and numbers a and b. (When X has outcome x, Y has outcome ax+b, and the probabilities are the same). • We can then see that E(Y) = aE(X)+b • We can also construct for example the random variable X*X = X²
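The expected value and its behaviour under Y=aX+b can be checked on the die example. In this sketch the constants a=2 and b=3 are arbitrary illustrative choices.

```python
from fractions import Fraction

# Probability function of a single fair die throw.
pmf_X = {x: Fraction(1, 6) for x in range(1, 7)}


def expected_value(pmf):
    """E(X) = sum of x * P(x) over all outcomes x."""
    return sum(x * p for x, p in pmf.items())


E_X = expected_value(pmf_X)
print(E_X)        # 7/2

# Y = aX + b: outcome x becomes a*x + b, with the same probability.
a, b = 2, 3       # hypothetical constants
pmf_Y = {a * x + b: p for x, p in pmf_X.items()}
E_Y = expected_value(pmf_Y)
assert E_Y == a * E_X + b
print(E_Y)        # 10

# X*X is also a random variable; note that E(X*X) is not E(X)*E(X).
E_X_squared = sum(x ** 2 * p for x, p in pmf_X.items())
print(E_X_squared)   # 91/6
```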
Variance and standard deviation • The variance of a stochastic variable X is Var(X)=E[(X-E(X))²] • The standard deviation is the square root of the variance. • We can show that Var(aX+b)=a²Var(X) • Hence, constants do not have any variance
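A short sketch of the variance of a fair die throw and of the rule Var(aX+b)=a²Var(X). The constants a and b are again arbitrary illustrative values.

```python
import math
from fractions import Fraction

# Probability function of a single fair die throw.
pmf_X = {x: Fraction(1, 6) for x in range(1, 7)}


def mean(pmf):
    """E(X) = sum of x * P(x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E[(X - E(X))^2]."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


var_X = variance(pmf_X)
print(var_X)                              # 35/12
print(round(math.sqrt(var_X), 3))         # standard deviation, about 1.708

# Y = aX + b: the shift b does not change the spread, the scale a enters squared.
a, b = 2, 3                               # hypothetical constants
pmf_Y = {a * x + b: p for x, p in pmf_X.items()}
assert variance(pmf_Y) == a ** 2 * var_X

# A constant (a = 0) has no variance.
assert variance({b: Fraction(1)}) == 0
```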
Example: • Let E(X)=μ_X and Var(X)=σ_X² • What are the expected value and variance of ?
Next week: • So far: Only considered discrete random variables • Next week: Continuous random variables • Common probability distributions for random variables • Normal distribution