Probability (Rosner, chapter 3). KLMED 8004, September 2010. Eirik Skogvoll, Consultant/Professor • What is probability? • Basic probability axioms and rules of calculation
Breast cancer (Example 3.1) • Incidence of breast cancer during the next 5 years for women aged 45 to 54 • Group A had their first birth before the age of 20 (“early”) • Group B had their first birth after the age of 30 (“late”) • Suppose 4 out of 1000 in group A and 5 out of 1000 in group B develop breast cancer over the next 5 years. Is this a chance finding, or does it represent a genuinely increased risk? • What if the numbers were 40 out of 10 000 and 50 out of 10 000? Would the difference still be due to chance?
Diagnostic test (Example 3.26) • Suppose that an automated blood pressure machine classifies 84% of hypertensive patients as hypertensive and 23% of normotensive patients as hypertensive, and we know that 20% of the general population are hypertensive. • What are the sensitivity, specificity and positive predictive value of the test?
Probability (Def 3.1) • The sample space, S (Norwegian: “utfallsrommet”), is the set of all possible outcomes of an experiment • An experiment is repeated n times. The event A occurs n_A times. The relative frequency n_A/n approaches a fixed number as the number of trials goes towards infinity. This number, Pr(A), is called the probability of A. • This definition is termed frequentist.
How to quantify probability • Empirical estimation: n_A/n • Inference/calculations based on a theoretical/physical model • “Subjective” probability • “Probability has no universally accepted interpretation.” Chatterjee, S. K. Statistical Thought: A Perspective and History. Oxford University Press, 2003, p. 36.
Example: throw a die • The probability of a six is 1/6 • The probability of a five or a six is 2/6 • These calculations assume a fair die (equal probability of all outcomes) and certain rules of calculation.
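The frequentist definition (Def 3.1) can be illustrated on the die example by simulation. This is only a minimal sketch: it assumes a fair die simulated with Python's `random` module, and the function name `relative_frequency` and the seed are illustrative choices, not part of the lecture.

```python
import random

def relative_frequency(n_trials, seed=2010):
    """Return n_A / n for the event A = 'throw a six' in n_trials simulated throws."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so that a run is reproducible
    n_a = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return n_a / n_trials

# The relative frequency should settle near 1/6 (about 0.167) as n grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

With few throws the relative frequency can be far from 1/6; with many throws it stabilises, which is the behaviour the definition relies on.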
(Very) subjective probability: “There is hardly any way back, says the UN climate committee. There is a 50 percent chance that polar meltdown is inevitable, an April report claims.” “The UN climate committee presented their latest report in January. The committee states that there is a 90 percent chance that global warming is caused by human activity.” http://www.aftenposten.no/nyheter/miljo/article1650116.ece (19.02.2007)
http://weather.yahoo.com/ (accessed 31 August 2010 at 11:11) • Tonight: A steady rain early... then remaining cloudy with a few showers. Low 43F. Winds WNW at 5 to 10 mph. Chance of rain 80%. Rainfall near a quarter of an inch.
Mutually exclusive events (Def 3.2) • Two events A and B are mutually exclusive (Norwegian: “disjunkte”) if they cannot both happen at the same time
Example 3.7: Diastolic blood pressure (DBP) • A = {DBP ≥ 90} • B = {75 ≤ DBP ≤ 100} • A and B are not mutually exclusive, since both occur when 90 ≤ DBP ≤ 100
A ∪ B (“A union B”) means that A, or B, or both, occur (Def 3.4).
Example • A = {DBP ≥ 90} • B = {75 ≤ DBP ≤ 100} • A ∪ B = {DBP ≥ 75}
A ∩ B (“intersection”, Norwegian: “snitt”) means that both A and B occur (Def 3.5).
Example • A = {DBP ≥ 90} • B = {75 ≤ DBP ≤ 100} • A ∩ B = {90 ≤ DBP ≤ 100}
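Union and intersection can be checked mechanically by treating each event as a yes/no test on a DBP reading. A minimal sketch, assuming the events are A = {DBP ≥ 90} and B = {75 ≤ DBP ≤ 100}; the grid of whole-number readings from 40 to 140 mmHg is a hypothetical choice for illustration.

```python
def in_a(dbp):
    """Event A: DBP >= 90."""
    return dbp >= 90

def in_b(dbp):
    """Event B: 75 <= DBP <= 100."""
    return 75 <= dbp <= 100

# Hypothetical grid of whole-number DBP readings (mmHg), for illustration only.
readings = range(40, 141)

union = [x for x in readings if in_a(x) or in_b(x)]          # A union B
intersection = [x for x in readings if in_a(x) and in_b(x)]  # A intersection B

print(min(union), max(union))                # 75 140, i.e. DBP >= 75 on this grid
print(min(intersection), max(intersection))  # 90 100, i.e. 90 <= DBP <= 100
```

Because the intersection is not empty, the two events are not mutually exclusive.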
Basic rules of probability: Kolmogorov’s axioms (1933, Eq. 3.1) • The probability of an event E always satisfies 0 ≤ Pr(E) ≤ 1 • If A and B are mutually exclusive, then Pr(A ∪ B) = Pr(A) + Pr(B). This also applies to more than 2 events. • The probability of a certain event is 1: Pr(S) = 1
Example (Rosner, Example 3.6, p. 47), diastolic BP • A: DBP < 90 mmHg (normal). Pr(A) = 0.7 • B: 90 ≤ DBP < 95 (“borderline”). Pr(B) = 0.1 • C: DBP < 95. Pr(C) = Pr(A ∪ B) = Pr(A) + Pr(B) = 0.7 + 0.1 = 0.8, because A and B are mutually exclusive
Independent events • “A and B are independent if Pr(B) is not influenced by whether A has happened or not.” • Def 3.7: A and B are independent if Pr(A ∩ B) = Pr(A) × Pr(B)
The multiplication law of probability (Eq. 3.2) • If A1, …, Ak are independent, then Pr(A1 ∩ A2 ∩ … ∩ Ak) = Pr(A1) × Pr(A2) × … × Pr(Ak)
The addition law of probability (Eq. 3.3) • Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) • Rosner Fig. 3.5, p. 52: don’t count the intersection A ∩ B twice!
Example 3.13 and 3.17 • A = {Mother’s DBP ≥ 95}, B = {Father’s DBP ≥ 95} • Pr(A) = 0.1, Pr(B) = 0.2. Assume independence. • What is the probability of being a “hypertensive family”? Pr(A ∩ B) = Pr(A) × Pr(B) = 0.1 × 0.2 = 0.02 • What is the probability of at least one parent being hypertensive? Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) = 0.1 + 0.2 − 0.02 = 0.28
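The two calculations in this example can be reproduced in a few lines. A minimal sketch using the probabilities given on the slide; the variable names are illustrative.

```python
pr_a = 0.1  # Pr(A): mother is hypertensive
pr_b = 0.2  # Pr(B): father is hypertensive

# Multiplication law: valid here only because A and B are assumed independent.
pr_both = pr_a * pr_b

# Addition law: subtract the intersection so that it is not counted twice.
pr_at_least_one = pr_a + pr_b - pr_both

print(round(pr_both, 2))          # 0.02
print(round(pr_at_least_one, 2))  # 0.28
```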
Addition theorem for 3 events • Consider three independent events A, B and C • Pr(A ∪ B ∪ C) = Pr(A) + Pr(B) + Pr(C) − Pr(A ∩ B) − Pr(A ∩ C) − Pr(B ∩ C) + Pr(A ∩ B ∩ C) • [Venn diagram: A, B and C within the sample space S]
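The three-event formula is a counting identity, so it can be checked on any small sample space with equally likely outcomes, whether or not the events are independent. A minimal sketch; the sample space of 12 outcomes and the three events are hypothetical and chosen only so that all overlaps are non-empty.

```python
from fractions import Fraction

# Hypothetical sample space: 12 equally likely outcomes.
S = set(range(1, 13))
A = {n for n in S if n % 2 == 0}  # even numbers
B = {n for n in S if n % 3 == 0}  # multiples of 3
C = {n for n in S if n > 8}       # 9, 10, 11, 12

def pr(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

lhs = pr(A | B | C)
rhs = (pr(A) + pr(B) + pr(C)
       - pr(A & B) - pr(A & C) - pr(B & C)
       + pr(A & B & C))

print(lhs, rhs)  # 3/4 3/4
```

Exact fractions are used instead of floats so that the two sides can be compared without rounding error.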
Conditional probability – Aalen et al. (2006) • Population: 4 000 000 • A = “This person develops cancer within 1 year”: 15 000 persons • B = “The person is 70-79 years old”: 300 000 persons • Both A and B: 4 500 persons • P(A) = 15 000 / 4 000 000 = 0.38% • P(B) = 300 000 / 4 000 000 = 7.5%
Conditional probability (Def 3.9) • Conditional probability of B given A: we “re-define” the sample space from S to A • Pr(B|A) = Pr(A ∩ B) / Pr(A)
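Definition 3.9 can be tried out on the cancer and age figures from Aalen et al. A minimal sketch, assuming that the 4 500 persons in that example are those who are both 70-79 years old and develop cancer within 1 year (the overlap of A and B); the variable names are illustrative.

```python
population = 4_000_000
n_a = 15_000       # A: develops cancer within 1 year
n_b = 300_000      # B: aged 70-79 years
n_a_and_b = 4_500  # assumption: persons in both A and B

pr_a = n_a / population
pr_b = n_b / population
pr_a_and_b = n_a_and_b / population

pr_a_given_b = pr_a_and_b / pr_b  # sample space re-defined from S to B
pr_b_given_a = pr_a_and_b / pr_a  # sample space re-defined from S to A

print(f"Pr(A)   = {pr_a:.2%}")
print(f"Pr(B)   = {pr_b:.2%}")
print(f"Pr(A|B) = {pr_a_given_b:.2%}")
print(f"Pr(B|A) = {pr_b_given_a:.2%}")
```

Under this assumption the two conditional probabilities differ markedly, which shows that Pr(A|B) and Pr(B|A) answer different questions.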
Another look at problem 3.1 +++ • A 2 by 2 table of 100 families • Note the difference between (A1 ∩ A2) and (A1|A2): (A1 ∩ A2) is defined on S (the entire sample space), while (A1|A2) is defined on A2 as the sample space
Dependent events (Example 3.14 →) • A = {Mother’s DBP ≥ 95} • B = {First-born child’s DBP ≥ 95} • Pr(A) = 0.1, Pr(B) = 0.2, Pr(A ∩ B) = 0.05 (known!) • Pr(A) × Pr(B) = 0.1 × 0.2 = 0.02 ≠ Pr(A ∩ B); thus the events are dependent! • Pr(B|A) = Pr(A ∩ B) / Pr(A) = 0.05 / 0.1 = 0.5 ≠ Pr(B)
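The dependence check amounts to comparing the known joint probability with the product of the marginals. A minimal sketch with the numbers from the slide; the helper name `is_independent` is hypothetical, and `math.isclose` is used only to avoid a brittle floating-point equality test.

```python
import math

def is_independent(pr_a, pr_b, pr_a_and_b):
    """Def 3.7: A and B are independent if Pr(A and B) equals Pr(A) * Pr(B)."""
    return math.isclose(pr_a_and_b, pr_a * pr_b)

pr_a = 0.1         # Pr(A): mother
pr_b = 0.2         # Pr(B): first-born child
pr_a_and_b = 0.05  # Pr(A and B), given as known

pr_b_given_a = pr_a_and_b / pr_a  # Def 3.9

print(is_independent(pr_a, pr_b, pr_a_and_b))  # False
print(round(pr_b_given_a, 2), "vs Pr(B) =", pr_b)
```

Knowing that the mother is hypertensive raises the child's probability from 0.2 to 0.5, which is what dependence means in practice.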
Generalized multiplication law of probability (Eq 3.8) • From the definition of conditional probability, we have: Pr(A ∩ B) = Pr(A) × Pr(B|A) • In general: Pr(A1 ∩ A2 ∩ … ∩ Ak) = Pr(A1) × Pr(A2|A1) × Pr(A3|A2 ∩ A1) × … × Pr(Ak|A(k−1) ∩ … ∩ A2 ∩ A1)
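Total-Probability Rule (Eq 3.7) • If A1, …, Ak are mutually exclusive and exhaustive events, then Pr(B) = Σ (i = 1 to k) Pr(B|Ai) × Pr(Ai) • [Diagram: event B overlapping the partition A1, A2, …, Ak]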
Prevalence • The prevalence of a disease equals the proportion of the population that is diseased (Def 3.17) • Example (Aalen, 1998): • As of 31 December 1995, 21 482 Norwegian women had breast cancer • Total female population: 2 150 000 • Prevalence: 21 482 / 2 150 000 = 0.010 (1%)
Incidence (or incidence rate) • Incidence is a measure of the number of new cases occurring during some time period (i.e. a rate) • Example (Aalen, 1998): • During 1995, a total of 2 154 Norwegian women were diagnosed with breast cancer • Total female population: 2 150 000 • Incidence rate: 2 154 cases / (2 150 000 persons × 1 year) = 0.0010 cases per person per year
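The difference between the two measures is easiest to see when they are computed side by side from the Norwegian breast cancer figures. A minimal sketch; the variable names are illustrative.

```python
female_population = 2_150_000
existing_cases = 21_482  # women with breast cancer on 31 December 1995
new_cases_1995 = 2_154   # women diagnosed during 1995
years = 1                # length of the observation period

# Prevalence: proportion of the population that has the disease at one point in time.
prevalence = existing_cases / female_population

# Incidence rate: new cases per person per year of observation.
incidence_rate = new_cases_1995 / (female_population * years)

print(f"Prevalence: {prevalence:.3f}")                                # 0.010
print(f"Incidence rate: {incidence_rate:.4f} per person per year")   # 0.0010
```

Prevalence is a proportion with no time unit, while the incidence rate carries a "per year" unit.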
Prevalence of cataract (Example 3.22) • We wish to determine the total prevalence of cataract in the population ≥ 60 years during the next 5 years. Age-specific prevalence is known. • A1 = {60-64 yrs}, A2 = {65-69 yrs}, A3 = {70-74 yrs}, A4 = {75+ yrs}, B = {cataract within 5 years} • Pr(A1) = 0.45, Pr(A2) = 0.28, Pr(A3) = 0.20, Pr(A4) = 0.07 • Pr(B|A1) = 0.024, Pr(B|A2) = 0.046, Pr(B|A3) = 0.088, Pr(B|A4) = 0.153 • Pr(B) = Σ (i = 1 to k) Pr(B|Ai) × Pr(Ai) = 0.024 × 0.45 + 0.046 × 0.28 + 0.088 × 0.20 + 0.153 × 0.07 = 0.052
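The total-probability calculation for the cataract example is a weighted average of the age-specific prevalences. A minimal sketch with the numbers from the slide; the list names are illustrative.

```python
# Age groups: 60-64, 65-69, 70-74, 75+ years.
age_share = [0.45, 0.28, 0.20, 0.07]               # Pr(A_i)
prevalence_by_age = [0.024, 0.046, 0.088, 0.153]   # Pr(B | A_i)

# The age groups must be mutually exclusive and exhaustive: shares sum to 1.
assert abs(sum(age_share) - 1.0) < 1e-9

# Total-probability rule: Pr(B) = sum over i of Pr(B | A_i) * Pr(A_i).
pr_b = sum(p * w for p, w in zip(prevalence_by_age, age_share))

print(round(pr_b, 3))  # 0.052
```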
Example: Age-adjusted incidence of breast cancer, www.kreftregisteret.no
Bayes’ rule, diagnosis and screening
Diagnosis of breast cancer (Example 3.23) • A = {positive mammogram} • B = {breast cancer within 2 years}
Bayes’ rule • Definition (Rosner Eq. 3.9): Bayes’ rule/theorem combines the expressions for conditional and total probability: Pr(B|A) = Pr(A|B) × Pr(B) / [Pr(A|B) × Pr(B) + Pr(A|B̄) × Pr(B̄)], where B̄ is the complement of B (“not B”) • We have found one conditional probability by means of the “opposite” or “inverse” conditional probability!
Bayes’ rule • Example (Rosner, Example 3.26, p. 61) • Prevalence of hypertension = Pr(B) = 0.2 • The automated BP machine classifies 84% of hypertensive patients and 23% of normotensive patients as hypertensive • PPV? NPV?
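The PPV and NPV asked for here follow directly from Bayes' rule once sensitivity, specificity and prevalence are known. A minimal sketch: the function name `predictive_values` is hypothetical, sensitivity is taken as 0.84 and specificity as 1 − 0.23 = 0.77.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) computed with Bayes' rule."""
    # Total probability of a positive test result.
    pr_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    ppv = sensitivity * prevalence / pr_positive
    npv = specificity * (1 - prevalence) / (1 - pr_positive)
    return ppv, npv

ppv, npv = predictive_values(sensitivity=0.84, specificity=1 - 0.23, prevalence=0.2)
print(round(ppv, 2), round(npv, 2))  # 0.48 0.95
```

Fewer than half of those the machine labels hypertensive actually are, because normotensive people outnumber hypertensive people four to one.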
Using a 2 × 2 table requires us to “invent” patients in order to calculate PPV etc. With Bayes’ rule, this information is used directly.
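For comparison, the 2 × 2 table approach can be sketched by inventing a cohort and filling in the four cells. The cohort size of 1000 is a hypothetical choice (any size gives the same ratios); the percentages are those of the hypertension example (prevalence 20%, 84% and 23% classified as hypertensive).

```python
n = 1000  # hypothetical ("invented") number of patients

n_hypertensive = n * 20 // 100         # 200 truly hypertensive
n_normotensive = n - n_hypertensive    # 800 truly normotensive

true_pos = n_hypertensive * 84 // 100   # 168 hypertensive, classified hypertensive
false_neg = n_hypertensive - true_pos   # 32 hypertensive, classified normotensive
false_pos = n_normotensive * 23 // 100  # 184 normotensive, classified hypertensive
true_neg = n_normotensive - false_pos   # 616 normotensive, classified normotensive

ppv = true_pos / (true_pos + false_pos)  # 168 / 352
npv = true_neg / (true_neg + false_neg)  # 616 / 648

print(round(ppv, 2), round(npv, 2))  # 0.48 0.95
```

The cell counts depend on the invented cohort size, but PPV and NPV do not, which is why Bayes' rule can skip the table altogether.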
Diagnostics/ROC (Rosner Tables 3.2 and 3.3, pp. 63-64) • Criterion “1+”: all rated 1 to 5 are diagnosed as abnormal. We find all the diseased, but identify none as healthy. • Sensitivity = 1, specificity = 0, “false positive rate” = 1
Diagnostics/ROC • Criterion “2+”: all rated 2 to 5 are diagnosed as abnormal. We find 48/51 diseased, and identify 33/58 as healthy. • Sensitivity = 0.94, specificity = 0.57, “false positive rate” = 0.43
Diagnostics/ROC • Criterion “3+”: all rated 3 to 5 are diagnosed as abnormal. We find 46/51 diseased, and identify 39/58 as healthy. • Sensitivity = 0.90, specificity = 0.67, “false positive rate” = 0.33
Diagnostics/ROC • Criterion “4+”: all rated 4 and 5 are diagnosed as abnormal. We find 44/51 diseased, and identify 45/58 as healthy. • Sensitivity = 0.86, specificity = 0.78, “false positive rate” = 0.22
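Each criterion gives one point on the ROC curve, with the false positive rate on the horizontal axis and the sensitivity on the vertical axis. A minimal sketch that recomputes the points from the counts quoted on these slides (51 diseased, 58 healthy); the function name `roc_point` and the dictionary layout are illustrative.

```python
N_DISEASED = 51
N_HEALTHY = 58

# criterion -> (diseased correctly found, healthy correctly identified)
counts = {
    "1+": (51, 0),
    "2+": (48, 33),
    "3+": (46, 39),
    "4+": (44, 45),
}

def roc_point(found_diseased, identified_healthy):
    """Return (sensitivity, specificity, false positive rate) for one criterion."""
    sensitivity = found_diseased / N_DISEASED
    specificity = identified_healthy / N_HEALTHY
    return sensitivity, specificity, 1 - specificity

for criterion, (tp, tn) in counts.items():
    sens, spec, fpr = roc_point(tp, tn)
    print(f"{criterion}: sensitivity={sens:.2f} specificity={spec:.2f} FPR={fpr:.2f}")
```

Moving to a stricter criterion lowers the false positive rate at the cost of sensitivity, which is the trade-off the ROC curve displays.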