Probability (Rosner, chapter 3) KLMED 8004, September 2010 Eirik Skogvoll, Consultant/ Professor

• What is probability? • Basic probability axioms and rules of calculation

Breast cancer (Example 3.1) • Incidence of breast cancer during the next 5 years for women aged 45 to 54 • Group A had their first birth before the age of 20 (“early”) • Group B had their first birth after the age of 30 (“late”) • Suppose 4 out of 1000 in group A, and 5 out of 1000 in group B, develop breast cancer over the next 5 years. Is this a chance finding, or does it represent a genuinely increased risk? • What if the numbers were 40 out of 10 000 and 50 out of 10 000? Still due to chance?

Diagnostic test (Example 3.26) • Suppose that an automated blood pressure machine classifies 84% of hypertensive patients as hypertensive and 23% of normotensive patients as hypertensive, and that we know 20% of the general population are hypertensive. • What are the sensitivity, specificity and positive predictive value of the test?

Probability of male live birth (Example 3.2)

Probability (Def 3.1) • The sample space, S (N: “utfallsrommet”), is the set of all possible outcomes from an experiment • An experiment is repeated n times. The event A occurs nA times. The relative frequency nA/n approaches a fixed number as the number of experiments (trials) goes towards infinity. This number, Pr(A), is called the probability of A. • This definition is termed frequentist.

How to quantify probability • Empirical estimation: nA/n • Inference/ calculations based on a theoretical/ physical model • “Subjective” probability • “Probability has no universally accepted interpretation.” Chatterjee, S. K. Statistical Thought: A Perspective and History. Oxford University Press, 2003, p. 36.

Example: throw a die • Probability of a six is 1/6 • Probability of a five or a six is 2/6 • These calculations assume a fair die (equal probability of all outcomes) and certain rules of calculation.
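The frequentist reading of these two numbers can be illustrated with a small simulation. This is only a sketch: the function name `relative_frequency`, the fixed seed and the trial counts are choices made for the illustration, not part of the lecture.

```python
import random

def relative_frequency(event, n_trials, seed=0):
    """Estimate Pr(event) as n_A / n from simulated throws of a fair die."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) in event)
    return hits / n_trials

# The relative frequency should settle near 1/6 and 2/6 as n grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency({6}, n), relative_frequency({5, 6}, n))
```

With few throws the estimates can be far from 1/6 and 2/6; they stabilise only as the number of trials becomes large.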

(Very) subjective probability: “There is hardly any way back, says the UN climate committee. There is a 50 percent chance that polar meltdown is inevitable, an April report claims.” “The UN climate committee presented their latest report in January. The committee states that there is a 90 percent chance that global warming is caused by human activity.” http://www.aftenposten.no/nyheter/miljo/article1650116.ece (19.02.2007)

http://weather.yahoo.com/ (accessed 31 August 2010 at 11:11) Tonight: A steady rain early...then remaining cloudy with a few showers. Low 43F. Winds WNW at 5 to 10 mph. Chance of rain 80%. Rainfall near a quarter of an inch.

Mutually exclusive events (Def 3.2) • Two events A and B are mutually exclusive (N: “disjunkte”) if they cannot both happen at the same time

Example 3.7: Diastolic blood pressure (DBP) • A = {DBP ≥ 90} • B = {75 ≤ DBP ≤ 100} • A and B are not mutually exclusive

A ∪ B (“A union B”) means that A, or B, or both, occur (Def 3.4).

Example • A = {DBP ≥ 90} • B = {75 ≤ DBP ≤ 100} • A ∪ B = {DBP ≥ 75}

A ∩ B (“Intersection”, N: “Snitt”) means that both A and B occur (Def 3.5)

Example • A = {DBP ≥ 90} • B = {75 ≤ DBP ≤ 100} • A ∩ B = {90 ≤ DBP ≤ 100}
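Union and intersection map directly onto set operations. The sketch below takes A as DBP of at least 90 and B as DBP between 75 and 100 inclusive, and represents DBP on a hypothetical integer grid from 40 to 140 mmHg; the grid is an assumption made only so that the events become finite sets.

```python
# Hypothetical sample space: integer DBP readings from 40 to 140 mmHg.
S = set(range(40, 141))

A = {x for x in S if x >= 90}          # A = {DBP >= 90}
B = {x for x in S if 75 <= x <= 100}   # B = {75 <= DBP <= 100}

union = A | B          # A or B or both
intersection = A & B   # both A and B

print(min(union), max(union))                # lowest and highest value in the union
print(min(intersection), max(intersection))  # lowest and highest value in the intersection
print(bool(intersection))                    # True: A and B are not mutually exclusive
```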

Basic rules of probability: Kolmogorov’s axioms (1933, Eq. 3.1) • The probability of an event, E, always satisfies: 0 ≤ Pr(E) ≤ 1 • If A and B are mutually exclusive, then Pr(A ∪ B) = Pr(A) + Pr(B). This also applies to more than 2 events. • The probability of a certain event is 1: Pr(S) = 1

Example (Rosner, Example 3.6, p. 47), diastolic BP • A: DBP < 90 mmHg (normal). Pr(A) = 0.7 • B: 90 ≤ DBP < 95 (“borderline”). Pr(B) = 0.1 • C: DBP < 95. Pr(C) = Pr(A ∪ B) = Pr(A) + Pr(B) = 0.7 + 0.1 = 0.8, because A and B are mutually exclusive

Independent events • “A and B are independent if Pr(B) is not influenced by whether A has happened or not.” • Def 3.7: A and B are independent if Pr(A ∩ B) = Pr(A) × Pr(B)

The multiplication law of probability (Equation 3.2) • If A1, …, Ak are independent, then Pr(A1 ∩ A2 ∩ … ∩ Ak) = Pr(A1) × Pr(A2) × … × Pr(Ak)

The addition law of probability (Eq. 3.3) • Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) • Rosner Fig. 3.5, p. 52: don’t count the intersection A ∩ B twice!

Examples 3.13 and 3.17 • A = {Mother’s DBP ≥ 95}, B = {Father’s DBP ≥ 95} • Pr(A) = 0.1, Pr(B) = 0.2. Assume independence. • What is the probability of being a “hypertensive family”? Pr(A ∩ B) = Pr(A) × Pr(B) = 0.1 × 0.2 = 0.02 • What is the probability of at least one parent being hypertensive? Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) = 0.1 + 0.2 − 0.02 = 0.28
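The two calculations in this example can be written out as a short sketch. The helper names `prob_both_independent` and `prob_union` are made up for the illustration; the first is valid only under the independence assumption stated in the example.

```python
def prob_both_independent(p_a, p_b):
    """Multiplication law: Pr(A and B), valid only if A and B are independent."""
    return p_a * p_b

def prob_union(p_a, p_b, p_a_and_b):
    """Addition law: Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)."""
    return p_a + p_b - p_a_and_b

p_mother, p_father = 0.1, 0.2
p_both = prob_both_independent(p_mother, p_father)
p_at_least_one = prob_union(p_mother, p_father, p_both)

print(f"{p_both:.2f} {p_at_least_one:.2f}")  # prints "0.02 0.28"
```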

Addition theorem for 3 events • Consider three events A, B and C (the rule holds whether or not they are independent) • Pr(A ∪ B ∪ C) = Pr(A) + Pr(B) + Pr(C) − Pr(A ∩ B) − Pr(A ∩ C) − Pr(B ∩ C) + Pr(A ∩ B ∩ C) • [Figure: Venn diagram of A, B and C within S]
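The three-event addition rule can be checked by brute force on a small sample space. The sketch below uses one throw of a fair die and three arbitrarily chosen, overlapping events; the events themselves are assumptions picked for the illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # one throw of a fair die

def pr(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}  # even number
B = {4, 5, 6}  # at least four
C = {2, 3, 4}  # between two and four

lhs = pr(A | B | C)
rhs = (pr(A) + pr(B) + pr(C)
       - pr(A & B) - pr(A & C) - pr(B & C)
       + pr(A & B & C))

print(lhs, rhs)  # both sides equal 5/6
```

Exact fractions are used so that the two sides can be compared without rounding error.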

Conditional probability – Aalen et al. (2006) • Population: 4 000 000 • A = “This person develops cancer within 1 year”: 15 000 persons • B = “The person is 70-79 years old”: 300 000 persons • Both A and B: 4 500 persons • P(A) = 15 000 / 4 000 000 = 0.38% • P(B) = 300 000 / 4 000 000 = 7.5%

Conditional probability - Def 3.9 • Conditional probability of B given A: • We “re-define” the sample space from S to A: • Pr(B|A) = Pr(A ∩ B)/Pr(A)
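As a sketch, the definition can be applied to the counts from the Aalen et al. example (population 4 000 000; 15 000 new cancers within a year; 300 000 persons aged 70-79). It assumes that the figure 4 500 in that example is the number of persons who are both 70-79 years old and develop cancer within the year.

```python
from fractions import Fraction

population = 4_000_000
n_a = 15_000        # A: develops cancer within 1 year
n_b = 300_000       # B: is 70-79 years old
n_a_and_b = 4_500   # assumed: both A and B

p_a = Fraction(n_a, population)
p_b = Fraction(n_b, population)
p_a_and_b = Fraction(n_a_and_b, population)

# Definition 3.9: condition by dividing the joint probability by that of the given event.
p_a_given_b = p_a_and_b / p_b   # cancer risk among 70-79 year olds
p_b_given_a = p_a_and_b / p_a   # share of new cancer cases aged 70-79

print(float(p_a), float(p_a_given_b), float(p_b_given_a))  # 0.00375 0.015 0.3
```

Under that assumption the one-year cancer risk is 1.5% among 70-79 year olds, four times the overall 0.375%.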

Conditional probability and independence

Another look at problem 3.1 +++ A 2 by 2 table of 100 families: Note the difference between (A1 ∩ A2) and (A1|A2) … (A1 ∩ A2) is defined on S (the entire sample space), while (A1|A2) is defined with A2 as the sample space

Relative risk

Relative risk - Example 3.19

Dependent events (Example 3.14 →) • A = {Mother’s DBP ≥ 95}, • B = {First born child’s DBP ≥ 95} • Pr(A) = 0.1, Pr(B) = 0.2, Pr(A ∩ B) = 0.05 (known!) • Pr(A) × Pr(B) = 0.1 × 0.2 = 0.02 ≠ Pr(A ∩ B), thus the events are dependent! • Pr(B|A) = Pr(A ∩ B)/Pr(A) = 0.05/0.1 = 0.5 ≠ Pr(B)
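The dependence check, and the relative risk it leads to, can be sketched with the three probabilities given in this example. The relative risk is taken here in its usual sense, Pr(B|A) divided by Pr(B|not A); that formula does not appear on the slides and is supplied as an assumption.

```python
import math

p_a, p_b, p_a_and_b = 0.1, 0.2, 0.05

# Independence would require Pr(A and B) = Pr(A) * Pr(B).
independent = math.isclose(p_a_and_b, p_a * p_b)

p_b_given_a = p_a_and_b / p_a                    # Pr(B|A)
p_b_given_not_a = (p_b - p_a_and_b) / (1 - p_a)  # Pr(B|not A)
relative_risk = p_b_given_a / p_b_given_not_a

print(independent)                # False
print(f"{p_b_given_a:.2f}")       # 0.50
print(f"{p_b_given_not_a:.3f}")   # 0.167
print(f"{relative_risk:.1f}")     # 3.0
```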

Generalized multiplication law of probability (Eq 3.8) • From the definition of conditional probability, we have: Pr(A ∩ B) = Pr(A) × Pr(B|A) • In general: Pr(A1 ∩ A2 ∩ … ∩ Ak) = Pr(A1) × Pr(A2|A1) × Pr(A3|A2 ∩ A1) × … × Pr(Ak|Ak−1 ∩ … ∩ A2 ∩ A1)

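Total-Probability Rule (Eq 3.7) • If A1, …, Ak are mutually exclusive and together make up the whole sample space, then Pr(B) = Σ (i = 1 to k) Pr(B|Ai) × Pr(Ai) • [Figure: sample space divided into A1, A2, …, Ak, with B overlapping them]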

Prevalence • The prevalence of a disease equals the proportion of the population that is diseased (Def 3.17) • Example (Aalen, 1998): • By 31 December 1995, 21 482 Norwegian women suffered from breast cancer. • Total female population: 2 150 000 • Prevalence: 21 482 / 2 150 000 = 0.010 (1 %)

Incidence (or incidence rate) • Incidence is a measure of the number of new cases occurring during some time period (i.e. a rate) • Example (Aalen, 1998): • During 1995, a total of 2 154 Norwegian women were diagnosed with breast cancer • Total female population: 2 150 000 • Incidence rate: 2 154 cases / (2 150 000 persons × 1 year) = 0.0010 cases per person per year

Prevalence of cataract - Example 3.22 • We wish to determine the total prevalence of cataract in the population ≥ 60 years during the next 5 years. Age-specific prevalence is known. • A1 = {60-64 yrs}, A2 = {65-69 yrs}, A3 = {70-74 yrs}, A4 = {75+ yrs}, B = {cataract within 5 years} • Pr(A1) = 0.45, Pr(A2) = 0.28, Pr(A3) = 0.20, Pr(A4) = 0.07 • Pr(B|A1) = 0.024, Pr(B|A2) = 0.046, Pr(B|A3) = 0.088, Pr(B|A4) = 0.153 • Pr(B) = Σ (i = 1 to k) Pr(B|Ai) × Pr(Ai) = 0.024 × 0.45 + 0.046 × 0.28 + 0.088 × 0.20 + 0.153 × 0.07 = 0.052
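The cataract calculation is a weighted sum and can be sketched in a few lines. The variable names are made up for the illustration; the numbers are the age shares and age-specific probabilities from the example.

```python
# Share of the 60+ population in each age group: 60-64, 65-69, 70-74, 75+
age_share = [0.45, 0.28, 0.20, 0.07]
# Probability of cataract within 5 years, given the age group
p_cataract_given_age = [0.024, 0.046, 0.088, 0.153]

# The age groups must be mutually exclusive and cover the whole population.
assert abs(sum(age_share) - 1.0) < 1e-9

# Total-probability rule: weight each conditional probability by its group share.
p_cataract = sum(p * w for p, w in zip(p_cataract_given_age, age_share))

print(f"{p_cataract:.3f}")  # 0.052
```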

Example: Age-adjusted incidence of breast cancer, www.kreftregisteret.no

Bayes’ rule, diagnosis and screening

Diagnosis of breast cancer (Example 3.23) • A = {positive mammogram} • B = {breast cancer within 2 years}

Bayes’ rule/ theorem: definition (Rosner Eq. 3.9) • Combines the expressions for conditional and total probability: Pr(B|A) = Pr(A|B) × Pr(B) / [Pr(A|B) × Pr(B) + Pr(A|B̄) × Pr(B̄)], where B̄ means “not B” • We have found one conditional probability by means of the “opposite” or “inverse” conditional probability!

Bayes’ rule: Example (Rosner Example 3.26, p. 61) • Prevalence of hypertension = Pr(B) = 0.2. • The auto-BP machine classifies 84 % of hypertensive patients and 23 % of normotensive patients as hypertensive. • PPV? NPV?
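A minimal sketch of this calculation follows. The function name `predictive_values` is hypothetical; sensitivity is taken as 0.84 and specificity as 1 − 0.23 = 0.77, read off from the figures in the example.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) computed with Bayes' rule."""
    # Total probability of a positive test: true positives plus false positives.
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    ppv = sensitivity * prevalence / p_positive
    npv = specificity * (1 - prevalence) / (1 - p_positive)
    return ppv, npv

ppv, npv = predictive_values(sensitivity=0.84, specificity=1 - 0.23, prevalence=0.2)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.48, NPV = 0.95
```

So fewer than half of those flagged by the machine are hypertensive, while a negative reading is right about 95 % of the time.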

Bayes’ rule. Low prevalence – a paradox?
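The apparent paradox can be made concrete by holding the test fixed and lowering the prevalence. The sketch reuses the blood pressure machine’s sensitivity (0.84) and specificity (0.77); the prevalences other than 0.2 are hypothetical values chosen only for the illustration.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same test, increasingly rare condition (all but 0.2 are hypothetical).
for prevalence in (0.2, 0.05, 0.01, 0.001):
    print(f"prevalence {prevalence:<6} PPV {ppv(0.84, 0.77, prevalence):.3f}")
```

As the condition becomes rarer, false positives from the large healthy group swamp the true positives, so most positive results are wrong even though the test itself has not changed.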

Bayes’ rule, diagnosis and screening

Using a 2 × 2 table requires us to “invent” patients in order to calculate PPV etc. …! With Bayes’ rule this information is used directly.

Diagnostics/ ROC (Rosner Tables 3.2 and 3.3, pp. 63-64) Criterion “1+”: all rated 1 to 5 are diagnosed as abnormal. We find all the diseased, but identify none as healthy. Sensitivity = 1, specificity = 0, ‘false positive rate’ = 1.

Diagnostics/ ROC Criterion “2+”: all rated 2 to 5 are diagnosed as abnormal. We find 48/51 diseased, and identify 33/58 as healthy. Sensitivity = 0.94, specificity = 0.57, ‘false positive rate’ = 0.43

Diagnostics/ ROC Criterion “3+”: all rated 3 to 5 are diagnosed as abnormal. We find 46/51 diseased, and identify 39/58 as healthy. Sensitivity = 0.90, specificity = 0.67, ‘false positive rate’ = 0.33

Diagnostics/ ROC Criterion “4+”: all rated 4 and 5 are diagnosed as abnormal. We find 44/51 diseased, and identify 45/58 as healthy. Sensitivity = 0.86, specificity = 0.78, ‘false positive rate’ = 0.22
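The points of the ROC curve for criteria “1+” to “4+” can be recomputed from the counts quoted on these slides (51 diseased, 58 healthy). This is a sketch only; the dictionary layout and the function name `roc_point` are choices made for the illustration.

```python
N_DISEASED, N_HEALTHY = 51, 58

# criterion -> (diseased rated abnormal, healthy rated normal)
counts = {"1+": (51, 0), "2+": (48, 33), "3+": (46, 39), "4+": (44, 45)}

def roc_point(true_pos, true_neg):
    """Return (sensitivity, specificity, false positive rate) for one cut-off."""
    sensitivity = true_pos / N_DISEASED
    specificity = true_neg / N_HEALTHY
    return sensitivity, specificity, 1 - specificity

for criterion, (tp, tn) in counts.items():
    sens, spec, fpr = roc_point(tp, tn)
    print(f"{criterion}: sensitivity {sens:.2f}, specificity {spec:.2f}, FPR {fpr:.2f}")
```

Plotting sensitivity against the false positive rate for each criterion gives the ROC curve; a stricter criterion trades sensitivity for specificity.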
